Convert PDF tables to Excel (XLSX)

Client‑side prototype inspired by iLovePDF's PDF→Excel tool. It includes page‑range selection, per‑page previews, optional OCR for scanned PDFs, and a server handoff option for heavy files.

  • Drag & drop or choose files (PDF)
  • All pages or custom ranges (e.g., 1-3,5)
  • Auto table extraction for digital PDFs
  • OCR fallback for scans (slower, experimental)
  • Per‑page thumbnails + quick include/exclude
  • Download multi‑sheet XLSX (one sheet per page)
Drop PDF here or browse
Max ~50MB client‑side (use Server mode for larger)

Why Choose Our PDF to Excel Converter?

Privacy First

In client mode, your files are processed in the browser and never leave it. Server mode, by contrast, posts the file to /api/convert.

Fast Processing

Digital PDFs are converted directly from their text positions, with no OCR pass. Scanned PDFs go through OCR, which is slower.

Advanced Features

OCR support, page selection, and multiple output options for maximum flexibility.

Key Features

Smart Table Detection

Automatically identifies and extracts tables from your PDF documents.

OCR Technology

Converts scanned PDFs with optical character recognition. OCR is slower and experimental, and works best on clear, high‑contrast scans.

Selective Page Conversion

Choose specific pages or ranges to convert, saving time and resources.
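
A custom range such as 1-3,5 has to be turned into a list of page numbers before conversion. The following is a minimal sketch of one way to do that, not the prototype's actual code: `parsePageRanges` is a hypothetical helper, and clamping ranges to the document length (rather than rejecting them) is an assumed policy.

```typescript
// Hypothetical helper: turns "1-3,5" into sorted, unique 1-based page numbers.
// Assumption: ranges that run past the last page are clamped, not rejected.
function parsePageRanges(spec: string, pageCount: number): number[] {
  const seen: { [page: number]: boolean } = {};
  const pages: number[] = [];
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page range: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    const last = Math.min(end, pageCount);
    for (let p = start; p <= last; p++) {
      if (!seen[p]) {
        seen[p] = true;
        pages.push(p);
      }
    }
  }
  return pages.sort((a, b) => a - b);
}
```

Under these assumptions, `parsePageRanges("1-3,5", 10)` yields pages 1, 2, 3 and 5, and overlapping or out-of-order entries are deduplicated and sorted.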

Multiple Output Options

Export as single or multiple sheets. Table structure is kept where possible as plain text; styling, merged cells, and formulas are not retained.

Page previews

Toggle which pages to include

How it works (client mode)

  1. Digital PDFs: uses PDF.js to read each text item's position (x, y).
  2. Groups text items into lines by Y, then clusters X‑gaps to estimate table cells.
  3. Merges the rows into a simple grid, then exports via SheetJS.
  4. Scans: renders each page to a canvas and runs Tesseract OCR → heuristic CSV → Excel.

Note: Exact table boundaries are hard to detect without server‑side ML; this prototype aims for usable output quickly.
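
The row and column grouping in steps 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype's actual code: the `TextItem` shape is simplified (in PDF.js text content, x and y would come from `transform[4]` and `transform[5]` of each item), the function names are hypothetical, the tolerance values are guesses, and columns are estimated from left edges only, which will misplace right‑aligned numbers.

```typescript
// Simplified text item; an assumed shape, not the PDF.js type itself.
interface TextItem { str: string; x: number; y: number; }

// Step 2a: group items into rows when their Y values are within a tolerance.
// PDF coordinates grow upward, so sort by Y descending to read top to bottom.
function groupIntoRows(items: TextItem[], yTolerance: number): TextItem[][] {
  const sorted = items.slice().sort((a, b) => b.y - a.y);
  const rows: TextItem[][] = [];
  let rowY = 0;
  for (const item of sorted) {
    const current = rows[rows.length - 1];
    if (current && Math.abs(item.y - rowY) <= yTolerance) {
      current.push(item);
    } else {
      rows.push([item]);
      rowY = item.y; // the first item of a row anchors its Y
    }
  }
  for (const row of rows) row.sort((a, b) => a.x - b.x);
  return rows;
}

// Step 2b: a gap larger than minGap between sorted left edges starts a new column.
function findColumnStarts(items: TextItem[], minGap: number): number[] {
  const xs = items.map((i) => i.x).sort((a, b) => a - b);
  const starts: number[] = [];
  let prev = -Infinity;
  for (const x of xs) {
    if (x - prev > minGap) starts.push(x);
    prev = x;
  }
  return starts;
}

// Step 3: place every item in the column whose start is the last one at or
// left of its X, producing a rectangular grid of strings.
function toGrid(items: TextItem[], yTolerance = 2, minGap = 10): string[][] {
  const starts = findColumnStarts(items, minGap);
  return groupIntoRows(items, yTolerance).map((row) => {
    const cells: string[] = starts.map(() => "");
    for (const item of row) {
      let col = 0;
      for (let c = 0; c < starts.length; c++) {
        if (starts[c] <= item.x) col = c;
      }
      cells[col] = cells[col] ? cells[col] + " " + item.str : item.str;
    }
    return cells;
  });
}
```

The resulting `string[][]` is an array of arrays, which is the shape SheetJS's `XLSX.utils.aoa_to_sheet` accepts, so each page's grid could become one worksheet.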

FAQ

Does this keep formatting?
It preserves table structure where possible, as plain text only. Styling, merged cells, and formulas are not retained.
What about large or complex PDFs?
Use Server mode, which posts the file to /api/convert, where you can run heavier extraction, ML, or commercial engines.
Is OCR accurate?
OCR is experimental here and works best on clear, high‑contrast scans. For production, prefer server‑side OCR (e.g., Tesseract with layout analysis, AWS Textract, Azure Form Recognizer).
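
For the Server mode handoff mentioned above, a rough sketch of the client side might look like this. Only the /api/convert path and the ~50MB figure come from this page; reading "~50MB" as 50 × 1024 × 1024 bytes, the `file` and `pages` form field names, the XLSX blob response, and both function names are assumptions made for illustration.

```typescript
// Assumed interpretation of the "~50MB" client-side limit.
const CLIENT_LIMIT_BYTES = 50 * 1024 * 1024;

type Mode = "client" | "server";

// Hypothetical helper: pick the processing mode from the file size.
function chooseMode(sizeBytes: number, limit: number = CLIENT_LIMIT_BYTES): Mode {
  return sizeBytes > limit ? "server" : "client";
}

// Hypothetical upload: the "file" and "pages" field names and the
// XLSX blob response are assumptions, not a documented contract.
async function convertOnServer(file: Blob, fileName: string, pages: string): Promise<Blob> {
  const form = new FormData();
  form.append("file", file, fileName);
  form.append("pages", pages); // e.g. "1-3,5"
  const response = await fetch("/api/convert", { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Conversion failed: HTTP ${response.status}`);
  }
  return response.blob();
}
```

With this split, a file at or under the limit stays in the browser, and only larger files are posted to the server.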