How to Extract Tables from PDF to Excel or CSV Free (2026)

📅 June 2026 ⏱ 6 min read 🗂 PDF to Spreadsheet

Tables locked inside PDF files are one of the most frustrating data problems in modern work. You have a financial report with 40 rows of figures, a supplier price list, an invoice summary, or a research dataset — all formatted beautifully inside a PDF — but you need those numbers in Excel to sort, filter, calculate, or chart them. Copy-pasting from a PDF produces garbled columns, missing data, and merged cells that take hours to clean up.

The right tool extracts the table data with its structure intact and delivers a clean spreadsheet you can work with immediately. This guide explains why PDF table extraction is technically hard, when to use manual versus AI extraction, and how to get the best results with PDFTash.

Why PDF Table Extraction Is Hard

PDF is a presentation format, not a data format. When a table is created in Excel and exported to PDF, the grid structure (columns, rows, cell boundaries) is typically discarded. What remains in the PDF is a collection of text objects positioned at specific X,Y coordinates on the page. There is no built-in concept of "row 3, column 2"; the PDF simply records that a number appears at a certain location.

A naive copy-paste or text extraction tool reads these text objects in the order they are stored, or in a simple left-to-right, top-to-bottom sweep, producing jumbled output that mixes multiple columns together. More sophisticated tools use the spatial positions of text objects to infer column boundaries and reconstruct the table structure, but this breaks down when tables have merged cells, irregular column widths, or multi-line cell values.
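
To make the idea of position-based reconstruction concrete, here is a minimal Python sketch. It is not how PDFTash works internally; it only illustrates the general approach of clustering X positions into columns and Y positions into rows. The input format (text, x, y), the sample coordinates, and the tolerance values are all assumptions made for illustration.

```python
# Hypothetical input: (text, x, y) tuples, roughly what a PDF library might report.
# Here y grows downward; real PDFs often use a bottom-left origin instead.
words = [
    ("Item", 50, 100), ("Qty", 200, 100), ("Price", 300, 100),
    ("Widget", 50, 120), ("4", 205, 120), ("9.99", 298, 120),
    ("Gadget", 50, 140), ("12", 202, 140), ("24.50", 296, 140),
]

def cluster(values, tolerance):
    """Group 1-D positions whose gap to the previous sorted value is <= tolerance."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tolerance:
            groups[-1].append(v)
        else:
            groups.append([v])
    # Represent each group by its mean position
    return [sum(g) / len(g) for g in groups]

def nearest(centers, v):
    """Index of the cluster center closest to v."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))

def rebuild_table(words, x_tol=15, y_tol=5):
    """Place each word into a (row, column) grid inferred from its coordinates."""
    col_centers = cluster([x for _, x, _ in words], x_tol)
    row_centers = cluster([y for _, _, y in words], y_tol)
    grid = [["" for _ in col_centers] for _ in row_centers]
    for text, x, y in words:
        r = nearest(row_centers, y)
        c = nearest(col_centers, x)
        grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid

table = rebuild_table(words)
for row in table:
    print(row)
```

A fixed tolerance like this is exactly what fails on merged cells or irregular column widths, which is why simple position-based tools struggle with complex tables.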

PDFTash uses an AI model trained on thousands of PDF tables to interpret layout patterns intelligently, handling merged cells, nested tables, spanning headers, and cells with line breaks.

Manual vs AI Extraction: When to Use Each

AI extraction (recommended for most cases): Upload the PDF, select the table pages, and let the AI identify and extract all tables automatically. Best for standard financial reports, invoices, price lists, and data tables with clear column structure.
Manual selection: Draw a box around the specific table you want to extract. Best when the page mixes tables and prose (as in a research paper), when you need only one of several tables on a page, or when the AI misidentifies the table boundaries.

Step-by-Step with PDFTash

Go to pdftash.com/pdf-to-csv.
Upload your PDF. Drag and drop or click to browse. Files up to 10 MB are supported on the free plan.
Select pages (optional): If your document has many pages and tables only appear on specific pages, use the page selector to target only those pages. This speeds up processing and reduces noise.
Choose output format: CSV (comma-separated) for maximum compatibility, or Excel (.xlsx) for immediate use with formatting preserved.
Click Extract Tables. PDFTash analyses the page layout, identifies all table regions, reconstructs the row-column structure, and exports each table as a separate sheet or CSV file.
Download your spreadsheet. If multiple tables were found, they are delivered as separate sheets in the Excel file or as individual CSVs in a ZIP archive.
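
If you chose CSV and received a ZIP archive, the individual tables can be loaded in one pass with Python's standard library. This is a sketch under assumptions: the member names (table_1.csv, table_2.csv) are hypothetical, and UTF-8 encoding is assumed rather than confirmed. The demo builds a small archive in memory so the snippet runs on its own.

```python
import csv
import io
import zipfile

def read_tables_from_zip(zip_source):
    """Return {member_name: list_of_rows} for every .csv file in the archive."""
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # utf-8-sig tolerates an optional byte-order mark (encoding is an assumption)
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = list(csv.reader(text))
    return tables

# Demo archive built in memory; with a real download, pass the path to the ZIP instead.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("table_1.csv", "Item,Qty\nWidget,4\n")
    archive.writestr("table_2.csv", "Region,Total\nNorth,1200\n")
demo.seek(0)

tables = read_tables_from_zip(demo)
for name, rows in tables.items():
    print(name, rows)
```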

PDFTash preserves column headers and merged cells as closely as possible to the original PDF layout. Numeric values are exported as numbers (not text), so they are immediately usable in formulas.

Tips for Best Results

Text PDF vs Scanned PDF

The most important factor in extraction quality is whether your PDF is a text PDF (the text is selectable) or a scanned PDF (each page is an image). Text PDFs produce near-perfect extraction results. Scanned PDFs require an OCR step first to convert the image to selectable text before table extraction can work.

To check: try to click and select text in your PDF. If you can highlight individual words, it's a text PDF. If clicking selects the entire page like an image, it's a scanned PDF — run it through PDFTash OCR first, then extract the table from the OCR result.
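
The same check can be approximated in code when you have many files to sort. The sketch below is a heuristic, not PDFTash's detection method: it treats a PDF as probably scanned when most pages yield almost no extractable text. The 20-character threshold is an arbitrary assumption, and the optional helper assumes the third-party pypdf package is installed.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Heuristic: True when more than half the pages have almost no extractable text."""
    if not page_texts:
        return True
    sparse = sum(
        1 for text in page_texts
        if len((text or "").strip()) < min_chars_per_page
    )
    return sparse / len(page_texts) > 0.5

def extract_page_texts(path):
    """Collect the text of each page. Assumes pypdf is installed (pip install pypdf)."""
    from pypdf import PdfReader
    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]

# Stand-in page texts so the example runs without a PDF file
print(looks_scanned(["", "   "]))
print(looks_scanned(["Revenue 2025: 1,200 1,450 1,600 total 4,250"]))
```

With a real file you would call looks_scanned(extract_page_texts("report.pdf")), where report.pdf is a placeholder name.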

Clean table borders help

Tables with visible cell borders (grid lines) are extracted more accurately than borderless tables with only whitespace separating columns. If your PDF has a borderless table with irregular spacing, the AI may occasionally misalign a column. Review the preview before downloading.
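
If you want a second check after downloading, a short script can flag cells that break the pattern of their column, which is a common symptom of a misaligned column. This is a rough sketch under assumptions: the first row is treated as the header, and a column counts as numeric when more than half of its non-empty cells parse as numbers. The sample rows are invented.

```python
def is_number(cell):
    """True if the cell parses as a number once thousands separators are removed."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False

def suspicious_cells(rows):
    """Return (row, col) positions that look misaligned; col is None for a wrong row width."""
    header, body = rows[0], rows[1:]
    flagged = []
    for r, row in enumerate(body, start=1):
        if len(row) != len(header):
            flagged.append((r, None))
    for c in range(len(header)):
        cells = [
            (r, row[c]) for r, row in enumerate(body, start=1)
            if c < len(row) and row[c].strip()
        ]
        numeric = [is_number(value) for _, value in cells]
        # Only check columns that are mostly numeric
        if cells and sum(numeric) > len(cells) / 2:
            flagged += [(r, c) for (r, _), ok in zip(cells, numeric) if not ok]
    return flagged

rows = [
    ["Item", "Qty", "Price"],
    ["Widget", "4", "9.99"],
    ["Gadget", "12", "24.50"],
    ["Gizmo", "7 3.25", ""],  # two values collapsed into one cell
]
flagged = suspicious_cells(rows)
print(flagged)
```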

Multi-page tables

Tables that span multiple pages are detected automatically. PDFTash reassembles them into a single continuous table in the output, rather than splitting them into separate tables per page.
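
For readers curious what reassembly involves, or who need to stitch together per-page output from another tool, here is a minimal sketch. It is an illustration, not PDFTash's implementation, and it assumes the first fragment starts with the header row and that continuation pages may or may not repeat it.

```python
def merge_page_fragments(fragments):
    """Concatenate per-page row lists into one table, keeping the header once."""
    fragments = [f for f in fragments if f]  # ignore empty pages
    if not fragments:
        return []
    header = fragments[0][0]
    merged = [header]
    for fragment in fragments:
        # Drop the first row when it repeats the header
        body = fragment[1:] if fragment[0] == header else fragment
        merged.extend(body)
    return merged

page_1 = [["Date", "Amount"], ["2026-01-01", "100"]]
page_2 = [["Date", "Amount"], ["2026-01-02", "200"]]  # header repeated
page_3 = [["2026-01-03", "300"]]                      # header not repeated

merged = merge_page_fragments([page_1, page_2, page_3])
for row in merged:
    print(row)
```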

How to Open the CSV in Excel or Google Sheets

In Microsoft Excel:

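Open Excel and go to File → Open.
Select the CSV file. Recent versions of Excel open a comma-separated .csv file directly, splitting columns on your system's list separator.
If the Text Import Wizard appears instead (as it does for .txt files), choose "Delimited" and set the delimiter to "Comma". Click Finish.
Alternatively, go to Data → Get Data → From Text/CSV, which lets you confirm the delimiter and encoding before loading.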

In Google Sheets:

Open Google Sheets and go to File → Import.
Upload the CSV file and choose "Comma" as the separator. Click Import Data.

If you downloaded the Excel (.xlsx) format, double-click the file to open it directly in Excel with no import step. To use it in Google Sheets, upload it to Google Drive or open it through File → Import.

Frequently Asked Questions

How accurate is PDF table extraction?

For standard text PDFs with clear table borders, PDFTash achieves over 98% structural accuracy — meaning the correct data in the correct cell. Complex tables with merged cells, diagonal headers, or unusual formatting may have minor alignment issues that need manual correction. Scanned PDFs depend on OCR quality first.
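
To check accuracy on your own documents, you can compare an extracted table against a hand-verified copy. The function below is one plausible way to compute a "correct data in the correct cell" score; it is an assumption about how such a figure could be measured, not PDFTash's published methodology, and the sample tables are invented.

```python
def structural_accuracy(extracted, reference):
    """Fraction of reference cells whose value sits at the same (row, col) in extracted."""
    total = sum(len(row) for row in reference)
    if total == 0:
        return 1.0
    correct = 0
    for r, ref_row in enumerate(reference):
        for c, ref_cell in enumerate(ref_row):
            in_bounds = r < len(extracted) and c < len(extracted[r])
            if in_bounds and extracted[r][c].strip() == ref_cell.strip():
                correct += 1
    return correct / total

reference = [["Item", "Qty"], ["Widget", "4"], ["Gadget", "12"]]
extracted = [["Item", "Qty"], ["Widget", "4"], ["Gadget 12", ""]]  # last row misaligned

score = structural_accuracy(extracted, reference)
print(f"{score:.0%}")
```

Here four of the six reference cells match, so one misaligned row in a three-row table already pulls the score down to about 67%.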

Does it work on scanned PDFs (images)?

Not directly. Scanned PDFs contain no text data — each page is an image. You need to run OCR first using PDFTash OCR, which makes the text selectable, and then extract the table from the OCR-processed PDF. PDFTash will prompt you to run OCR if it detects a scanned document.

What if my PDF has multiple tables on one page?

PDFTash detects and extracts all tables on each page independently. Each table is exported as a separate sheet in the Excel output, or as a separate CSV file in the ZIP download. In the preview, each detected table is highlighted with a coloured border so you can verify before downloading.

Is there a free limit on how many tables or pages I can extract?

The free plan supports PDFs up to 10 MB and up to 50 pages per extraction. Most financial reports, invoices, and data exports are well within these limits. For large multi-hundred-page documents, the Pro plan handles files up to 200 MB with no page limit.

What output formats are supported?

PDFTash exports to CSV (.csv) for maximum compatibility and Excel (.xlsx) for direct use in Microsoft Excel or Google Sheets. CSV is recommended if you plan to import the data into a database or analytics tool (SQL, Python, Power BI). Excel is better if you want to work directly with formatting and formulas.
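
As a small example of the Python route, the sketch below reads an exported table with the standard csv module. Note that CSV has no data types, so every value arrives as a string and numbers need converting explicitly. The column names and sample data are hypothetical; an in-memory string stands in for the downloaded file so the snippet runs as-is.

```python
import csv
import io

# Stand-in for an exported file. With a real download you would use
# open("table_1.csv", newline="", encoding="utf-8") instead (file name is hypothetical).
sample = io.StringIO("Item,Qty,Price\nWidget,4,9.99\nGadget,12,24.50\n")

rows = list(csv.DictReader(sample))

# Every CSV value is text until converted
total = sum(int(row["Qty"]) * float(row["Price"]) for row in rows)
print(round(total, 2))
```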

