Scanned PDFs contain images, not real text. OCR reads each page image and extracts the text — making it copyable, searchable, and translatable.
Upload your scanned PDF, select the document language, and download editable text in seconds. Works on any image-based PDF.
Extract Text Now →
Free · No signup · Files deleted after 2 hours
Powered by Tesseract OCR combined with ocrmypdf for accurate text detection across 10+ languages. Both are widely used open-source OCR tools.
Output text is fully copyable and editable. Download as a .txt file to open in any editor, or use the searchable PDF to copy directly in your PDF reader.
After OCR, use PDFTash Translate PDF to translate the extracted text into another language. The translate tool also runs OCR automatically on scanned PDFs, saving you a step.
Taking these steps before you upload can significantly improve accuracy.
Scan at 300 DPI or higher. Lower resolutions lose detail in small characters and accent marks.
High contrast between text and background gives OCR the clearest signal. Avoid colored paper if possible.
Skewed or rotated scans confuse OCR engines. Keep documents flat on the scanner bed.
Always match the language setting to your document — especially for Bengali, Hindi, Arabic, or other non-Latin scripts.
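For reference, Tesseract identifies languages by short codes rather than by name. The snippet below is a minimal sketch of how a language menu could map onto those codes. The codes themselves are Tesseract's standard ones, but the mapping table and the helper name `tesseract_lang_arg` are hypothetical and are not taken from PDFTash.

```python
# Illustrative mapping from display names to Tesseract language codes.
# The codes are Tesseract's standard ones; the helper itself is hypothetical.
TESSERACT_CODES = {
    "English": "eng",
    "Bengali": "ben",
    "Hindi": "hin",
    "Arabic": "ara",
}


def tesseract_lang_arg(*languages: str) -> str:
    """Return the value for Tesseract's -l option, e.g. 'eng+hin'."""
    if not languages:
        raise ValueError("select at least one document language")
    try:
        codes = [TESSERACT_CODES[name] for name in languages]
    except KeyError as exc:
        raise ValueError(f"unsupported language: {exc.args[0]}") from None
    # Tesseract accepts several languages joined with '+'.
    return "+".join(codes)
```

A document that mixes scripts, for example English and Hindi, would use `tesseract_lang_arg("English", "Hindi")`, which joins the two codes with `+`.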
If you cannot copy text from your PDF, it is most likely a scanned PDF: each page is stored as a raster image (a photograph of the page), not as real text data. Printing a document and then scanning it, receiving a fax saved as PDF, or photographing pages with your phone all produce an image-only PDF. You need OCR to extract the text from the image and make it selectable and copyable. PDFTash does this in seconds and for free.
OCR (Optical Character Recognition) analyzes each page image and identifies individual characters by their visual shape. It uses pattern matching and trained machine learning models to recognize letters, numbers, punctuation, and special characters, even across different fonts and languages. PDFTash uses Tesseract OCR (a leading open-source OCR engine) combined with ocrmypdf, which handles page preprocessing such as deskewing and image optimization before running OCR. This preprocessing typically gives higher accuracy than running Tesseract alone, especially on skewed or noisy scans.
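To try the same stack locally, the pipeline described above can be approximated with the ocrmypdf command-line tool. This is a rough sketch, not PDFTash's actual configuration: the function name `build_ocr_command` and the choice of flags are assumptions. It only builds the argument list and does not run anything.

```python
def build_ocr_command(input_pdf, output_pdf, language="eng", sidecar_txt=None):
    """Build an ocrmypdf command line as a list; nothing is executed here."""
    cmd = [
        "ocrmypdf",
        "-l", language,      # Tesseract language code(s), e.g. "eng" or "eng+hin"
        "--deskew",          # straighten tilted pages before OCR
        "--rotate-pages",    # fix pages scanned sideways or upside down
    ]
    if sidecar_txt is not None:
        # Also write the recognized text to a plain .txt file.
        cmd += ["--sidecar", str(sidecar_txt)]
    # ocrmypdf expects the input file followed by the output file.
    cmd += [str(input_pdf), str(output_pdf)]
    return cmd
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the OCR, provided ocrmypdf and the relevant Tesseract language data are installed.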
Yes. PDFTash processes every single page of your uploaded PDF. A 50-page scanned document will have all 50 pages OCR'd in sequence, and the extracted text will be merged into a single output — either a TXT file with all text concatenated by page, or a searchable PDF with a text layer added to every page while preserving the original scanned appearance.
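If you need the text of individual pages back out of the merged TXT file, a small helper can split it. This sketch assumes pages are separated by a form feed character, which is Tesseract's default page separator. Whether PDFTash's TXT output keeps that separator is an assumption, so the separator is left as a parameter. The name `split_pages` is illustrative.

```python
def split_pages(ocr_text: str, separator: str = "\f") -> list[str]:
    """Split merged OCR text into per-page strings.

    Assumes pages are delimited by `separator` (form feed by default).
    """
    pages = ocr_text.split(separator)
    # A trailing separator leaves one empty chunk at the end; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return [page.strip() for page in pages]
```

For a 50-page scan, `len(split_pages(text))` should then come out as 50 if the separator assumption holds.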
OCR accuracy on handwritten text is significantly lower than on printed text. PDFTash uses Tesseract, which is optimized for printed and typed documents. For handwriting: clear, printed block letters may work reasonably well (60-80% accuracy). Cursive handwriting will have very low accuracy. For best results, use OCR on typed, printed, or digitally generated documents rather than handwritten ones.
Free users can upload PDFs up to 10 MB. Pro users at $2/month can upload PDFs up to 200 MB and get priority processing. Note that scanned PDFs are often large because each page is a high-resolution image: a 10-page scan at 300 DPI can easily be 20-50 MB. If your file exceeds the free limit, try compressing it first with PDFTash Compress Scanned PDF, which can reduce scan file sizes by 60-90%.
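The size range quoted above follows from simple arithmetic. Here is a rough sketch, assuming US Letter pages (8.5 × 11 inches), 24-bit color, and roughly 10:1 JPEG compression. All of these are assumed values, and real scans vary widely with page content and scanner settings.

```python
def estimate_scan_size_mb(pages, dpi=300, width_in=8.5, height_in=11.0,
                          bytes_per_pixel=3, compression_ratio=10.0):
    """Rough size of an image-only PDF in megabytes (1 MB = 1,000,000 bytes).

    The defaults (Letter paper, 24-bit color, 10:1 compression) are assumptions.
    """
    pixels_per_page = (width_in * dpi) * (height_in * dpi)
    raw_bytes = pixels_per_page * bytes_per_pixel * pages
    return raw_bytes / compression_ratio / 1_000_000


FREE_LIMIT_MB = 10  # free upload limit stated above

size = estimate_scan_size_mb(10)
needs_compression = size > FREE_LIMIT_MB
```

With these assumptions a 10-page scan comes to about 25 MB, which is over the free limit. Scanning in grayscale (`bytes_per_pixel=1`) brings the same estimate down to roughly 8.4 MB.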