pdf-ocr-extraction
Here are 15 public repositories matching this topic...
AWS Lambda functions to extract text from various binary formats.
Updated Feb 7, 2018 - Python
A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.
Updated Oct 28, 2023 - Python
Build a RAG preprocessing pipeline
Updated Apr 7, 2024 - Jupyter Notebook
Recognize page content of a PDF as text using Tesseract and Ghostscript.
Updated Jan 9, 2018 - C#
A professional, modular, and open-source Python command-line tool to extract data from PDFs — including plain text, tables, images, and OCR content — using best-in-class libraries like PyMuPDF, pdfplumber, and pytesseract.
Updated May 10, 2025 - Python
A tool to compare and merge PDFs, display the differences between them, and run OCR on them.
Updated Jan 21, 2024 - Python
PDFxtract is a modern web application built with Next.js that allows users to upload PDF files and automatically convert each page into JPG images for easy preview and download.
Updated Jun 22, 2025 - TypeScript
PDF OCR service in Docker
Updated Oct 8, 2022 - Java
A utility that collects in one place some operations commonly performed on PDF files.
Updated Apr 5, 2026 - C#
This is an image/PDF OCR reader. Use it to extract text from either an image or a PDF file. The project uses Tesseract.js and PDF-Parser to do OCR.
Updated Jun 2, 2025 - TypeScript
Example Django-Python project containing modules for OCR, PDF to OCR PDF conversion, text similarity/dissimilarity, and PDF to PNG conversion.
Updated May 21, 2019 - Python
PDFScalpel is a forensic PDF analysis and CTF toolkit for security researchers, digital forensics analysts, and penetration testers, providing deep insight into PDF structure, encryption, malware, steganography, metadata, revisions, and document authenticity.
Updated Feb 3, 2026 - Python
Use Optical Character Recognition technology to convert scanned PDFs into TXT files locally.
Updated Jan 22, 2025 - Python
A smooth-flow project that transforms document data (e.g., PDF) into an interactive learning experience.
Updated Oct 31, 2025 - Python
Competitive exam question extraction pipeline using LLM free tiers
Updated May 21, 2026 - TypeScript