pdf-to-json
Here are 40 public repositories matching this topic...
Get your documents ready for gen AI
Updated Jun 17, 2026 - Python
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Updated Jun 15, 2026 - HTML
Knowledge Agents and Management in the Cloud
Updated May 18, 2026 - TypeScript
Extract and convert data from any document, image, PDF, Word file, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.
Updated Oct 31, 2025 - Python
PDF Verse is a powerful web-based PDF editor with tools for editing, converting, and manipulating PDFs. Merge, compress, add or remove pages, or extract text using OCR. Convert PDFs to DOC, Excel, PPT, JPG, PNG, text, and many other formats, and vice versa. PDF Verse also has a user-friendly interface and a wide range of features.
Updated Jan 1, 2024 - JavaScript
OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
Updated Dec 2, 2022 - Jupyter Notebook
The open-source universal adapter for LLMs. Turn messy real-world data into clean, agent-ready context.
Updated Jun 17, 2026 - Python
PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
Updated Jun 10, 2026 - Python
Graphlit Platform
Updated Feb 20, 2024
Docling4j brings Docling's document-understanding functionality to Java® projects
Updated Mar 31, 2025 - Java
Bank statements from the Vietnam Fatherland Front (Mặt Trận Tổ Quốc Việt Nam, MTTQ) on relief support for people affected by Typhoon Yagi
Updated Oct 3, 2024 - JavaScript
Python client library for Graphlit Platform
Updated Jun 15, 2026 - Python
Fast pure-Rust PDF extraction library and CLI by Clark Labs Inc. — 10–50x faster than pdfplumber for text, word, table, layout, image, and metadata extraction.
Updated Jun 6, 2026 - Rust
A quick way to convert files (PDF, DOCX, HTML, PPTX, images) to Markdown, JSON, or YAML using Docling and Streamlit
Updated Jul 9, 2025 - Python
Build a RAG preprocessing pipeline
Updated Apr 7, 2024 - Jupyter Notebook
Four formats. One engine. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Rust core with strictly-typed Python bindings.
Updated May 26, 2026 - Rust
Node.js library to convert JSON to PDF and vice versa
Updated Jul 8, 2023 - JavaScript
TypeScript client for Graphlit Platform
Updated Jun 16, 2026 - TypeScript
This project converts books from PDF into well-formed JSON objects by separating each title from its content. You can then insert the resulting JSON file into a database easily.
Updated Jun 2, 2018 - JavaScript