Projects with this topic
The Prototype Document Extractor is a lightweight, containerized service designed to extract structured content from PDF files using the Unstructured IO library. It exposes a minimal HTTP API that allows users to submit PDFs and receive parsed content in JSON format. This project includes:
- A backend service that handles PDF parsing using Unstructured IO.
- A Python client library for programmatically interacting with the API from within your code.
- Docker configurations to run the service in a portable, reproducible environment.
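As a rough illustration of the submit-a-PDF, receive-JSON flow described above, the following is a minimal sketch of a client call using only the Python standard library. It is not the project's actual client library: the base URL `http://localhost:8000`, the `/extract` path, the `file` form field, and the function names `build_multipart` and `extract_pdf` are all assumptions made for this example, and the real API may differ.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def extract_pdf(pdf_path: str, base_url: str = "http://localhost:8000") -> object:
    """POST a PDF to the service and return the decoded JSON response.

    The "/extract" path and the "file" field name are hypothetical; check the
    project's own documentation for the real endpoint.
    """
    with open(pdf_path, "rb") as fh:
        body, content_type = build_multipart(
            "file", os.path.basename(pdf_path), fh.read()
        )
    request = urllib.request.Request(
        f"{base_url}/extract",  # assumed endpoint path
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, a call such as `extract_pdf("report.pdf")` would return whatever JSON the service emits. The response schema is not described here, so the sketch returns the decoded payload without interpreting it.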