Table Transformer is an open-source tool that combines OCR and computer vision models to extract structured tabular data from images. It is well suited to preparing documents for LLMs, feeding data analysis pipelines, and automating data extraction tasks.
Features:
- Automatic Table Detection: Effortlessly detect tables in images.
- OCR-based Document Processing: Extract text with high accuracy.
- Integrated Models: Seamlessly combine OCR and table detection models.
- Flexible Export Options: Export data as a DataFrame, HTML, CSV, and more.
Built with:
- PaddleOCR: For text extraction.
- Hugging Face Table Detection: For table structure detection.
Prerequisites:
- Python 3.8+
- Conda
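The export options in the feature list map naturally onto pandas. The sketch below assumes the extracted table is already available as a list of rows of cell strings, with the first row as the header; the `rows` data is made up for illustration and this is not the tool's own export code.

```python
import pandas as pd

# Hypothetical extracted table: a header row followed by data rows.
rows = [
    ["Item", "Qty", "Price"],
    ["Apple", "3", "1.20"],
    ["Pear", "5", "0.80"],
]

# Build a DataFrame, using the first row as column names.
df = pd.DataFrame(rows[1:], columns=rows[0])

# Export to HTML and CSV; with no path argument, both methods return a string.
html = df.to_html(index=False)
csv_text = df.to_csv(index=False)

print(csv_text)
```

Keeping the cells as strings avoids silently reformatting values such as `1.20`; numeric conversion can be applied afterwards with `pd.to_numeric` where needed.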
Installation:
- Clone the Repository
Clone the repository to your local machine:
git clone https://github.com/Sudhanshu1304/table-transformer.git
cd table-transformer
- Create and Activate a Conda Environment
Create a new conda environment and activate it:
conda create --name myenv python=3.12.7
conda activate myenv
- Install PaddlePaddle
Install the CPU build of PaddlePaddle in the conda environment:
python -m pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
- Install PaddleOCR
Install PaddleOCR:
pip install paddleocr
- Install Additional Dependencies
Install the other required packages:
pip install ultralytics pandas
pip install streamlit
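Once the packages above are installed, a quick way to confirm that the environment is complete is to check that each one is importable. This is a minimal sketch, not part of the repository; the `check_dependencies` helper is hypothetical. Note that the PaddlePaddle distribution is imported as `paddle`.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED_MODULES = ("paddle", "paddleocr", "ultralytics", "pandas", "streamlit")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a mapping of module name -> True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_dependencies()
    for name, found in status.items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

`find_spec` only locates the package without importing it, so the check is fast and does not load the OCR models.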
Project structure:
project/
├── src/
│   ├── streamlit_app.py        # Streamlit application
│   ├── table_creator/
│   │   └── processing.py       # Core processing logic
│   ├── models/
│   │   └── text.py             # Table detection and text recognition
│   │
├── requirements.txt            # Dependencies
├── README.md                   # Project documentation
└── .gitignore                  # Git ignore configuration
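To illustrate the kind of work the core processing step has to do, here is a hypothetical sketch of one common approach: keep the OCR words that fall inside a detected table box, group them into rows by vertical position, and order each row left to right. The `Word` class, `words_in_table`, `group_into_rows`, and the 10-pixel `y_tolerance` are all assumptions for illustration; this is not the repository's `processing.py` and ignores merged cells and skewed scans.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """A recognized text fragment and its bounding box in pixel coordinates."""

    text: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def words_in_table(words: List[Word], table_box: Tuple[float, float, float, float]) -> List[Word]:
    """Keep only words whose center lies inside the table box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = table_box
    return [w for w in words if x0 <= w.center[0] <= x1 and y0 <= w.center[1] <= y1]


def group_into_rows(words: List[Word], y_tolerance: float = 10.0) -> List[List[str]]:
    """Cluster words into rows by vertical center, then sort each row by x."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.center[1]):
        # Join the current row if this word is vertically close to its last word;
        # otherwise the word starts a new row.
        if rows and abs(word.center[1] - rows[-1][-1].center[1]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Within each row, order the cells left to right.
    return [[w.text for w in sorted(row, key=lambda w: w.center[0])] for row in rows]


# Made-up OCR output; "Footer" lies outside the table box and is dropped.
words = [
    Word("Qty", 110, 10, 140, 30),
    Word("Item", 10, 12, 50, 28),
    Word("Apple", 10, 50, 60, 70),
    Word("3", 115, 52, 125, 68),
    Word("Footer", 10, 300, 80, 320),
]
table_rows = group_into_rows(words_in_table(words, table_box=(0, 0, 200, 100)))
print(table_rows)
```

The resulting list of rows can be passed straight to `pandas.DataFrame` for export.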
Run the Streamlit app to interact with the tool:
streamlit run src/streamlit_app.py
Contributions are welcome! Please fork the repository and submit a pull request with your improvements or new features.
This project is licensed under the MIT License.
Stay updated and connect for any queries or contributions:
- GitHub: Sudhanshu1304
- LinkedIn: Sudhanshu Pandey
- Medium: @sudhanshu.dpandey
If you find this tool useful, please consider giving it a ⭐ on GitHub. Your support is greatly appreciated!
Happy Extracting!



