
pinDef: A Benchmark Dataset for Extracting Pin Definitions from Electronic Component Datasheets

This repository contains code and data for the NeurIPS paper "pinDef: A Benchmark Dataset for Extracting Pin Definitions from Electronic Component Datasheets".

This project provides a benchmark dataset and tools for extracting pin definitions from electronic component datasheets, supporting research and development in automated datasheet analysis.

Installation

This project requires Python 3.10.

1. Install requirements

pip install -r requirements.txt

2. Setup environment

  1. Rename the file .env.sample to .env.
  2. Replace the placeholders in .env with your actual API keys and variables.

3. Insert dataset into MongoDB

The dataset introduced in the paper is provided in the file components.json. To use it in this project, load it into MongoDB by running the script import_components.py:

python import_components.py
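For reference, the import step amounts to reading the JSON file and inserting its records into MongoDB. The sketch below is illustrative only; the database and collection names (`pinDef`, `components`) and the connection URI are assumptions, since the actual script reads its configuration from .env:

```python
import json

def load_components(path="components.json"):
    """Read the dataset file and return a list of component records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # components.json is assumed to contain a JSON array of component objects
    return data if isinstance(data, list) else [data]

def import_into_mongodb(records, uri="mongodb://localhost:27017", db_name="pinDef"):
    """Insert the records into MongoDB (requires pymongo; names are assumptions)."""
    from pymongo import MongoClient  # imported lazily so load_components stays stdlib-only
    client = MongoClient(uri)
    client[db_name]["components"].insert_many(records)
```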

4. Download all PDF datasheets

To download all of the PDF datasheets, run download_datasheets.py:

python download_datasheets.py
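As a rough sketch of what this step involves: each component record links to a datasheet URL, and the PDF is fetched and stored locally. The helper names below are hypothetical, not the actual API of download_datasheets.py:

```python
import os
from urllib.request import urlopen

def datasheet_filename(url):
    """Derive a local file name from a datasheet URL (hypothetical helper)."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name if name.lower().endswith(".pdf") else name + ".pdf"

def download_datasheet(url, out_dir="datasheets"):
    """Fetch one PDF datasheet and save it under out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, datasheet_filename(url))
    with urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())
    return path
```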

Repository Structure

  • components.json: Dataset of sensor components with pin details and datasheet links.
  • src/: All Python code for the experiments, plus the web server that backs the webFrontend.
  • webFrontend/: Contains tools to collect new components, review already collected components, and a page to perform manual grading.

Experiment from the paper

Result Table (Table 1)

To reproduce the results in Table 1, run the three pipelines:

  • proprietary_pipeline.py
  • vision_pipeline.py
  • text_pipeline.py

Each pipeline processes the sensor component datasheets differently, leveraging various models and techniques.

Execution Policy

The execution_policy controls whether a pipeline step should run or use cached results. It has three modes:

  • OVERWRITE: Always run the step and overwrite any cached results.
  • CACHE: Use cached results if available; otherwise run the step and cache the result.
  • CACHE_ONLY: Only use cached results; do not run the step if no cache exists.

Exception Policy

The exception_policy defines how exceptions during step execution are handled:

  • TRY: Attempt to run the step, save exceptions if they occur, and continue.
  • THROW: Raise exceptions immediately and do not save them.
  • IGNORE: Ignore exceptions, do not save them, and return None.

Together, these policies provide flexible and reliable control over the pipeline executions, allowing for customization based on the use case or experimental needs.
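A minimal sketch of how the two policies could interact when running a single step; the enum and function names here are assumptions for illustration, and the real implementation lives in src/:

```python
from enum import Enum, auto

class ExecutionPolicy(Enum):
    OVERWRITE = auto()   # always run, replace any cached result
    CACHE = auto()       # use the cache if present, otherwise run
    CACHE_ONLY = auto()  # never run; cached result or nothing

class ExceptionPolicy(Enum):
    TRY = auto()     # save the exception and continue
    THROW = auto()   # re-raise immediately, save nothing
    IGNORE = auto()  # swallow the exception and return None

def run_step(step, cache, key, execution_policy, exception_policy):
    """Run one pipeline step under the two policies (illustrative sketch)."""
    if execution_policy is not ExecutionPolicy.OVERWRITE and key in cache:
        return cache[key]
    if execution_policy is ExecutionPolicy.CACHE_ONLY:
        return None  # no cached result and not allowed to run the step
    try:
        result = step()
    except Exception as exc:
        if exception_policy is ExceptionPolicy.THROW:
            raise
        if exception_policy is ExceptionPolicy.TRY:
            cache[key] = exc  # saved alongside normal results
        return None
    cache[key] = result
    return result
```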

Quantitative Results (Figure 4)

To obtain the quantitative results for Figure 4 in the paper, execute the following notebook in the root of the project:

quantitative_analysis.ipynb

Statistics (Figure 2)

To obtain the statistical data presented in the paper, execute the following notebook in the root of the project:

statistics.ipynb

Web Tools

Execution

  1. Start the backend server:
cd webFrontend/src/server/
fastapi dev main.py
  2. Install the frontend requirements:
cd webFrontend
npm install
  3. Start the frontend:
cd webFrontend
npm run dev

Usage

The frontend offers three functionalities:

Component Collection

Allows collecting components, which are then stored in the MongoDB database. Collected components can be exported with the script export_components.py.

View Components

Allows reviewing all components in the database.

Human Grading

Allows manual grading of pins. The tool reads the file random_pins.json, which serves as the basis for the experiment in Section 3.4 of the paper and was generated by the script get_random_pins.py. The agreement between human and LLM grading can then be evaluated using the script compare_gradings.py.
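For intuition, agreement can be measured as simply as the fraction of pins graded identically. This is an illustrative metric only; compare_gradings.py may compute something more elaborate:

```python
def percent_agreement(human_grades, llm_grades):
    """Fraction of pins on which human and LLM grades coincide (illustrative)."""
    assert len(human_grades) == len(llm_grades), "grade lists must align pin-by-pin"
    matches = sum(h == l for h, l in zip(human_grades, llm_grades))
    return matches / len(human_grades)
```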
