nlesc-sigs/data-sig

Efficient Data Handling (EDH) Special Interest Group

Examples of topics of interest

  • Relational Database Management System (RDBMS): Software for creating, managing, and interacting with relational databases
  • AI-driven database management: Using AI to automate and optimize the management of databases, enhancing performance, security, and data access
  • Vector Databases for AI Workloads: Databases that store and manage data as high-dimensional vectors, enabling the efficient similarity searches that many AI workloads rely on
  • Information Retrieval: Discovery and extraction of relevant information from vast collections of data in response to a user’s specific query
  • Geographical Information Systems: Computer systems that analyze and display geographically referenced information
  • Cloud-native data formats: Data formats designed to store and access large datasets efficiently and directly in the cloud, e.g. Cloud Optimized GeoTIFFs (COGs) for raster data and Zarr for multi-dimensional arrays
  • Linked Data / Ontologies: Linked Data is structured data that is interlinked with other data, suitable for semantic queries and automatic retrieval. Ontologies are formal descriptions of data relationships for organizing and linking data effectively.
  • NoSQL / Non-relational databases: Data organization using various flexible data models like key-value pairs, documents, graphs, and wide-column stores
  • Handling Sensor Data: Collecting, processing, and analyzing the information generated by sensors that monitor physical conditions or activities
  • Information Integration: Merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations
  • Data Assimilation: Methods that update information from numerical computer models with information from observations
  • Stream Processing: Analyzing and processing large amounts of real-time data as it flows in from various sources
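
To make one of the topics above concrete, here is a minimal sketch of the similarity search that vector databases are built around. It is an illustration only: the `store` contents, the function names, and the three-dimensional toy vectors are hypothetical. It uses a brute-force scan, whereas production vector databases typically rely on approximate nearest-neighbour indexes to stay fast at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store, k=1):
    """Return the keys of the k stored vectors most similar to the query.

    Brute-force scan over every stored vector: fine for a toy example,
    too slow for large collections.
    """
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy "embeddings"; real ones usually have hundreds of dimensions.
store = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "bird": [0.0, 0.0, 1.0],
}

result = nearest([0.9, 0.1, 0.0], store, k=2)
print(result)
```

The query vector points mostly along the first axis, so "cat" ranks first and "dog" second.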

Propose a topic

We welcome topics to be discussed during the SIG. To propose a topic, open an issue with a brief description of the topic and label it topic.

Bring your own data challenges

Do you have a data handling issue? Bring it to the SIG: we can look at it together and do our best to help you find the most suitable solution.

What do you need?

  • Open an issue on GitHub and label it help wanted.
  • Describe the type of issue you want to address. Make sure to include:
    • What is your final goal?
    • What is the challenge?
    • A sample of your data (if possible).
    • Which technologies you are using to store and access the data.

We will discuss your issue during the next SIG meeting.

Share your data solutions

Did you do something really cool with your data? Share your experiences with the SIG! Open an issue describing what you did and label it topic.

Possible things you might like to share:

  • Tools & Methodologies for storage
  • Tools & Methodologies for access
  • Data FAIRness
  • Data handling

Data SIG meetings

Date | Topic | Presenter
2026-03-12 | Using DuckDB to blur the boundary between storage, compute, and the user | Suvayu
2026-04-09 | Cloud native data formats | Francesco
2026-05-07 | |
2026-06-04 | |
2026-07-02 | |
2026-09-24 | |
2026-11-19 | |
2026-12-17 | |

Past meetings:
