Merged
50 changes: 13 additions & 37 deletions .github/workflows/ci.yaml
@@ -13,50 +13,26 @@ jobs:
matrix:
python-version: ["3.11", "3.12"]

# All steps will run for each Python version inside this container
container:
image: python:${{ matrix.python-version }}-slim

steps:
# Check out the code onto the runner host,
# but it's automatically available inside the container.
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Cache pip dependencies
uses: actions/cache@v4
- name: Install uv and set the Python version
uses: astral-sh/setup-uv@v6
with:
path: ~/.cache/pip
key: pip-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/pyproject.toml') }}
restore-keys: |
pip-${{ runner.os }}-${{ matrix.python-version }}-
python-version: ${{ matrix.python-version }}

- name: Install Hatch
run: pip install --no-cache-dir hatch
- name: Install dependencies
run: uv sync --locked

- name: Cache Hatch environments
uses: actions/cache@v4
with:
path: ~/.local/share/hatch
key: hatch-env-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/pyproject.toml') }}
restore-keys: |
hatch-env-${{ runner.os }}-${{ matrix.python-version }}-
- name: Check formatting with Ruff
run: uv run ruff format --check

# TODO(dwnoble): Fix formatting issues in datacommons-schema and uncomment these
#- name: Lint with Ruff
# run: uv run ruff check

- name: Run tests
run: |
# Enable parallel test execution
hatch test --parallel
env:
# Add environment variables to speed up test execution
PYTHONHASHSEED: 0
PYTHONUNBUFFERED: 1

- name: Upload test results
if: always()
uses: actions/upload-artifact@v4
with:
name: test-results-${{ matrix.python-version }}
path: |
.coverage
htmlcov/
if-no-files-found: error
uv run pytest
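Read as old/new interleaved, the hunk above is hard to follow. Assembled from just the added lines, the new job body looks approximately like this (the job id and any keys above line 13, such as `runs-on`, are collapsed in the diff and assumed here):

```yaml
# Sketch of the post-migration CI job, reassembled from the additions above.
# The authoritative version is .github/workflows/ci.yaml itself.
jobs:
  test:  # hypothetical job id; not shown in the hunk
    strategy:
      matrix:
        python-version: ["3.11", "3.12"]
    # All steps run for each Python version inside this container
    container:
      image: python:${{ matrix.python-version }}-slim
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Install uv and set the Python version
        uses: astral-sh/setup-uv@v6
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: uv sync --locked
      - name: Check formatting with Ruff
        run: uv run ruff format --check
      # TODO(dwnoble): Fix formatting issues in datacommons-schema and uncomment
      #- name: Lint with Ruff
      #  run: uv run ruff check
      - name: Run tests
        run: uv run pytest
```

The net effect of the diff: the pip and Hatch caching steps, the Hatch install, and the test-artifact upload are all replaced by a single `setup-uv` action plus `uv sync --locked` and `uv run` invocations.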
19 changes: 9 additions & 10 deletions .pre-commit-config.yaml
@@ -1,17 +1,16 @@
repos:
- repo: local
hooks:

[Review comment] add ruff formatting to pre push

[Contributor Author] done

# Formats code with ruff via `hatch fmt`
- id: ruff-format-hatch-settings
name: hatch-ruff
language: system
entry: hatch fmt
types: [python]
verbose: true
# Runs tests before git pushing
- id: run-tests
name: Run Tests
entry: hatch test
name: Run tests
entry: uv run pytest
language: system
pass_filenames: false
stages: [pre-push]
- id: ruff-format
name: ruff-format
entry: uv run ruff format
language: system
types: [python]
args: []
stages: [pre-push]
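For readability, the resulting hooks section reassembled from the added lines above (both hooks now run on `pre-push` via `uv run` instead of Hatch):

```yaml
# Post-migration .pre-commit-config.yaml, per the additions in this hunk.
repos:
  - repo: local
    hooks:
      # Runs tests before git pushing
      - id: run-tests
        name: Run tests
        entry: uv run pytest
        language: system
        pass_filenames: false
        stages: [pre-push]
      - id: ruff-format
        name: ruff-format
        entry: uv run ruff format
        language: system
        types: [python]
        args: []
        stages: [pre-push]
```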
File renamed without changes.
31 changes: 6 additions & 25 deletions README.md
@@ -17,18 +17,10 @@ This guide covers setting up a local Data Commons, defining schemas in JSON-LD,
Before you begin, ensure you have the following installed:

- [Python](https://www.python.org/downloads/) 3.11 or higher
- [Hatch](https://hatch.pypa.io/latest/) (Python project manager)
- [uv](https://docs.astral.sh/uv/getting-started/installation/) (Python project manager)
- A Google Cloud Platform (GCP) project with Cloud Spanner enabled
- A Cloud Spanner instance and database (using Google Standard SQL) for storing the knowledge graph

### Installing Hatch

You can install Hatch using pip:

```bash
pip install hatch
```

## Setting Up Data Commons

This section will guide you through setting up Data Commons locally and defining your first custom schema and data.
@@ -49,33 +41,22 @@ The repository contains three main components:
- `datacommons-db`: The database layer for storing and querying data
- `datacommons-schema`: Schema management and validation tools

#### Create a Hatch environment
#### Create a virtual environment with uv

```bash
hatch env create
uv sync
```

#### Run Tests

Run the test suite to verify your setup:

```bash
hatch test
uv run pytest
```

Tests are also run automatically before pushing changes.

#### Enter the Hatch Shell

Activate the project's environment to run local commands. All subsequent commands should be run inside this shell.

```bash
hatch shell
```

To exit the shell, type `exit` or press `ctrl+d`


#### Configure GCP Spanner Environment Variables

Before starting the server, you need to set up your GCP Spanner environment variables. These are required for the application to connect to your Spanner database. The application will initialize a new database from scratch using these settings:
@@ -90,10 +71,10 @@ Replace the values with your actual GCP project and Spanner instance details.
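The variable list itself is collapsed in this diff; the sketch below uses the three required names from `datacommons_api/core/config.py`, with hypothetical placeholder values:

```shell
# Placeholder values -- substitute your actual GCP project and Spanner details.
export GCP_PROJECT_ID="my-gcp-project"
export GCP_SPANNER_INSTANCE_ID="dc-instance"
export GCP_SPANNER_DATABASE_NAME="dc-kg"
```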

#### Start Data Commons:

Activate our hatch environment and run the `datacommons-api` command to start a local development server.
Run the `datacommons-api` command using `uv` to start a local development server.

```bash
datacommons-api
uv run datacommons-api
```

This will start the Data Commons API server on port 5000, ready to receive your schema and data.
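Once the server is up, data is submitted as a JSON-LD document to the `/nodes/` endpoint (see the router changes later in this diff). The payload below is a hypothetical sketch — the exact `JSONLDDocument` shape is not shown in this PR beyond its `graph` field — illustrating how such a document might be built and serialized:

```python
import json

# Hypothetical minimal JSON-LD document; every field name and value here is
# illustrative, not confirmed by this PR.
doc = {
    "@context": {"@vocab": "https://schema.org/"},
    "@graph": [
        {"@id": "dc/city/ExampleCity", "@type": "City", "name": "Example City"},
    ],
}

body = json.dumps(doc)

# To send it to the local server (assumes it is running on port 5000):
#   curl -X POST http://localhost:5000/nodes/ \
#        -H "Content-Type: application/json" -d @payload.json
print(len(doc["@graph"]))  # number of nodes in the document -> 1
```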
@@ -14,7 +14,7 @@

from fastapi import FastAPI

from datacommons.api.endpoints.routers import node_router
from datacommons_api.endpoints.routers import node_router

# FastAPI initialization
app = FastAPI(
@@ -15,10 +15,11 @@
import click
import uvicorn

from datacommons.api.app import app
from datacommons.api.core.config import get_config
from datacommons.api.core.logging import get_logger, setup_logging
from datacommons.db.session import initialize_db
from datacommons_api.app import app
from datacommons_api.core.config import get_config
from datacommons_api.core.logging import get_logger, setup_logging
from datacommons_db.session import initialize_db


setup_logging()
logger = get_logger(__name__)
@@ -35,7 +36,11 @@ def main(host: str, port: int, *, reload: bool = False):

# Initialize the database
logger.info("Initializing database...")
initialize_db(config.GCP_PROJECT_ID, config.GCP_SPANNER_INSTANCE_ID, config.GCP_SPANNER_DATABASE_NAME)
initialize_db(
config.GCP_PROJECT_ID,
config.GCP_SPANNER_INSTANCE_ID,
config.GCP_SPANNER_DATABASE_NAME,
)
logger.info("Starting API server...")
uvicorn.run(
app,
@@ -15,12 +15,16 @@
import os
import sys

from datacommons.api.core.logging import get_logger
from datacommons_api.core.logging import get_logger

logger = get_logger(__name__)

# Required environment variables
REQUIRED_ENV_VARS = ["GCP_PROJECT_ID", "GCP_SPANNER_INSTANCE_ID", "GCP_SPANNER_DATABASE_NAME"]
REQUIRED_ENV_VARS = [
"GCP_PROJECT_ID",
"GCP_SPANNER_INSTANCE_ID",
"GCP_SPANNER_DATABASE_NAME",
]


class Config:
@@ -45,7 +49,11 @@ class ProductionConfig(Config):


# Configuration dictionary
config = {"development": DevelopmentConfig, "production": ProductionConfig, "default": DevelopmentConfig}
config = {
"development": DevelopmentConfig,
"production": ProductionConfig,
"default": DevelopmentConfig,
}


def validate_config_or_exit(config: Config) -> None:
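The body of `validate_config_or_exit` is collapsed in this diff. A plausible sketch of the check it performs against the `REQUIRED_ENV_VARS` list above (function name and return shape here are hypothetical; the real implementation may differ and would log the missing names before calling `sys.exit(1)`):

```python
# Assumed from the hunk above; the real list lives in datacommons_api/core/config.py.
REQUIRED_ENV_VARS = [
    "GCP_PROJECT_ID",
    "GCP_SPANNER_INSTANCE_ID",
    "GCP_SPANNER_DATABASE_NAME",
]


def missing_env_vars(env: dict[str, str]) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# A validate_config_or_exit-style wrapper would exit when this is non-empty;
# here we only compute the list so the check stays testable.
env = {"GCP_PROJECT_ID": "demo"}  # hypothetical partial environment
missing = missing_env_vars(env)
print(missing)  # -> ['GCP_SPANNER_INSTANCE_ID', 'GCP_SPANNER_DATABASE_NAME']
```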
@@ -23,12 +23,12 @@
Usage:

```python
from datacommons.api.core.logging import get_logger
from datacommons_api.core.logging import get_logger

logger = get_logger(__name__)

logger.info("Hello, world!")
# Output: "INFO [datacommons.api.core.logging] [2021-01-01 12:00:00+0000] Hello, world!"
# Output: "INFO [datacommons_api.core.logging] [2021-01-01 12:00:00+0000] Hello, world!"
```

"""
@@ -15,9 +15,9 @@

from collections.abc import Generator

from datacommons.api.core.config import get_config
from datacommons.api.services.graph_service import GraphService
from datacommons.db.session import get_session
from datacommons_api.core.config import get_config
from datacommons_api.services.graph_service import GraphService
from datacommons_db.session import get_session


def with_graph_service() -> Generator[GraphService, None, None]:
@@ -28,7 +28,11 @@ def with_graph_service() -> Generator[GraphService, None, None]:
GraphService: A GraphService instance
"""
config = get_config()
db = get_session(config.GCP_PROJECT_ID, config.GCP_SPANNER_INSTANCE_ID, config.GCP_SPANNER_DATABASE_NAME)
db = get_session(
config.GCP_PROJECT_ID,
config.GCP_SPANNER_INSTANCE_ID,
config.GCP_SPANNER_DATABASE_NAME,
)
graph_service = GraphService(db)
try:
yield graph_service
@@ -16,12 +16,12 @@

from fastapi import APIRouter, Depends, Query

from datacommons.api.core.constants import DEFAULT_NODE_FETCH_LIMIT
from datacommons.api.core.logging import get_logger
from datacommons.api.endpoints.dependencies import with_graph_service
from datacommons.api.endpoints.responses import UpdateResponse
from datacommons.api.services.graph_service import GraphService
from datacommons.schema.models.jsonld import JSONLDDocument
from datacommons_api.core.constants import DEFAULT_NODE_FETCH_LIMIT
from datacommons_api.core.logging import get_logger
from datacommons_api.endpoints.dependencies import with_graph_service
from datacommons_api.endpoints.responses import UpdateResponse
from datacommons_api.services.graph_service import GraphService
from datacommons_schema.models.jsonld import JSONLDDocument

logger = get_logger(__name__)

@@ -32,7 +9 @@
@router.get("/nodes/", response_model=JSONLDDocument, response_model_exclude_none=True)
def get_nodes(
limit: int = DEFAULT_NODE_FETCH_LIMIT,
type_filter: Annotated[list[str] | None, Query(alias="type", description="Zero or more types")] = None,
type_filter: Annotated[
list[str] | None, Query(alias="type", description="Zero or more types")
] = None,
graph_service: Annotated[GraphService, Depends(with_graph_service)] = None,
) -> JSONLDDocument:
"""
@@ -44,12 +46,15 @@ def get_nodes(

@router.post("/nodes/", response_model=UpdateResponse, response_model_exclude_none=True)
def insert_nodes(
jsonld: JSONLDDocument, graph_service: Annotated[GraphService, Depends(with_graph_service)] = None
jsonld: JSONLDDocument,
graph_service: Annotated[GraphService, Depends(with_graph_service)] = None,
) -> UpdateResponse:
"""Insert a JSON-LD document into the database"""
try:
graph_service.insert_graph_nodes(jsonld)
return UpdateResponse(success=True, message="Inserted %d nodes successfully" % len(jsonld.graph))
return UpdateResponse(
success=True, message="Inserted %d nodes successfully" % len(jsonld.graph)
)
except Exception as e:
logger.exception("Error inserting nodes")
return UpdateResponse(success=False, message=str(e))