Commit d904b9e

Address review comments
1 parent 745b7c1 commit d904b9e

2 files changed: 106 additions, 57 deletions

connectors/refiner/README.md

Lines changed: 32 additions & 25 deletions
@@ -21,9 +21,9 @@ Refer to the [Connector SDK Setup Guide](https://fivetran.com/docs/connectors/co
2121
- Processes all paginated data automatically using cursor-based pagination for large datasets
2222
- Implements exponential backoff for API reliability (3 retries with progressive delays)
2323
- Flattens nested JSON structures into table columns automatically
24-
- Checkpoint strategy ensures resumability for large datasets (every 1000 records)
24+
- Checkpoints progress during pagination to ensure resumability for large datasets
2525
- Extracts and normalizes nested arrays (questions, answers) into child tables with foreign keys
26-
- User-level data keyed by user ID for joining with product usage data
26+
- Keys user-level data by user ID for joining with product usage data
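
The retry behaviour in the feature list above (3 retries with progressive delays) could look roughly like the sketch below. The helper name `call_with_backoff`, the doubling schedule, and the 1-second base delay are assumptions for illustration; the README does not state the connector's actual delays or which exceptions it retries on.

```python
import time


def call_with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure, retry up to `retries` times with doubling delays.

    Hypothetical helper: the real connector's retry logic lives in
    make_api_request() and may differ in delays and error filtering.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            # A real connector would likely catch only transient request errors.
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s with the defaults
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper easy to test without waiting on real delays.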
2727

2828
## Configuration file
2929
The configuration requires your Refiner API key and optionally a start date for the initial sync.
@@ -49,21 +49,22 @@ Note: The `fivetran_connector_sdk:latest` and `requests:latest` packages are pre
4949
## Authentication
5050
The connector uses Bearer token authentication via the `Authorization` header. To obtain your API key:
5151

52-
1. Log in to your Refiner account.
52+
1. Log in to your [Refiner](https://refiner.io) account.
5353
2. Go to **Settings** > **Integrations** > **API**.
5454
3. Copy your API key.
5555
4. Add the API key to your `configuration.json` file as shown above.
5656

5757
The API key is included in every request as `Authorization: Bearer YOUR_API_KEY`.
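
As a minimal sketch of the header described above, the function below builds the `Authorization` value from an API key. The function name `build_auth_headers` is hypothetical and not taken from the connector's source.

```python
def build_auth_headers(api_key):
    """Return request headers carrying the Bearer token (hypothetical helper)."""
    if not api_key:
        raise ValueError("An API key is required")
    return {"Authorization": f"Bearer {api_key}"}
```

The resulting dictionary can be passed as the `headers` argument of an HTTP client call, for example with the `requests` package the README mentions.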
5858

5959
## Pagination
60-
The connector handles pagination automatically using the Refiner API's page-based pagination structure. The API supports the following pagination parameters:
61-
- `page` - Current page number (starts at 1)
60+
The connector handles pagination automatically using the Refiner API's cursor-based pagination structure. The API supports the following pagination parameters:
61+
- `page` - Current page number (starts at 1) - used as fallback
6262
- `page_length` - Number of items per page (default: 100)
63-
- `next_page_cursor` - Optional cursor for cursor-based pagination
63+
- `next_page_cursor` - Cursor token for cursor-based pagination
6464

65-
The connector uses page-based pagination with automatic detection of the last page:
66-
- Each sync processes all paginated data completely using the `pagination.current_page` and `pagination.last_page` response fields.
65+
The connector uses cursor-based pagination for optimal performance with large datasets:
66+
- Each sync processes all paginated data completely using the `pagination.next_page_cursor` response field.
67+
- Cursor-based pagination is more efficient than page-based pagination for large datasets and is recommended by the Refiner API documentation.
6768
- Pagination state is not persisted between sync runs for cleaner state management.
6869
- Uses the `date_range_start` parameter to filter responses from the API directly for incremental syncs.
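
The cursor-following loop described above can be sketched as follows. This is an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for the connector's API call, and the `items` key for the record list is assumed. Only `page_length` and `pagination.next_page_cursor` come from the README.

```python
def iter_all_items(fetch_page, page_length=100):
    """Yield every item by following next_page_cursor until it is absent.

    fetch_page(params) is a hypothetical callable returning one decoded
    JSON page; the "items" key is an assumption about the response shape.
    """
    cursor = None
    while True:
        params = {"page_length": page_length}
        if cursor:
            params["next_page_cursor"] = cursor
        payload = fetch_page(params)
        yield from payload.get("items", [])
        # A missing or empty cursor marks the last page.
        cursor = (payload.get("pagination") or {}).get("next_page_cursor")
        if not cursor:
            break
```

Because the cursor is held only in a local variable, nothing about pagination needs to be written to state, which matches the note that pagination state is not persisted between sync runs.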
6970

@@ -82,30 +83,33 @@ The connector processes survey and response data with an optimized incremental s
8283
- **respondents** - User/contact information keyed by user ID (parent for responses)
8384

8485
### Incremental sync strategy
85-
- Initial sync uses `start_date` from configuration (if provided) or EPOCH time (1970-01-01T00:00:00Z) as fallback
86-
- Incremental syncs use `last_response_sync` timestamp from state to fetch only new/updated responses since last successful sync
87-
- State tracks separate timestamps for surveys and responses
86+
- **Responses**: Incremental sync using `last_response_sync` timestamp from state to fetch only new/updated responses since last successful sync
87+
- **Surveys and Contacts**: Full sync on every run (the Refiner API does not support date filtering for these endpoints)
88+
- Initial response sync uses `start_date` from configuration (if provided) or EPOCH time (1970-01-01T00:00:00Z) as fallback
8889
- Checkpoint every 1000 records during large response syncs to enable resumability
90+
- Checkpoint after each page for surveys and contacts to preserve progress
8991
- Final checkpoint saves the complete state only after successful sync completion
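
The checkpoint cadence for responses described above might be sketched like this. The `upsert` and `checkpoint` parameters are stand-ins (in a real connector they would presumably be the Connector SDK's upsert and checkpoint operations), and the function name `sync_responses` and the contents of the intermediate checkpoints are assumptions.

```python
from datetime import datetime, timezone

CHECKPOINT_INTERVAL = 1000  # from the README: checkpoint every 1000 records


def sync_responses(responses, state, upsert, checkpoint, interval=CHECKPOINT_INTERVAL):
    """Upsert each response, checkpointing periodically and once at the end.

    Hypothetical sketch: upsert(table, record) and checkpoint(state) are
    injected stand-ins for the SDK operations.
    """
    count = 0
    for record in responses:
        upsert("responses", record)
        count += 1
        if count % interval == 0:
            checkpoint(dict(state))  # periodic save so an interrupted sync can resume
    # Advance the timestamp only after every record has been processed.
    state["last_response_sync"] = datetime.now(timezone.utc).isoformat()
    checkpoint(dict(state))
    return count
```

Writing `last_response_sync` only in the final checkpoint means an interrupted run restarts from the previous timestamp instead of skipping records.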
9092

9193
### Data transformation
9294
- **JSON flattening** - Nested dictionaries converted to underscore-separated columns (e.g., `config.theme.color` becomes `config_theme_color`)
93-
- **Array handling** - Arrays converted to JSON strings when stored in parent tables, or normalized to child tables
94-
- **Child table extraction** - Questions extracted from survey config, answers extracted from response data
95+
- **Array handling** - Arrays converted to JSON strings when stored in parent tables, or normalized to child tables when appropriate
96+
- **Child table extraction** - Questions extracted from survey config (`config.form_elements`) and answers extracted from response data are stored in dedicated child tables to preserve relational structure
97+
- **Smart exclusion** - Relational data like `form_elements` is excluded from the flattened parent table to avoid duplication, as it's already normalized into the questions table
9598
- **Foreign keys** - Child tables maintain relationships via parent primary keys (`survey_uuid`, `response_uuid`)
9699
- **Type safety** - Configuration validation ensures required fields exist before processing
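
To illustrate the flattening and exclusion rules above, here is a minimal sketch. The name `flatten_dict` matches the README, but the signature, the `exclude` parameter, and the sample survey are assumptions and not the connector's actual code.

```python
import json


def flatten_dict(data, parent_key="", sep="_", exclude=()):
    """Flatten nested dicts into one level; lists become JSON strings.

    Keys listed in `exclude` are dropped at any depth (assumed behaviour).
    """
    flat = {}
    for key, value in data.items():
        if key in exclude:
            continue  # e.g. form_elements, which is normalized into a child table
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_dict(value, column, sep, exclude))
        elif isinstance(value, list):
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


# Hypothetical survey payload for illustration only.
survey = {
    "uuid": "s1",
    "config": {"theme": {"color": "blue"}, "form_elements": [{"id": 1}], "tags": ["a", "b"]},
}
row = flatten_dict(survey, exclude=("form_elements",))
# row == {"uuid": "s1", "config_theme_color": "blue", "config_tags": '["a", "b"]'}
```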
97100

98101
### Key functions
99102
- `validate_configuration()` - Validates required API key configuration
100103
- `make_api_request()` - Centralized API calling with retry logic and error handling
101-
- `flatten_dict()` - Recursive JSON structure flattening for table columns
102-
- `fetch_surveys()` - Main survey sync with pagination and question extraction
103-
- `fetch_questions()` - Extract questions from survey configuration
104-
- `fetch_responses()` - Incremental response sync with date-based filtering
105-
- `fetch_answers()` - Extract answers from response data
104+
- `flatten_dict()` - Recursive JSON structure flattening for table columns with smart exclusion of relational data
105+
- `fetch_surveys()` - Main survey sync with pagination, question extraction, and page-level checkpointing
106+
- `fetch_questions()` - Extract questions from survey configuration into child table
107+
- `fetch_contacts()` - Full contact sync with pagination and page-level checkpointing
108+
- `fetch_responses()` - Incremental response sync with date-based filtering and record-level checkpointing
109+
- `fetch_answers()` - Extract answers from response data into child table
106110
- `fetch_respondent()` - Extract or update respondent information
107111

108-
The connector maintains a clean state with `last_survey_sync` and `last_response_sync` timestamps, automatically advancing after each successful sync to ensure reliable incremental syncs without data duplication or gaps.
112+
The connector maintains a clean state with the `last_response_sync` timestamp for incremental response syncing, automatically advancing after each successful sync to ensure reliable incremental syncs without data duplication or gaps. Surveys and contacts are fully synced on each run.
109113

110114
## Error handling
111115
The connector implements comprehensive error handling with multiple layers of protection:
@@ -122,16 +126,18 @@ The connector implements comprehensive error handling with multiple layers of pr
122126

123127
### Data processing safeguards
124128
- Graceful handling of missing or malformed API response structures
125-
- Safe dictionary access patterns with `.get()` to prevent KeyError exceptions
129+
- Safe dictionary access patterns with `.get()` and type checks to prevent AttributeError and KeyError exceptions
126130
- Skips records missing required identifiers (uuid) with warnings
127-
- Proper exception propagation with descriptive RuntimeError messages
131+
- Error handling for malformed timestamps with warning logs
132+
- Proper exception propagation with descriptive RuntimeError messages from API layer
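
The safeguards above (safe `.get()` access, type checks, and skipping records without a `uuid`) could be sketched as below. The helper names are hypothetical, and the standard `logging` module stands in for whatever logger the connector actually uses.

```python
import logging

log = logging.getLogger(__name__)


def extract_uuid(record):
    """Return the record's uuid, or None if the record is malformed."""
    if not isinstance(record, dict):
        return None  # type check avoids AttributeError on None or lists
    uuid = record.get("uuid")  # .get() avoids KeyError on missing keys
    return uuid if isinstance(uuid, str) and uuid else None


def valid_records(records):
    """Yield only records that carry a usable uuid, warning on the rest."""
    for record in records:
        if extract_uuid(record) is None:
            log.warning("Skipping record without uuid")
            continue
        yield record
```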
128133

129134
### Checkpoint recovery
130-
- Checkpoints every 1000 records during large syncs enable recovery from interruptions
135+
- Checkpoints after each page during survey and contact syncs to preserve progress
136+
- Checkpoints every 1000 records during large response syncs enable recovery from interruptions
131137
- State tracking allows sync to resume from the last successful checkpoint
132-
- Final checkpoint only saved after a complete successful sync
138+
- Final checkpoint saved after complete successful sync
133139

134-
All exceptions are caught at the top level in the `update()` function and re-raised as `RuntimeError` with descriptive messages, making troubleshooting easier for users and Fivetran support.
140+
Unhandled exceptions in the `update()` function will propagate and be logged by the Fivetran platform for troubleshooting. The connector's error handling strategy focuses on resilience at the API request level and safe data processing with proper validation.
135141

136142
## Tables created
137143

@@ -174,7 +180,8 @@ The connector creates the following tables in your destination:
174180
**respondents** table:
175181
- User/contact information keyed by user ID for joins with product usage data
176182
- Primary key: `user_id`
177-
- Columns: `email`, `name`, `first_seen_at`, `last_seen_at`, `attributes` (JSON)
183+
- Populated from both response data (via `fetch_respondent()`) and dedicated contacts endpoint (via `fetch_contacts()`)
184+
- Columns: `contact_uuid`, `remote_id`, `email`, `display_name`, `first_seen_at`, `last_seen_at`, `last_form_submission_at`, `last_tracking_event_at`, `attributes` (JSON), `segments` (JSON)
178185

179186
## Additional considerations
180187
The examples provided are intended to help you effectively use Fivetran's Connector SDK. While we've tested the code, Fivetran cannot be held responsible for any unexpected or negative consequences that may arise from using these examples. For inquiries, please reach out to our Support team.
