
Commit 82c39bb

Merge pull request #4627 from Blargian/integrations_improvements_on_train
Improvement: misc. integrations improvements
2 parents 300beec + 74b2a93 commit 82c39bb

File tree

8 files changed: +274 additions, -208 deletions

docs/_snippets/_gather_your_details_http.mdx

Lines changed: 8 additions & 7 deletions

@@ -4,17 +4,18 @@ import Image from '@theme/IdealImage';
 
 To connect to ClickHouse with HTTP(S) you need this information:
 
-- The HOST and PORT: typically, the port is 8443 when using TLS or 8123 when not using TLS.
+| Parameter(s)              | Description                                                                                                    |
+|---------------------------|----------------------------------------------------------------------------------------------------------------|
+| `HOST` and `PORT`         | Typically, the port is 8443 when using TLS or 8123 when not using TLS.                                         |
+| `DATABASE NAME`           | Out of the box, there is a database named `default`, use the name of the database that you want to connect to. |
+| `USERNAME` and `PASSWORD` | Out of the box, the username is `default`. Use the username appropriate for your use case.                     |
 
-- The DATABASE NAME: out of the box, there is a database named `default`, use the name of the database that you want to connect to.
-
-- The USERNAME and PASSWORD: out of the box, the username is `default`. Use the username appropriate for your use case.
-
-The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click **Connect**:
+The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console.
+Select a service and click **Connect**:
 
 <Image img={cloud_connect_button} size="md" alt="ClickHouse Cloud service connect button" border />
 
-Choose **HTTPS**, and the details are available in an example `curl` command.
+Choose **HTTPS**. Connection details are displayed in an example `curl` command.
 
 <Image img={connection_details_https} size="md" alt="ClickHouse Cloud HTTPS connection details" border/>
 
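The example `curl` command shown by the console can be reconstructed from the parameters in the table above. A minimal sketch, assuming a hypothetical Cloud hostname; the command is only assembled and printed here, not executed:

```shell
# Placeholder details -- substitute the values shown in your Cloud console
HOST="your-service.clickhouse.cloud"   # hypothetical hostname
PORT=8443                              # 8443 with TLS, 8123 without
USERNAME="default"

# `SELECT 1` is a cheap connectivity check against the HTTP(S) interface
CMD="curl --user ${USERNAME}:<PASSWORD> 'https://${HOST}:${PORT}/?query=SELECT%201'"
echo "$CMD"
```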

docs/_snippets/_gather_your_details_native.md

Lines changed: 8 additions & 7 deletions

@@ -4,13 +4,14 @@ import Image from '@theme/IdealImage';
 
 To connect to ClickHouse with native TCP you need this information:
 
-- The HOST and PORT: typically, the port is 9440 when using TLS, or 9000 when not using TLS.
-
-- The DATABASE NAME: out of the box there is a database named `default`, use the name of the database that you want to connect to.
-
-- The USERNAME and PASSWORD: out of the box the username is `default`. Use the username appropriate for your use case.
-
-The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click **Connect**:
+| Parameter(s)              | Description                                                                                                    |
+|---------------------------|----------------------------------------------------------------------------------------------------------------|
+| `HOST` and `PORT`         | Typically, the port is 9440 when using TLS, or 9000 when not using TLS.                                        |
+| `DATABASE NAME`           | Out of the box there is a database named `default`, use the name of the database that you want to connect to.  |
+| `USERNAME` and `PASSWORD` | Out of the box the username is `default`. Use the username appropriate for your use case.                      |
+
+The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console.
+Select the service that you will connect to and click **Connect**:
 
 <Image img={cloud_connect_button} size="md" alt="ClickHouse Cloud service connect button" border/>
 
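These details map directly onto `clickhouse-client` flags. A sketch with a hypothetical hostname; the command is assembled and printed rather than executed:

```shell
# Placeholder details -- substitute your own
HOST="your-service.clickhouse.cloud"   # hypothetical hostname
PORT=9440                              # 9440 with TLS (--secure), 9000 without

# --secure enables TLS; a bare --password makes the client prompt for it
CMD="clickhouse-client --host ${HOST} --secure --port ${PORT} --user default --password"
echo "$CMD"
```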

docs/integrations/data-ingestion/etl-tools/airbyte-and-clickhouse.md

Lines changed: 8 additions & 4 deletions

@@ -34,7 +34,9 @@ Please note that the Airbyte source and destination for ClickHouse are currently
 
 <a href="https://www.airbyte.com/" target="_blank">Airbyte</a> is an open-source data integration platform. It allows the creation of <a href="https://airbyte.com/blog/why-the-future-of-etl-is-not-elt-but-el" target="_blank">ELT</a> data pipelines and is shipped with more than 140 out-of-the-box connectors. This step-by-step tutorial shows how to connect Airbyte to ClickHouse as a destination and load a sample dataset.
 
-## 1. Download and run Airbyte {#1-download-and-run-airbyte}
+<VerticalStepper headerLevel="h2">
+
+## Download and run Airbyte {#1-download-and-run-airbyte}
 
 1. Airbyte runs on Docker and uses `docker-compose`. Make sure to download and install the latest versions of Docker.
 
@@ -54,7 +56,7 @@ Please note that the Airbyte source and destination for ClickHouse are currently
 Alternatively, you can signup and use <a href="https://docs.airbyte.com/deploying-airbyte/on-cloud" target="_blank">Airbyte Cloud</a>
 :::
 
-## 2. Add ClickHouse as a destination {#2-add-clickhouse-as-a-destination}
+## Add ClickHouse as a destination {#2-add-clickhouse-as-a-destination}
 
 In this section, we will display how to add a ClickHouse instance as a destination.
 
@@ -84,7 +86,7 @@ GRANT CREATE ON * TO my_airbyte_user;
 ```
 :::
 
-## 3. Add a dataset as a source {#3-add-a-dataset-as-a-source}
+## Add a dataset as a source {#3-add-a-dataset-as-a-source}
 
 The example dataset we will use is the <a href="https://clickhouse.com/docs/getting-started/example-datasets/nyc-taxi/" target="_blank">New York City Taxi Data</a> (on <a href="https://github.com/toddwschneider/nyc-taxi-data" target="_blank">Github</a>). For this tutorial, we will use a subset of this dataset which corresponds to the month of Jan 2022.
 
@@ -102,7 +104,7 @@ The example dataset we will use is the <a href="https://clickhouse.com/docs/gett
 
 3. Congratulations! You have now added a source file in Airbyte.
 
-## 4. Create a connection and load the dataset into ClickHouse {#4-create-a-connection-and-load-the-dataset-into-clickhouse}
+## Create a connection and load the dataset into ClickHouse {#4-create-a-connection-and-load-the-dataset-into-clickhouse}
 
 1. Within Airbyte, select the "Connections" page and add a new connection
 
@@ -174,3 +176,5 @@ The example dataset we will use is the <a href="https://clickhouse.com/docs/gett
 Now that the dataset is loaded on your ClickHouse instance, you can create an new table and use more suitable ClickHouse data types (<a href="https://clickhouse.com/docs/getting-started/example-datasets/nyc-taxi/" target="_blank">more details</a>).
 
 8. Congratulations - you have successfully loaded the NYC taxi data into ClickHouse using Airbyte!
+
+</VerticalStepper>
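The Docker quickstart in the first step amounts to cloning the Airbyte repository and bringing the stack up with `docker-compose`. A sketch; the repository layout and startup script vary between Airbyte releases, so the commands are assembled and printed rather than executed:

```shell
# Sketch of the self-hosted Airbyte quickstart -- exact steps vary by release
REPO="https://github.com/airbytehq/airbyte.git"
STEPS="git clone --depth=1 ${REPO} && cd airbyte && docker-compose up -d"
echo "$STEPS"
# Once the containers are up, the web UI is served locally (port depends on release)
```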

docs/integrations/data-ingestion/etl-tools/dlt-and-clickhouse.md

Lines changed: 8 additions & 5 deletions

@@ -24,7 +24,9 @@ pip install "dlt[clickhouse]"
 
 ## Setup guide {#setup-guide}
 
-### 1. Initialize the dlt Project {#1-initialize-the-dlt-project}
+<VerticalStepper headerLevel="h3">
+
+### Initialize the dlt Project {#1-initialize-the-dlt-project}
 
 Start by initializing a new `dlt` project as follows:
 ```bash
@@ -42,7 +44,7 @@ pip install -r requirements.txt
 
 or with `pip install dlt[clickhouse]`, which installs the `dlt` library and the necessary dependencies for working with ClickHouse as a destination.
 
-### 2. Setup ClickHouse Database {#2-setup-clickhouse-database}
+### Setup ClickHouse Database {#2-setup-clickhouse-database}
 
 To load data into ClickHouse, you need to create a ClickHouse database. Here's a rough outline of what should you do:
 
@@ -60,7 +62,7 @@ GRANT SELECT ON INFORMATION_SCHEMA.COLUMNS TO dlt;
 GRANT CREATE TEMPORARY TABLE, S3 ON *.* TO dlt;
 ```
 
-### 3. Add credentials {#3-add-credentials}
+### Add credentials {#3-add-credentials}
 
 Next, set up the ClickHouse credentials in the `.dlt/secrets.toml` file as shown below:
 
@@ -78,8 +80,7 @@ secure = 1 # Set to 1 if using HTTPS, else 0.
 dataset_table_separator = "___" # Separator for dataset table names from dataset.
 ```
 
-:::note
-HTTP_PORT
+:::note HTTP_PORT
 The `http_port` parameter specifies the port number to use when connecting to the ClickHouse server's HTTP interface. This is different from default port 9000, which is used for the native TCP protocol.
 
 You must set `http_port` if you are not using external staging (i.e. you don't set the staging parameter in your pipeline). This is because the built-in ClickHouse local storage staging uses the <a href="https://github.com/ClickHouse/clickhouse-connect">clickhouse content</a> library, which communicates with ClickHouse over HTTP.
@@ -94,6 +95,8 @@ You can pass a database connection string similar to the one used by the `clickh
 destination.clickhouse.credentials="clickhouse://dlt:Dlt*12345789234567@localhost:9000/dlt?secure=1"
 ```
 
+</VerticalStepper>
+
 ## Write disposition {#write-disposition}
 
 All [write dispositions](https://dlthub.com/docs/general-usage/incremental-loading#choosing-a-write-disposition)

docs/integrations/data-ingestion/etl-tools/nifi-and-clickhouse.md

Lines changed: 12 additions & 7 deletions

@@ -36,20 +36,23 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
 
 <a href="https://nifi.apache.org/" target="_blank">Apache NiFi</a> is an open-source workflow management software designed to automate data flow between software systems. It allows the creation of ETL data pipelines and is shipped with more than 300 data processors. This step-by-step tutorial shows how to connect Apache NiFi to ClickHouse as both a source and destination, and to load a sample dataset.
 
-## 1. Gather your connection details {#1-gather-your-connection-details}
+<VerticalStepper headerLevel="h2">
+
+## Gather your connection details {#1-gather-your-connection-details}
+
 <ConnectionDetails />
 
-## 2. Download and run Apache NiFi {#2-download-and-run-apache-nifi}
+## Download and run Apache NiFi {#2-download-and-run-apache-nifi}
 
-1. For a new setup, download the binary from https://nifi.apache.org/download.html and start by running `./bin/nifi.sh start`
+For a new setup, download the binary from https://nifi.apache.org/download.html and start by running `./bin/nifi.sh start`
 
-## 3. Download the ClickHouse JDBC driver {#3-download-the-clickhouse-jdbc-driver}
+## Download the ClickHouse JDBC driver {#3-download-the-clickhouse-jdbc-driver}
 
 1. Visit the <a href="https://github.com/ClickHouse/clickhouse-java/releases" target="_blank">ClickHouse JDBC driver release page</a> on GitHub and look for the latest JDBC release version
 2. In the release version, click on "Show all xx assets" and look for the JAR file containing the keyword "shaded" or "all", for example, `clickhouse-jdbc-0.5.0-all.jar`
 3. Place the JAR file in a folder accessible by Apache NiFi and take note of the absolute path
 
-## 4. Add `DBCPConnectionPool` Controller Service and configure its properties {#4-add-dbcpconnectionpool-controller-service-and-configure-its-properties}
+## Add `DBCPConnectionPool` Controller Service and configure its properties {#4-add-dbcpconnectionpool-controller-service-and-configure-its-properties}
 
 1. To configure a Controller Service in Apache NiFi, visit the NiFi Flow Configuration page by clicking on the "gear" button
 
@@ -93,7 +96,7 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
 
 <Image img={nifi08} size="lg" border alt="Controller Services list showing enabled ClickHouse JDBC service" />
 
-## 5. Read from a table using the `ExecuteSQL` processor {#5-read-from-a-table-using-the-executesql-processor}
+## Read from a table using the `ExecuteSQL` processor {#5-read-from-a-table-using-the-executesql-processor}
 
 1. Add an `ExecuteSQL` processor, along with the appropriate upstream and downstream processors
 
@@ -118,7 +121,7 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
 
 <Image img={nifi12} size="lg" border alt="FlowFile content viewer showing query results in formatted view" />
 
-## 6. Write to a table using `MergeRecord` and `PutDatabaseRecord` processor {#6-write-to-a-table-using-mergerecord-and-putdatabaserecord-processor}
+## Write to a table using `MergeRecord` and `PutDatabaseRecord` processor {#6-write-to-a-table-using-mergerecord-and-putdatabaserecord-processor}
 
 1. To write multiple rows in a single insert, we first need to merge multiple records into a single record. This can be done using the `MergeRecord` processor
 
@@ -156,3 +159,5 @@ import CommunityMaintainedBadge from '@theme/badges/CommunityMaintained';
 <Image img={nifi15} size="sm" border alt="Query results showing row count in the destination table" />
 
 5. Congratulations - you have successfully loaded your data into ClickHouse using Apache NiFi !
+
+</VerticalStepper>
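In the `DBCPConnectionPool` step, the controller service properties are built from the connection details gathered earlier. A sketch with a hypothetical hostname; the `jdbc:clickhouse://host:port/database` URL shape and the `com.clickhouse.jdbc.ClickHouseDriver` class come from the ClickHouse JDBC driver, but verify both against the driver version you downloaded:

```shell
# Placeholder values for the DBCPConnectionPool Controller Service properties
HOST="your-service.clickhouse.cloud"   # hypothetical hostname
PORT=8443                              # HTTPS interface
DATABASE="default"

# Database Connection URL property (ssl=true for a TLS endpoint):
JDBC_URL="jdbc:clickhouse://${HOST}:${PORT}/${DATABASE}?ssl=true"
# Database Driver Class Name property:
DRIVER_CLASS="com.clickhouse.jdbc.ClickHouseDriver"
echo "$JDBC_URL"
echo "$DRIVER_CLASS"
```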
