Commit 40ac565

Merge pull request #4049 from Blargian/backup_restructure
Improvement: restructure backups pages
2 parents a214d55 + 27e0df3 commit 40ac565

13 files changed

+1076
-12
lines changed

docs/operations_/backup_restore/00_overview.md

Lines changed: 305 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 337 additions & 0 deletions
@@ -0,0 +1,337 @@
1+
---
2+
description: 'Details backup/restore to or from a local disk'
3+
sidebar_label: 'Local disk / S3 disk'
4+
slug: /operations/backup/disk
5+
title: 'Backup and Restore in ClickHouse'
6+
doc_type: 'guide'
7+
---
8+
9+
import GenericSettings from '@site/docs/operations_/backup_restore/_snippets/_generic_settings.md';
10+
import S3Settings from '@site/docs/operations_/backup_restore/_snippets/_s3_settings.md';
11+
import ExampleSetup from '@site/docs/operations_/backup_restore/_snippets/_example_setup.md';
12+
import Syntax from '@site/docs/operations_/backup_restore/_snippets/_syntax.md';
13+
14+
# BACKUP / RESTORE to disk {#backup-to-a-local-disk}
15+
16+
## Syntax {#syntax}
17+
18+
<Syntax/>
19+
20+
## Configure backup destinations for disk {#configure-backup-destinations-for-disk}
21+
22+
### Configure a backup destination for local disk {#configure-a-backup-destination}
23+
24+
In the examples below you will see the backup destination specified as `Disk('backups', '1.zip')`.
25+
To use the `Disk` backup engine it is necessary to first add a file specifying
26+
the backup destination at the path below:
27+
28+
```text
29+
/etc/clickhouse-server/config.d/backup_disk.xml
30+
```
31+
32+
For example, the configuration below defines a disk named `backups` and then adds that disk to
33+
the `allowed_disk` list in the `backups` section:
34+
35+
```xml
36+
<clickhouse>
37+
<storage_configuration>
38+
<disks>
39+
<!--highlight-next-line -->
40+
<backups>
41+
<type>local</type>
42+
<path>/backups/</path>
43+
</backups>
44+
</disks>
45+
</storage_configuration>
46+
<!--highlight-start -->
47+
<backups>
48+
<allowed_disk>backups</allowed_disk>
49+
<allowed_path>/backups/</allowed_path>
50+
</backups>
51+
<!--highlight-end -->
52+
</clickhouse>
53+
```
54+
55+
### Configure a backup destination for S3 disk {#backuprestore-using-an-s3-disk}
56+
57+
It is also possible to `BACKUP`/`RESTORE` to S3 by configuring an S3 disk in the
58+
ClickHouse storage configuration. To do so, add a file to
59+
`/etc/clickhouse-server/config.d`, as was done above for the local disk:
60+
61+
```xml
62+
<clickhouse>
63+
<storage_configuration>
64+
<disks>
65+
<s3_plain>
66+
<type>s3_plain</type>
67+
<endpoint></endpoint>
68+
<access_key_id></access_key_id>
69+
<secret_access_key></secret_access_key>
70+
</s3_plain>
71+
</disks>
72+
<policies>
73+
<s3>
74+
<volumes>
75+
<main>
76+
<disk>s3_plain</disk>
77+
</main>
78+
</volumes>
79+
</s3>
80+
</policies>
81+
</storage_configuration>
82+
83+
<backups>
84+
<allowed_disk>s3_plain</allowed_disk>
85+
</backups>
86+
</clickhouse>
87+
```
88+
89+
`BACKUP`/`RESTORE` for S3 disk is done in the same way as for local disk:
90+
91+
```sql
92+
BACKUP TABLE data TO Disk('s3_plain', 'cloud_backup');
93+
RESTORE TABLE data AS data_restored FROM Disk('s3_plain', 'cloud_backup');
94+
```
95+
96+
:::note
97+
- This disk should not be used for `MergeTree` itself, only for `BACKUP`/`RESTORE`.
98+
- If your tables are backed by S3 storage and the types of the disks are different,
99+
ClickHouse does not use `CopyObject` calls to copy parts to the destination bucket.
100+
Instead, it downloads and uploads them, which is very inefficient. In this case,
101+
prefer the `BACKUP ... TO S3(<endpoint>)` syntax.
102+
:::
103+
104+
## Usage examples of backup/restore to local disk {#usage-examples}
105+
106+
### Backup and restore a table {#backup-and-restore-a-table}
107+
108+
<ExampleSetup/>
109+
110+
To back up the table, run:
111+
112+
```sql title="Query"
113+
BACKUP TABLE test_db.test_table TO Disk('backups', '1.zip')
114+
```
115+
116+
```response title="Response"
117+
┌─id───────────────────────────────────┬─status─────────┐
118+
1. │ 065a8baf-9db7-4393-9c3f-ba04d1e76bcd │ BACKUP_CREATED │
119+
└──────────────────────────────────────┴────────────────┘
120+
```
121+
122+
The table can be restored from the backup using the following command if the table is empty:
123+
124+
```sql title="Query"
125+
RESTORE TABLE test_db.test_table FROM Disk('backups', '1.zip')
126+
```
127+
128+
```response title="Response"
129+
┌─id───────────────────────────────────┬─status───┐
130+
1. │ f29c753f-a7f2-4118-898e-0e4600cd2797 │ RESTORED │
131+
└──────────────────────────────────────┴──────────┘
132+
```
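
The `id` values returned by `BACKUP` and `RESTORE` can be used to check on an operation afterwards. The query below is a minimal sketch, assuming the `system.backups` system table is available in your ClickHouse version and exposes the columns selected here (the exact column set may differ between releases):

```sql
-- List the most recent backup and restore operations, newest first.
-- The `error` column is expected to be empty for operations that succeeded.
SELECT id, name, status, error, start_time, end_time
FROM system.backups
ORDER BY start_time DESC
LIMIT 5;
```

Filtering on the `id` column with one of the UUIDs shown in the responses above narrows the result to a single operation.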
133+
134+
:::note
135+
The above `RESTORE` would fail if the table `test_db.test_table` contains data.
136+
The setting `allow_non_empty_tables=true` allows `RESTORE TABLE` to insert data
137+
into non-empty tables. This will mix earlier data in the table with the data extracted from the backup.
138+
This setting can therefore cause data duplication in the table, and should be used with caution.
139+
:::
140+
141+
To restore the table with data already in it, run:
142+
143+
```sql
144+
RESTORE TABLE test_db.test_table FROM Disk('backups', '1.zip')
145+
SETTINGS allow_non_empty_tables=true
146+
```
147+
148+
Tables can be restored, or backed up, with new names:
149+
150+
```sql
151+
RESTORE TABLE test_db.test_table AS test_db.test_table_renamed FROM Disk('backups', '1.zip')
152+
```
153+
154+
The backup archive for this backup has the following structure:
155+
156+
```text
157+
├── .backup
158+
└── metadata
159+
    └── test_db
160+
        └── test_table.sql
161+
```
162+
163+
<!-- TO DO:
164+
Explanation here about the backup format. See Issue 24a
165+
https://github.com/ClickHouse/clickhouse-docs/issues/3968
166+
-->
167+
168+
Formats other than zip can be used. See ["Backups as tar archives"](#backups-as-tar-archives)
169+
below for further details.
170+
171+
### Incremental backups to disk {#incremental-backups}
172+
173+
A base backup in ClickHouse is the initial, full backup from which the following
174+
incremental backups are created. Incremental backups only store the changes
175+
made since the base backup, so the base backup must be kept available to
176+
restore from any incremental backup. The base backup is specified with the
177+
`base_backup` setting.
178+
179+
:::note
180+
Incremental backups depend on the base backup. The base backup must be kept available
181+
to be able to restore from an incremental backup.
182+
:::
183+
184+
To make an incremental backup of a table, first make a base backup, then make the incremental backup with `base_backup` pointing to it:
185+
186+
```sql
187+
BACKUP TABLE test_db.test_table TO Disk('backups', 'd.zip')
188+
```
189+
190+
```sql
191+
BACKUP TABLE test_db.test_table TO Disk('backups', 'incremental-a.zip')
192+
SETTINGS base_backup = Disk('backups', 'd.zip')
193+
```
194+
195+
All data from the incremental backup and the base backup can be restored into a
196+
new table `test_db.test_table2` with the following command:
197+
198+
```sql
199+
RESTORE TABLE test_db.test_table AS test_db.test_table2
200+
FROM Disk('backups', 'incremental-a.zip');
201+
```
202+
203+
### Securing a backup {#assign-a-password-to-the-backup}
204+
205+
Backups written to disk can have a password applied to the file.
206+
The password can be specified using the `password` setting:
207+
208+
```sql
209+
BACKUP TABLE test_db.test_table
210+
TO Disk('backups', 'password-protected.zip')
211+
SETTINGS password='qwerty'
212+
```
213+
214+
To restore a password-protected backup, the password must again
215+
be specified using the `password` setting:
216+
217+
```sql
218+
RESTORE TABLE test_db.test_table
219+
FROM Disk('backups', 'password-protected.zip')
220+
SETTINGS password='qwerty'
221+
```
222+
223+
### Backups as tar archives {#backups-as-tar-archives}
224+
225+
Backups can be stored not only as zip archives, but also as tar archives.
226+
The functionality is the same as for zip, except that password protection is not
227+
supported for tar archives. Additionally, tar archives support a variety of
228+
compression methods.
229+
230+
To make a backup of a table as a tar:
231+
232+
```sql
233+
BACKUP TABLE test_db.test_table TO Disk('backups', '1.tar')
234+
```
235+
236+
To restore from a tar archive:
237+
238+
```sql
239+
RESTORE TABLE test_db.test_table FROM Disk('backups', '1.tar')
240+
```
241+
242+
To change the compression method, the correct file suffix should be appended to
243+
the backup name. For example, to compress the tar archive using gzip run:
244+
245+
```sql
246+
BACKUP TABLE test_db.test_table TO Disk('backups', '1.tar.gz')
247+
```
248+
249+
The supported compression file suffixes are:
250+
- `.tar.gz`
251+
- `.tgz`
252+
- `.tar.bz2`
253+
- `.tar.lzma`
254+
- `.tar.zst`
255+
- `.tzst`
256+
- `.tar.xz`
257+
258+
### Compression settings {#compression-settings}
259+
260+
The compression method and compression level can be specified using the
261+
`compression_method` and `compression_level` settings, respectively.
262+
263+
<!-- TO DO:
264+
More information needed on these settings and why you would want to do this
265+
-->
266+
267+
```sql
268+
BACKUP TABLE test_db.test_table
269+
TO Disk('backups', 'filename.zip')
270+
SETTINGS compression_method='lzma', compression_level=3
271+
```
272+
273+
### Restore specific partitions {#restore-specific-partitions}
274+
275+
If specific partitions associated with a table need to be restored, these can be specified.
276+
277+
Let's create a simple table with four partitions, insert some data into it, and then
278+
take a backup of only the first and fourth partitions:
279+
280+
<details>
281+
282+
<summary>Setup</summary>
283+
284+
```sql
285+
CREATE DATABASE IF NOT EXISTS test_db;
286+
287+
-- Create a partitioned table
288+
CREATE TABLE test_db.partitioned (
289+
id UInt32,
290+
data String,
291+
partition_key UInt8
292+
) ENGINE = MergeTree()
293+
PARTITION BY partition_key
294+
ORDER BY id;
295+
296+
INSERT INTO test_db.partitioned VALUES
297+
(1, 'data1', 1),
298+
(2, 'data2', 2),
299+
(3, 'data3', 3),
300+
(4, 'data4', 4);
301+
302+
SELECT count() FROM test_db.partitioned;
303+
304+
SELECT partition_key, count()
305+
FROM test_db.partitioned
306+
GROUP BY partition_key
307+
ORDER BY partition_key;
308+
```
309+
310+
```response
311+
┌─count()─┐
312+
1. │ 4 │
313+
└─────────┘
314+
┌─partition_key─┬─count()─┐
315+
1. │ 1 │ 1 │
316+
2. │ 2 │ 1 │
317+
3. │ 3 │ 1 │
318+
4. │ 4 │ 1 │
319+
└───────────────┴─────────┘
320+
```
321+
322+
</details>
323+
324+
Run the following command to back up partitions 1 and 4:
325+
326+
```sql
327+
BACKUP TABLE test_db.partitioned PARTITIONS '1', '4'
328+
TO Disk('backups', 'partitioned.zip')
329+
```
330+
331+
Run the following command to restore partitions 1 and 4:
332+
333+
```sql
334+
RESTORE TABLE test_db.partitioned PARTITIONS '1', '4'
335+
FROM Disk('backups', 'partitioned.zip')
336+
SETTINGS allow_non_empty_tables=true
337+
```
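
Because `allow_non_empty_tables=true` mixes restored rows with existing ones, it can help to restore into a separate table first and inspect the result. The following is a sketch only: `test_db.partitioned_restored` is a hypothetical table name chosen for this example, and it assumes the `AS` clause can be combined with `PARTITIONS` in the order shown:

```sql
-- Restore only partitions 1 and 4 into a new table, leaving the original untouched.
-- `partitioned_restored` is a hypothetical name, not part of the setup above.
RESTORE TABLE test_db.partitioned AS test_db.partitioned_restored PARTITIONS '1', '4'
FROM Disk('backups', 'partitioned.zip');

-- Check which partitions are present in the restored table.
SELECT partition_key, count()
FROM test_db.partitioned_restored
GROUP BY partition_key
ORDER BY partition_key;
```

If the restore behaves as described in this section, only the partition keys that were backed up should appear in the second query's result.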
