ClickHouse Backup and Restore on Local Storage or NAS
Overview
This document describes how to back up and restore ClickHouse tables in the observability database by using local storage or a NAS-mounted directory. This procedure applies to clusters that use the ReplicatedMergeTree table engine.
For ClickHouse, a NAS mount is treated as a local filesystem path. Therefore, you can use the same backup and restore method for both local storage and NAS.
This document provides the following guidance:
- Create a full backup.
- Create an incremental backup based on a full backup.
- Store backup data in a local directory or NAS mount path.
- Restore data from a local or NAS backup.
- Validate backup and restore results.
This document uses the observability.audit table as an example. You can apply the same procedure to other tables.
The following tables are common examples:
- audit: stores audit data.
- event: stores event data.
- log_kubernetes: stores Kubernetes logs.
- log_platform: stores platform service logs.
- log_system: stores node-level system logs.
- log_workload: stores application and workload logs.
The storage types differ as follows:
- LocalVolume: ClickHouse data is stored in a node-local directory, such as /cpaas/data/clickhouse/. BACKUP ... TO File(...) writes backup files to the ClickHouse backup directory under this local data directory, such as /cpaas/data/clickhouse/backups.
- StorageClass, such as TopoLVM, NFS, or Ceph: ClickHouse data is stored on the corresponding StorageClass volume. BACKUP ... TO File(...) writes backup files to the ClickHouse backup directory on that volume.
- NAS archive path: backup files can be copied from the ClickHouse backup directory to a mounted NAS path or another custom backup directory for long-term retention.
For both LocalVolume and StorageClass deployments, use the actual ClickHouse backup directory as the source path when archiving backup files and as the destination path when copying backup files back for restore.
Prerequisites
Before you start, make sure the following conditions are met.
Environment Requirements
Access Requirements
All SQL statements in this document use the built-in ClickHouse administrator account default.
You can run the SQL statements on any healthy ClickHouse instance. For consistency, this document uses a single ClickHouse Pod as an example.
Before you run SQL statements, connect to the target Pod:
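For example, assuming a ClickHouse Pod named clickhouse-0 in the cpaas-system namespace (both names are deployment-specific; list the Pods first if you are unsure):

```bash
# Open a shell in the ClickHouse Pod (use sh if bash is unavailable).
kubectl exec -it clickhouse-0 -n cpaas-system -- bash
```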
Then connect to ClickHouse in the container:
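A minimal example; replace <password> with the actual password of the default account, or omit --password if none is set:

```bash
# Start an interactive SQL session as the built-in default user.
clickhouse-client -u default --password '<password>'
```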
The default user already has the required privileges for this procedure, including BACKUP, RESTORE, SELECT, and ALTER.
Directory Requirements
Make sure the following conditions are met:
- The ClickHouse process has read and write access to the target directory.
- The target directory has sufficient available capacity.
- If NAS is used, the mount point is restored automatically after Pod or host restarts.
- In Kubernetes, the directory is mounted through PVC, hostPath, or CSI.
Backup Strategy
Use the following backup strategy:
- Run the backup operation only once on any healthy replica.
- Use BACKUP TABLE to create a consistent snapshot without stopping the service.
- Use base_backup for file-level deduplication in incremental backups.
- Keep the base full backup accessible when you restore an incremental backup.
- Use one full backup per week and one incremental backup per day to balance restore complexity and storage cost.
Procedure
The following examples use an initial full backup and a daily incremental backup.
Create a Full Backup
A full backup creates the baseline for subsequent incremental backups.
Run the following command on any healthy ClickHouse instance:
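A minimal sketch, using audit_full_20260423 as the backup directory name (the same example name is used in later steps):

```sql
-- Full backup of observability.audit; the File() argument becomes
-- the backup folder name under the ClickHouse backup directory.
BACKUP TABLE observability.audit
TO File('audit_full_20260423')
SETTINGS compression_method = 'zstd';
```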
Notes:
- File(...) writes the backup to the ClickHouse backup directory on the ClickHouse data volume.
- For LocalVolume deployments, the backup directory is typically /cpaas/data/clickhouse/backups on the host node.
- For StorageClass deployments, such as TopoLVM, NFS, or Ceph, the backup directory is on the corresponding StorageClass volume.
- compression_method = 'zstd' compresses the backup content with zstd to reduce storage usage.
- If you need to archive the backup to NAS or another custom backup directory, copy the backup files after the backup task is complete.
Validate Backup Success
After the backup is complete, validate the result.
Check Backup Task Status
Run the following query on the ClickHouse instance where the backup command was executed:
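For example, using the system.backups system table:

```sql
-- List recent backup tasks; the newest task appears first.
SELECT name, status, error, start_time, end_time, num_files, total_size
FROM system.backups
ORDER BY start_time DESC
LIMIT 10;
```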
Expected result:
Locate the backup record for observability.audit through the name field. The backup is successful if the following conditions are met:
- status = 'BACKUP_CREATED'
- error is empty
- end_time has a value
- num_files > 0
- total_size > 0
Check the Backup Directory
Run the following command on the node or in the container where the ClickHouse backup directory is available.
For LocalVolume deployments, the backup directory is typically /cpaas/data/clickhouse/backups:
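For example:

```bash
# Inspect the backup folders and their sizes.
ls -lh /cpaas/data/clickhouse/backups
du -sh /cpaas/data/clickhouse/backups/*
```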
For StorageClass deployments, check the backup directory on the corresponding StorageClass volume.
Expected result:
- The backup directory exists.
- The directory contains files or folders generated by this backup.
- The file count and total size are greater than 0.
Create an Incremental Backup
An incremental backup uses base_backup and writes only new or changed data files.
Run the following command on any healthy ClickHouse instance:
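A minimal sketch that uses the full backup audit_full_20260423 as the base:

```sql
-- Incremental backup: only files that are new or changed relative to
-- the base backup are written; unchanged files reference the base.
BACKUP TABLE observability.audit
TO File('audit_incr_20260424')
SETTINGS base_backup = File('audit_full_20260423'),
         compression_method = 'zstd';
```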
Notes:
- The incremental backup depends on the backup specified by base_backup.
- Use the latest full backup as the base_backup for each daily incremental backup, instead of the previous incremental backup. This keeps the restore dependency simple: restoring a daily incremental backup only requires the incremental backup and its corresponding full backup.
- Keep the dependent full backup available when you restore the incremental backup.
- Create a new full backup periodically to avoid relying on the same baseline for too long.
Archive the Backup Files
BACKUP ... TO File(...) writes the backup files to the ClickHouse backup directory on the ClickHouse data volume where the backup command is executed.
The backup directory depends on the storage type:
- For LocalVolume deployments, the ClickHouse data directory is on the host node, typically /cpaas/data/clickhouse/, and the backup files are generated under /cpaas/data/clickhouse/backups.
- For StorageClass deployments, such as TopoLVM, NFS, or Ceph, the ClickHouse data directory is on the corresponding StorageClass volume, and the backup files are generated under the ClickHouse backup directory on that volume.
After the backup succeeds, copy the backup files from the actual ClickHouse backup directory to a custom backup directory or NAS mount path for retention.
For LocalVolume deployments, the source path is typically /cpaas/data/clickhouse/backups:
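For example, using the backup names from this document and /my_dir as the archive path:

```bash
# Archive the full and incremental backups, preserving file attributes.
cp -a /cpaas/data/clickhouse/backups/audit_full_20260423 /my_dir/
cp -a /cpaas/data/clickhouse/backups/audit_incr_20260424 /my_dir/
```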
For StorageClass deployments, replace /cpaas/data/clickhouse/backups with the actual ClickHouse backup directory on the StorageClass volume.
Notes:
- /my_dir can be a designated local archive directory or a NAS mount path.
- Use an archive path outside the ClickHouse data directory. Otherwise, the backup files might be deleted when the ClickHouse data directory is cleaned during disaster recovery.
- After each full or incremental backup, copy the corresponding backup directory to the archive location.
- After the incremental backup is copied successfully, you can delete the local incremental backup under the ClickHouse backup directory if it is no longer needed.
- Delete the local full backup under the ClickHouse backup directory only after the next full backup succeeds and the retention policy allows cleanup.
- If you later restore with RESTORE ... ON CLUSTER ... FROM File(...), make sure the required full and incremental backup directories are copied back to the ClickHouse backup directory on each ClickHouse host node before you run the restore command.
Restore
Use this procedure when table data is corrupted, the data directory is deleted, or the table state is abnormal.
Restore Prerequisites
Before you start the restore procedure, make sure the following conditions are met:
- You have a valid full backup or incremental backup.
- If you restore from an incremental backup, the corresponding full backup is available in the restore path.
- You know the archive directory that stores the latest incremental backup and the corresponding full backup.
- You have access to the Kubernetes cluster and the host nodes where ClickHouse instances are scheduled.
Restore Procedure
This procedure restores ClickHouse data table by table. Choose the preparation steps according to the failure scope, and then run the same table restore procedure for each required table.
Stop Writes
Stop razor first to prevent new data from being written during the restore.
Log in to the cluster master node and create a ResourcePatch to stop razor:
Prepare ClickHouse According to the Failure Scope
Choose one of the following preparation paths according to the failure scope. After the preparation is complete, continue with the same table-by-table restore procedure.
Case A: Table Data Is Corrupted but the Data Directory Is Healthy
Use this case when one or more tables are corrupted, accidentally deleted, or have abnormal data, while the ClickHouse data directory and Keeper state are still healthy.
In this case, do not stop ClickHouse and do not clean the ClickHouse data directory. Continue with the table restore procedure directly.
Case B: Data Directory or Keeper Metadata Is Damaged
Use this case only when the ClickHouse data directory is damaged, the node is rebuilt, or the Keeper metadata is unavailable.
In this deployment, ClickHouse Keeper is integrated with ClickHouse and its data is also stored under the ClickHouse data directory. Therefore, cleaning the ClickHouse data directory also removes the local ClickHouse Keeper data.
Warning: Cleaning the data directory deletes local ClickHouse data on the target nodes. Confirm that the backup is available before you continue.
Clean the actual ClickHouse data directory for each ClickHouse instance according to the storage type. For LocalVolume deployments, the directory is typically /cpaas/data/clickhouse/ on the host node. For StorageClass deployments, clean the ClickHouse data directory on the corresponding StorageClass volume or PV. Do not use /cpaas/data/clickhouse/ for StorageClass deployments unless it is confirmed to be the actual mounted data path.
Log in to the cluster master node and create a ResourcePatch to stop ClickHouse:
Confirm that all ClickHouse Pods have stopped:
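For example, assuming ClickHouse runs in the cpaas-system namespace (adjust to your deployment):

```bash
# Expect no ClickHouse Pods in Running state; wait until they are gone.
kubectl get pods -n cpaas-system | grep clickhouse
```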
Clean the ClickHouse data directory for each ClickHouse instance according to the storage type.
For LocalVolume deployments, run the following command on each host node where a ClickHouse instance is deployed:
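A sketch for the typical LocalVolume path; double-check the path before running, because this permanently deletes local data:

```bash
# DESTRUCTIVE: removes all local ClickHouse data on this node,
# including the integrated ClickHouse Keeper state.
rm -rf /cpaas/data/clickhouse/*
```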
For StorageClass deployments, clean the ClickHouse data directory on the corresponding StorageClass volume or PV. The exact path depends on the StorageClass and CSI implementation. Use the actual mounted ClickHouse data path instead of the LocalVolume example path.
Start only the ClickHouse components by deleting the ClickHouse ResourcePatch. Do not start razor yet.
Confirm that all ClickHouse Pods are running:
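The Pod list can be checked the same way as before; all ClickHouse Pods should now report Running and Ready (the cpaas-system namespace is an assumption):

```bash
kubectl get pods -n cpaas-system | grep clickhouse
```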
Prepare Backup Files on Each ClickHouse Node
Make sure the backup files are available in the ClickHouse backup directory on each ClickHouse host node.
If the backup files were archived to a custom backup directory or NAS mount path and the local files under the ClickHouse backup directory have been deleted, copy the latest incremental backup and the corresponding full backup back to the ClickHouse backup directory on each ClickHouse host node before you restore the table.
For LocalVolume deployments, the destination path is typically /cpaas/data/clickhouse/backups:
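A sketch using the archive path /my_dir and the example backup names from this document; the owner in the chown step is an assumption and should match the user that runs the ClickHouse process:

```bash
# Copy the full backup and its dependent incremental backup back
# from the archive to the ClickHouse backup directory.
cp -a /my_dir/audit_full_20260423 /cpaas/data/clickhouse/backups/
cp -a /my_dir/audit_incr_20260424 /cpaas/data/clickhouse/backups/

# Make sure the ClickHouse process can read the copied files
# (clickhouse:clickhouse is an assumed owner; adjust as needed).
chown -R clickhouse:clickhouse /cpaas/data/clickhouse/backups
```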
For StorageClass deployments, copy the backup files back to the actual ClickHouse backup directory on the StorageClass volume.
Notes:
- If you restore from an incremental backup, prepare the dependent full backup at the same time.
- The backup files must be placed under the ClickHouse backup directory on each ClickHouse host node so that RESTORE ... ON CLUSTER ... FROM File(...) can access them on every ClickHouse instance.
- Replace /my_dir, /cpaas/data/clickhouse/backups, audit_full_20260423, and audit_incr_20260424 with the actual archive path, ClickHouse backup directory, and backup directory names.
Check Cluster Readiness
Before you run the restore command, make sure the ClickHouse cluster configuration and macros are available.
Check the replicated cluster configuration:
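For example:

```sql
-- Every expected ClickHouse replica should appear in this cluster.
SELECT cluster, shard_num, replica_num, host_name, host_address
FROM system.clusters
WHERE cluster = 'replicated';
```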
Check the local ClickHouse macros:
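For example:

```sql
-- Expect entries for macros such as 'shard' and 'replica'.
SELECT * FROM system.macros;
```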
Make sure the cluster contains the expected ClickHouse replicas and the macros such as shard and replica are available.
Restore Tables One by One
Run the following restore procedure for each table that needs to be restored.
Drop the target table on the ClickHouse cluster:
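A sketch for the example table; SYNC makes the statement wait until the drop has completed instead of returning immediately:

```sql
DROP TABLE IF EXISTS observability.audit ON CLUSTER 'replicated' SYNC;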
Restore the table from the local or NAS backup:
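A minimal sketch that restores from the example incremental backup; ClickHouse locates the dependent full backup recorded in it automatically:

```sql
RESTORE TABLE observability.audit
ON CLUSTER 'replicated'
FROM File('audit_incr_20260424');
```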
Notes:
- For a ReplicatedMergeTree table in this 3-replica deployment, use ON CLUSTER 'replicated' so that the restore operation is distributed to all ClickHouse replicas in the cluster.
- If you restore from an incremental backup, ClickHouse reads the dependent base backup automatically.
- Keep the corresponding full backup directory accessible during the restore.
- Replace observability.audit and the backup directory name with the actual table and backup that you want to restore.
- Repeat the same procedure for every table that needs to be restored. The table list is determined by the customer.
Check Restore Task Status
After each restore command is executed, check the restore task status before you start any related components:
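Restore tasks are also recorded in system.backups; a successful restore reports status 'RESTORED' in recent ClickHouse versions:

```sql
SELECT name, status, error, start_time, end_time
FROM system.backups
ORDER BY start_time DESC
LIMIT 10;
```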
Expected result:
The restore is successful if the following conditions are met:
- status indicates that the restore operation has completed successfully.
- error is empty.
- end_time has a value.
Validate Restore Success
After the restore is complete, validate the result from the data, partition, and replica perspectives.
Validate Total Row Count
Run the following query on each ClickHouse instance:
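For example:

```sql
-- Run on every instance and compare the results.
SELECT count() AS total_rows FROM observability.audit;
```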
Expected result:
The total_rows value is identical on all ClickHouse instances.
Validate Partition-Level Data
Run the following query on each ClickHouse instance:
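For example, using the system.parts table:

```sql
-- Compare partitions and per-partition row counts across instances;
-- the active_parts count may legitimately differ between replicas.
SELECT
    partition,
    sum(rows) AS rows,
    count()   AS active_parts
FROM system.parts
WHERE database = 'observability' AND table = 'audit' AND active
GROUP BY partition
ORDER BY partition;
```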
Expected result:
- The partition list is identical on all ClickHouse instances.
- The rows value for each partition is identical on all ClickHouse instances.
active_parts is only used to observe the physical part layout. A different number of parts does not necessarily indicate a restore failure.
Validate Replica Status
Run the following query on any ClickHouse instance:
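For example:

```sql
-- Replication health of the restored table.
SELECT total_replicas, active_replicas, queue_size, absolute_delay
FROM system.replicas
WHERE database = 'observability' AND table = 'audit';
```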
Expected result:
For a 3-replica cluster, the restore is successful if the following conditions are met:
- total_replicas = 3
- active_replicas = 3
- queue_size = 0
- absolute_delay is close to 0
Start the Related Components
After the restore has been validated successfully, start razor again by deleting the ResourcePatch:
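A hedged example, assuming the ResourcePatch created earlier is named stop-razor (the actual name and scope depend on how it was created):

```bash
# Deleting the ResourcePatch reverts the patch and lets razor start again.
kubectl delete resourcepatch stop-razor
```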
Recommendations
Use the following recommendations in production:
- Maintain the backup chain with one full backup per week and one incremental backup per day.
- Use a consistent naming convention for backup directories, such as full_YYYYMMDD and incr_YYYYMMDD.
- Run restore drills in a test environment on a regular basis.
- Define a retention policy for expired incremental backups and historical full backups.