Best Practices¶
Production-proven guidelines for using Housekeeper effectively in real-world ClickHouse deployments.
Project Organization¶
Schema Structure¶
Use Modular Organization¶
db/
├── main.sql                          # Main orchestration file
├── schemas/
│   ├── analytics/                    # Analytics database
│   │   ├── schema.sql                # Creates database and imports subdirectories
│   │   ├── tables/
│   │   │   ├── events.sql
│   │   │   ├── users.sql
│   │   │   └── sessions.sql
│   │   ├── views/
│   │   │   ├── daily_summary.sql
│   │   │   └── user_reports.sql
│   │   └── dictionaries/
│   │       └── segments_dict.sql
│   ├── reporting/                    # Reporting database
│   │   ├── schema.sql                # Creates database and imports subdirectories
│   │   ├── tables/
│   │   │   ├── dashboards.sql
│   │   │   └── metrics.sql
│   │   └── views/
│   │       └── executive_summary.sql
│   └── reference/                    # Reference data database
│       ├── schema.sql                # Creates database and imports subdirectories
│       ├── tables/
│       │   ├── countries.sql
│       │   └── categories.sql
│       └── dictionaries/
│           └── countries_dict.sql
└── migrations/                       # Generated migrations
    ├── 20240101120000.sql
    └── housekeeper.sum
Import Order Matters¶
-- db/main.sql - Order by database dependencies
-- 1. Reference data first (no dependencies)
-- housekeeper:import schemas/reference/schema.sql
-- 2. Core analytics database
-- housekeeper:import schemas/analytics/schema.sql
-- 3. Reporting database (depends on analytics)
-- housekeeper:import schemas/reporting/schema.sql
Database Schema Files¶
Each database's schema.sql creates the database and imports its objects:
-- schemas/analytics/schema.sql
-- Create the analytics database
CREATE DATABASE analytics ENGINE = Atomic COMMENT 'Analytics database';
-- Import tables first (no dependencies within database)
-- housekeeper:import tables/users.sql
-- housekeeper:import tables/events.sql
-- housekeeper:import tables/sessions.sql
-- Import dictionaries (may depend on tables)
-- housekeeper:import dictionaries/segments_dict.sql
-- Import views last (depend on tables and dictionaries)
-- housekeeper:import views/daily_summary.sql
-- housekeeper:import views/user_reports.sql
Reference Database Example¶
-- schemas/reference/schema.sql
-- Create the reference database
CREATE DATABASE reference ENGINE = Atomic COMMENT 'Reference data';
-- Import tables
-- housekeeper:import tables/countries.sql
-- housekeeper:import tables/categories.sql
-- Import dictionaries
-- housekeeper:import dictionaries/countries_dict.sql
Environment Management¶
Environment-Specific Configurations¶
# environments/production.yaml
clickhouse:
  version: "25.7"               # Pin specific version
  config_dir: "db/config.d"
  cluster: "production_cluster"
entrypoint: db/main.sql
dir: db/migrations
CI/CD Integration¶
# .github/workflows/schema-validation.yml
name: Schema Validation

on:
  pull_request:
    paths:
      - 'db/**'
      - 'housekeeper.yaml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Housekeeper
        run: |
          go install github.com/pseudomuto/housekeeper@latest

      - name: Validate Schema Syntax
        run: housekeeper schema compile

      - name: Test Migration Generation
        run: |
          # Start test ClickHouse
          docker run -d --name test-ch -p 9000:9000 clickhouse/clickhouse-server:25.7
          sleep 10

          # Generate and validate migration
          housekeeper diff --url localhost:9000 --dry-run

      - name: Cleanup
        run: docker rm -f test-ch
Schema Design¶
Table Design Patterns¶
Time-Series Tables¶
-- Optimal design for high-volume event data
CREATE TABLE analytics.events (
timestamp DateTime64(3), -- Millisecond precision
user_id UInt64,
event_type LowCardinality(String), -- Limited set of values
session_id String,
properties Map(String, String),
-- Materialized columns for performance
date Date MATERIALIZED toDate(timestamp),
hour UInt8 MATERIALIZED toHour(timestamp)
)
ENGINE = MergeTree()
PARTITION BY (toYYYYMM(timestamp), event_type) -- Multi-dimensional partitioning
ORDER BY (timestamp, user_id) -- Match the dominant query pattern (see Ordering Key Strategy below)
TTL timestamp + INTERVAL 90 DAY -- Data lifecycle management
SETTINGS index_granularity = 8192;
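This layout is tuned for queries that filter on event_type and a time range. A representative query it serves well:
-- Prunes to the matching monthly partitions and event_type,
-- then range-scans the (timestamp, user_id) primary key.
SELECT user_id, count() AS events
FROM analytics.events
WHERE event_type = 'purchase'
  AND timestamp >= '2024-03-01' AND timestamp < '2024-04-01'
GROUP BY user_id
ORDER BY events DESC
LIMIT 100;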
Mutable Data Tables¶
-- User profiles with update tracking
CREATE TABLE users.profiles (
id UInt64,
email String,
name String,
status LowCardinality(String),
metadata Map(String, String),
created_at DateTime,
updated_at DateTime -- Version column for ReplacingMergeTree
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY id
SETTINGS index_granularity = 8192;
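Keep in mind that ReplacingMergeTree deduplicates only during background merges, so reads must tolerate not-yet-merged duplicates. Two common patterns:
-- Option 1: FINAL applies replacement semantics at read time (simple, slower).
SELECT * FROM users.profiles FINAL WHERE id = 42;

-- Option 2: argMax selects each field from the row with the latest version.
SELECT
    id,
    argMax(email, updated_at) AS email,
    argMax(status, updated_at) AS status
FROM users.profiles
GROUP BY id;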
Aggregation Tables¶
-- Pre-aggregated data for fast queries
CREATE TABLE analytics.daily_user_stats (
date Date,
user_id UInt64,
event_count UInt32,
session_count UInt32,
revenue Decimal64(2)
)
ENGINE = SummingMergeTree((event_count, session_count, revenue))
ORDER BY (date, user_id)
SETTINGS index_granularity = 8192;
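A table like this is typically fed by a materialized view over the raw events. A sketch, assuming a hypothetical properties['revenue'] value on analytics.events (adapt the column mappings to your actual event schema):
-- Hypothetical feeder view: keeps daily_user_stats current as events arrive.
CREATE MATERIALIZED VIEW analytics.daily_user_stats_mv
TO analytics.daily_user_stats
AS SELECT
    toDate(timestamp) AS date,
    user_id,
    count() AS event_count,
    uniqExact(session_id) AS session_count,
    sum(toDecimal64OrZero(properties['revenue'], 2)) AS revenue
FROM analytics.events
GROUP BY date, user_id;
-- Caveat: per-block uniqExact values are summed on merge, so session_count
-- overcounts sessions that span insert blocks.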
Performance Optimization¶
Ordering Key Strategy¶
-- Query pattern: Filter by user, then time range
-- Good: ORDER BY (user_id, timestamp)
SELECT * FROM events WHERE user_id = 123 AND timestamp >= '2024-01-01';
-- Query pattern: Time range analysis across users
-- Good: ORDER BY (timestamp, event_type, user_id)
SELECT event_type, count(*) FROM events
WHERE timestamp BETWEEN '2024-01-01' AND '2024-01-02'
GROUP BY event_type;
Partitioning Strategy¶
-- Good: Balanced partition sizes (100M-1B rows per partition)
PARTITION BY toYYYYMM(timestamp) -- Monthly for moderate volume
-- Good: Multi-dimensional for query pruning
PARTITION BY (toYYYYMM(timestamp), event_type) -- When filtering by both
-- Avoid: Too granular (too many small partitions)
-- PARTITION BY toDate(timestamp) -- Daily might be too granular
-- Avoid: Unbalanced partitions
-- PARTITION BY user_id -- Likely very unbalanced
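To check whether a partition key actually produces balanced partitions, inspect system.parts:
-- Tiny or wildly uneven partitions suggest the key is too granular or skewed.
SELECT
    partition,
    count() AS parts,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'analytics' AND table = 'events' AND active
GROUP BY partition
ORDER BY partition;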
Index Optimization¶
-- Add skipping indexes for common filters
ALTER TABLE analytics.events
ADD INDEX idx_user_id user_id TYPE minmax GRANULARITY 4;
ALTER TABLE analytics.events
ADD INDEX idx_event_type event_type TYPE set(100) GRANULARITY 1;
-- For token searches in strings (tokenbf_v1 matches whole tokens, e.g. via
-- hasToken; for substring or prefix matching, use ngrambf_v1 instead)
ALTER TABLE analytics.events
ADD INDEX idx_url_tokens page_url TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1;
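After adding an index, verify that it actually prunes granules for your typical filters:
-- EXPLAIN with indexes = 1 reports how many parts and granules each
-- index eliminates for this query.
EXPLAIN indexes = 1
SELECT count()
FROM analytics.events
WHERE event_type = 'purchase';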
Data Types Best Practices¶
Choose Optimal Types¶
CREATE TABLE optimized_table (
-- Use smallest sufficient numeric types
id UInt64, -- Large range needed
age UInt8, -- 0-255 sufficient
price Decimal64(2), -- Financial precision
-- Optimize string storage
status LowCardinality(String), -- Limited values
country_code FixedString(2), -- Known length
description String, -- Variable length
-- Use appropriate temporal precision
created_at DateTime, -- Second precision
event_time DateTime64(3), -- Millisecond precision
birth_date Date, -- Day precision
-- Complex types for flexibility
tags Array(String), -- Multiple values
metadata Map(String, String), -- Key-value pairs
location Tuple(Float64, Float64), -- Structured data
-- Nullable only when necessary
phone Nullable(String), -- Optional field
email String -- Required field (not nullable)
);
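Type choices show up directly in storage. Compare per-column compressed sizes to find candidates for tighter types:
-- Columns with outsized storage are the first candidates for
-- LowCardinality, smaller integers, or compression codecs.
SELECT
    name,
    type,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = 'analytics' AND table = 'events'
ORDER BY data_compressed_bytes DESC;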
Avoid Anti-Patterns¶
-- ❌ Don't use Nullable(LowCardinality(...))
-- column Nullable(LowCardinality(String)) -- Invalid combination
-- ❌ Don't use String for numeric data
-- user_id String -- Should be UInt64
-- ❌ Don't use overly precise types when not needed
-- price Decimal128(38) -- Decimal64(2) usually sufficient
-- ❌ Don't use DateTime64 when DateTime suffices
-- created_at DateTime64(9) -- Usually DateTime is enough
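When unsure whether LowCardinality fits a column, measure the actual cardinality first; it pays off when the number of distinct values stays small (roughly under 10,000):
-- A high distinct count means LowCardinality will hurt rather than help.
SELECT uniq(event_type) AS distinct_values, count() AS total_rows
FROM analytics.events;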
Migration Management¶
Migration Workflow¶
Development Process¶
# 1. Make schema changes
vim db/schemas/analytics/tables/events.sql
# 2. Validate syntax
housekeeper schema compile
# 3. Generate migration
housekeeper diff --config environments/development.yaml
# 4. Review generated SQL
cat db/migrations/20240315143000.sql
# 5. Test in development
housekeeper migrate --config environments/development.yaml
# 6. Commit schema and migration together
git add db/schemas/ db/migrations/
git commit -m "Add user_segment column to events table"
Production Deployment¶
# 1. Validate in staging
housekeeper migrate --config environments/staging.yaml --dry-run
housekeeper migrate --config environments/staging.yaml
# 2. Backup production
housekeeper backup --config environments/production.yaml
# 3. Deploy to production with approval
housekeeper migrate --config environments/production.yaml
Migration Consolidation¶
Over time, accumulated migration files become unwieldy. Use snapshots to consolidate them:
# Consolidate existing migrations into a snapshot
housekeeper snapshot --description "Q1 2024 consolidation"
# This will:
# - Create a single snapshot file with all migration content
# - Remove individual migration files
# - Update the sum file
Best practices for consolidation:

- Timing: Consolidate after major releases or quarterly
- Testing: Test snapshot application in a fresh environment
- Documentation: Document what's included in each snapshot
- Backup: Keep backups of original migrations before consolidation
Bootstrapping Existing Databases¶
When adopting Housekeeper for an existing database:
# 1. Initialize and extract current schema
housekeeper init
housekeeper bootstrap --url production-clickhouse:9000
# 2. Create initial baseline snapshot
housekeeper snapshot --bootstrap --description "Production baseline $(date +%Y-%m-%d)"
# 3. Continue with normal workflow
housekeeper diff # Future changes will be against this baseline
The --bootstrap flag is crucial here: it creates the snapshot from the compiled project schema rather than from migrations (which don't exist yet).
Migration Safety¶
Pre-Migration Checks¶
#!/bin/bash
# pre-migration-check.sh
set -e
echo "Running pre-migration checks..."
# 1. Validate schema syntax
housekeeper schema compile -d production
# 2. Test database connection
housekeeper schema dump --url localhost:9000
# 3. Check migration file integrity
housekeeper rehash -d production
# 4. Generate diff to preview changes
housekeeper diff -d production
# 5. Verify backup is recent
if [ ! -f "backup_$(date +%Y%m%d).sql" ]; then
echo "ERROR: No recent backup found"
exit 1
fi
echo "Pre-migration checks passed ✓"
Post-Migration Validation¶
#!/bin/bash
# post-migration-check.sh
set -e
echo "Running post-migration validation..."
# 1. Verify schema matches expectation
housekeeper schema dump --config production.yaml > current_schema.sql
housekeeper schema compile > expected_schema.sql
if ! diff -q current_schema.sql expected_schema.sql; then
  echo "WARNING: Schema differs from expectation"
  diff current_schema.sql expected_schema.sql
fi
# 2. Run basic queries to verify data integrity
clickhouse-client --query "SELECT count() FROM analytics.events"
clickhouse-client --query "SELECT max(timestamp) FROM analytics.events"
# 3. Check for any errors in ClickHouse logs
docker logs clickhouse-server 2>&1 | grep -i error | tail -10
echo "Post-migration validation complete ✓"
Rollback Procedures¶
Rollback Strategy¶
-- For simple additions (safe to ignore)
ALTER TABLE analytics.events DROP COLUMN IF EXISTS new_column;
-- For complex changes (may require data migration)
-- 1. Stop application writes
-- 2. Restore from backup
-- 3. Apply reverse migration
-- 4. Verify data integrity
-- 5. Resume application
Rollback Script Template¶
#!/bin/bash
# rollback-migration.sh
MIGRATION_TIMESTAMP=$1
BACKUP_FILE=$2
if [ -z "$MIGRATION_TIMESTAMP" ] || [ -z "$BACKUP_FILE" ]; then
echo "Usage: $0 <migration_timestamp> <backup_file>"
exit 1
fi
echo "Rolling back migration $MIGRATION_TIMESTAMP..."
# 1. Create current state backup
housekeeper backup --output "pre_rollback_$(date +%Y%m%d_%H%M%S).sql"
# 2. Apply reverse migration
housekeeper migrate --rollback-to "$MIGRATION_TIMESTAMP"
# 3. Verify rollback
housekeeper status
echo "Rollback complete. Please verify data integrity."
Cluster Management¶
Cluster Configuration¶
Multi-Shard Setup¶
<!-- db/config.d/cluster.xml -->
<clickhouse>
<remote_servers>
<production_cluster>
<shard>
<replica>
<host>ch-shard1-replica1.internal</host>
<port>9000</port>
</replica>
<replica>
<host>ch-shard1-replica2.internal</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>ch-shard2-replica1.internal</host>
<port>9000</port>
</replica>
<replica>
<host>ch-shard2-replica2.internal</host>
<port>9000</port>
</replica>
</shard>
</production_cluster>
</remote_servers>
</clickhouse>
Cluster-Aware Schema¶
-- All DDL includes ON CLUSTER for distributed execution
CREATE DATABASE analytics ON CLUSTER production_cluster
ENGINE = Atomic;
CREATE TABLE analytics.events ON CLUSTER production_cluster (
-- table definition
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY timestamp;
-- Distributed table for queries across shards
CREATE TABLE analytics.events_distributed ON CLUSTER production_cluster
AS analytics.events
ENGINE = Distributed(production_cluster, analytics, events, rand());
Cluster Deployment¶
Rolling Updates¶
#!/bin/bash
# rolling-cluster-update.sh
CLUSTER_NODES=("ch-node1" "ch-node2" "ch-node3" "ch-node4")
for node in "${CLUSTER_NODES[@]}"; do
echo "Updating node: $node"
# 1. Apply migration to single node
housekeeper migrate --host "$node" --config production.yaml
# 2. Verify node health
clickhouse-client --host "$node" --query "SELECT version()"
# 3. Wait for replication to catch up
sleep 30
echo "Node $node updated successfully"
done
echo "Cluster update complete"
Monitoring and Observability¶
Migration Monitoring¶
Log Monitoring¶
# monitoring/migration-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: housekeeper-alerts
spec:
  groups:
    - name: housekeeper
      rules:
        - alert: MigrationFailed
          expr: housekeeper_migration_status != 1
          for: 0m
          annotations:
            summary: "Housekeeper migration failed"
            description: "Migration {{ $labels.migration }} failed on {{ $labels.instance }}"

        - alert: MigrationTimeout
          expr: housekeeper_migration_duration_seconds > 1800
          for: 0m
          annotations:
            summary: "Migration taking too long"
            description: "Migration {{ $labels.migration }} has been running for {{ $value }} seconds"
Health Checks¶
#!/bin/bash
# health-check.sh
# Check migration status
STATUS=$(housekeeper status --config production.yaml --format json)
PENDING_MIGRATIONS=$(echo "$STATUS" | jq '.pending_migrations | length')
if [ "$PENDING_MIGRATIONS" -gt 0 ]; then
echo "WARNING: $PENDING_MIGRATIONS pending migrations"
exit 1
fi
# Check schema consistency across cluster
NODES=("node1" "node2" "node3")
REFERENCE_SCHEMA=""
for node in "${NODES[@]}"; do
SCHEMA=$(housekeeper schema dump --host "$node" --format hash)
if [ -z "$REFERENCE_SCHEMA" ]; then
REFERENCE_SCHEMA="$SCHEMA"
elif [ "$SCHEMA" != "$REFERENCE_SCHEMA" ]; then
echo "ERROR: Schema mismatch on $node"
exit 1
fi
done
echo "All nodes have consistent schema ✓"
Performance Monitoring¶
Query Performance¶
-- Monitor query performance after migrations
SELECT
query_duration_ms,
query,
user,
type,
event_time
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 HOUR
AND query_duration_ms > 10000 -- Queries taking >10s
ORDER BY query_duration_ms DESC
LIMIT 10;
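To isolate regressions introduced by a specific migration, compare per-query latency before and after it was applied (the cutoff timestamp below is illustrative):
-- Flag normalized queries whose average latency doubled after the migration.
SELECT
    normalized_query_hash,
    any(query) AS example_query,
    avgIf(query_duration_ms, event_time <  '2024-03-15 14:30:00') AS before_ms,
    avgIf(query_duration_ms, event_time >= '2024-03-15 14:30:00') AS after_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 DAY
GROUP BY normalized_query_hash
HAVING after_ms > 2 * before_ms
ORDER BY after_ms DESC
LIMIT 10;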
Table Statistics¶
-- Monitor table growth and health
SELECT
    database,
    table,
    formatReadableSize(sumIf(bytes_on_disk, active)) AS size,
    sumIf(rows, active) AS total_rows,
    countIf(active) AS active_parts,
    count() AS total_parts
FROM system.parts
WHERE database NOT IN ('system', 'information_schema')
GROUP BY database, table
ORDER BY sumIf(bytes_on_disk, active) DESC;
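A related health check: tables accumulating too many active parts indicate merge pressure, often caused by frequent small inserts or an over-granular partition key:
-- Hundreds of active parts per table is a warning sign worth investigating.
SELECT database, table, count() AS active_parts
FROM system.parts
WHERE active AND database NOT IN ('system', 'information_schema')
GROUP BY database, table
HAVING active_parts > 300
ORDER BY active_parts DESC;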
Security Best Practices¶
Access Control¶
Migration User Setup¶
<!-- Dedicated migration user with minimal privileges -->
<users>
<migration_user>
<password_sha256_hex><!-- hashed password --></password_sha256_hex>
<profile>default</profile>
<quota>default</quota>
<allow_databases>
<database>analytics</database>
<database>reporting</database>
</allow_databases>
<access_management>1</access_management>
</migration_user>
</users>
Environment Variable Security¶
# Use secure environment variable management
export CH_MIGRATION_PASSWORD="$(vault kv get -field=password secret/clickhouse/migration)"
# Rotate passwords regularly
vault kv put secret/clickhouse/migration password="$(openssl rand -base64 32)"
Backup Strategy¶
Automated Backups¶
#!/bin/bash
# backup-schema.sh
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/clickhouse"
# 1. Create schema backup
housekeeper schema dump --config production.yaml > "$BACKUP_DIR/schema_$DATE.sql"
# 2. Create data backup (for critical tables)
clickhouse-client --query "BACKUP TABLE analytics.users TO S3('s3://backups/users_$DATE.tar')"
# 3. Verify backup integrity
if [ -s "$BACKUP_DIR/schema_$DATE.sql" ]; then
echo "Schema backup created: schema_$DATE.sql"
else
echo "ERROR: Schema backup failed"
exit 1
fi
# 4. Cleanup old backups (keep 30 days)
find "$BACKUP_DIR" -name "schema_*.sql" -mtime +30 -delete
Documentation¶
Schema Documentation¶
Inline Documentation¶
-- Use comprehensive comments
CREATE DATABASE analytics
ENGINE = Atomic
COMMENT 'Analytics database v2.1 - Contains user behavior and business metrics';
CREATE TABLE analytics.events (
id UUID DEFAULT generateUUIDv4() COMMENT 'Unique event identifier',
user_id UInt64 COMMENT 'Reference to users.profiles.id',
event_type LowCardinality(String) COMMENT 'Event category: page_view, click, purchase, etc.',
timestamp DateTime COMMENT 'Event occurrence time in UTC',
-- Derived fields for analytics
date Date MATERIALIZED toDate(timestamp) COMMENT 'Partition key and aggregation dimension',
hour UInt8 MATERIALIZED toHour(timestamp) COMMENT 'Hourly aggregation dimension'
)
ENGINE = MergeTree()
PARTITION BY (toYYYYMM(timestamp), event_type)
ORDER BY (timestamp, user_id)
TTL timestamp + INTERVAL 90 DAY DELETE
COMMENT 'User interaction events with 90-day retention policy';
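Because comments live in the schema itself, they are queryable, which makes generated documentation straightforward:
-- Pull column-level documentation straight from the running server.
SELECT name, type, comment
FROM system.columns
WHERE database = 'analytics' AND table = 'events';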
Migration Documentation¶
-- Migration header template
-- Schema migration generated at 2024-03-15 14:30:22 UTC
--
-- Changes:
-- - Add user_segment column to events table for ML-based user categorization
-- - Add index on user_segment for filtering performance
-- - Update daily_stats view to include segmentation metrics
--
-- Impact:
-- - New column is nullable and backwards compatible
-- - Index creation may take 10-15 minutes on production
-- - View changes affect reporting queries (verify downstream systems)
--
-- Rollback:
-- - DROP COLUMN user_segment from events table
-- - DROP INDEX idx_user_segment
-- - Restore previous view definition from git history
Team Documentation¶
Schema Change Process¶
# Schema Change Process
## 1. Planning
- [ ] Document business requirement
- [ ] Design schema changes
- [ ] Estimate impact and downtime
- [ ] Plan rollback strategy
## 2. Development
- [ ] Create feature branch
- [ ] Implement schema changes
- [ ] Add tests for new functionality
- [ ] Update documentation
## 3. Review
- [ ] Code review by senior engineer
- [ ] DBA review for performance impact
- [ ] Security review for sensitive data
- [ ] Documentation review
## 4. Testing
- [ ] Test migration in development
- [ ] Performance test with realistic data
- [ ] Test rollback procedure
- [ ] Validate downstream systems
## 5. Deployment
- [ ] Deploy to staging environment
- [ ] Validate staging deployment
- [ ] Schedule production maintenance window
- [ ] Deploy to production
- [ ] Monitor for issues
## 6. Post-Deployment
- [ ] Verify all systems operational
- [ ] Monitor performance metrics
- [ ] Update team on completion
- [ ] Document lessons learned
These best practices ensure reliable, maintainable, and performant ClickHouse deployments with Housekeeper. Adapt them to your specific environment and requirements.