Docker Integration
Learn how Housekeeper's Docker integration manages ClickHouse containers for migration testing and validation.
Overview
Housekeeper includes comprehensive Docker integration that allows you to:
- Spin up temporary ClickHouse instances for testing
- Apply migrations against real ClickHouse servers
- Validate schema changes before production deployment
- Run integration tests with full ClickHouse feature support
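In practice the whole loop fits in a few lines. A minimal sketch, using the docker manager API described in the sections below:
// Minimal end-to-end sketch using the APIs described below.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

dm := docker.New()
if err := dm.Start(ctx); err != nil { // start a throwaway ClickHouse
	log.Fatal(err)
}
defer dm.Stop(ctx) // always clean the container up

// Validate DDL against a real server before it ever reaches production.
if err := dm.Exec(ctx, "CREATE DATABASE staging ENGINE = Atomic"); err != nil {
	log.Fatal(err)
}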
Architecture
┌─────────────────────┐          ┌─────────────────────┐
│     Housekeeper     │          │    Docker Engine    │
│   Migration Tool    │          │                     │
└─────────┬───────────┘          └─────────┬───────────┘
          │                                │
          │ 1. Start Container             │
          ├───────────────────────────────>│
          │                                │
          │ 2. Wait for Ready              │
          │<───────────────────────────────┤
          │                                │
          │ 3. Apply Migrations            │
          ├───────────────────────────────>│
          │                                │
          │ 4. Extract Schema              │
          │<───────────────────────────────┤
          │                                │
          │ 5. Cleanup                     │
          ├───────────────────────────────>│
          │                                │
┌─────────────────────────────────────────────────┐
│               Container Lifecycle               │
├─────────────────────────────────────────────────┤
│ 1. Pull Image (clickhouse/clickhouse-server)    │
│ 2. Mount Config Volume (db/config.d)            │
│ 3. Start Container with Health Check            │
│ 4. Wait for ClickHouse Ready                    │
│ 5. Execute SQL Commands/Files                   │
│ 6. Extract Results                              │
│ 7. Stop and Remove Container                    │
└─────────────────────────────────────────────────┘
Container Management
Basic Container Operations
// Create a new Docker manager
dm := docker.New()

// Start the ClickHouse container
ctx := context.Background()
if err := dm.Start(ctx); err != nil {
	log.Fatal("Failed to start ClickHouse:", err)
}
defer dm.Stop(ctx)

// Get connection details (each accessor returns the DSN and an error)
dsn, err := dm.GetDSN() // TCP: localhost:9000
if err != nil {
	log.Fatal(err)
}

httpDSN, err := dm.GetHTTPDSN() // HTTP: http://localhost:8123
if err != nil {
	log.Fatal(err)
}
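Either DSN can then be handed to a client; for instance, the clickhouse package used in the testing examples below (a sketch):
// Sketch: connect using the TCP DSN obtained above.
client, err := clickhouse.NewClient(ctx, dsn)
if err != nil {
	log.Fatal(err)
}
defer client.Close()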
Advanced Configuration
// Custom Docker options
opts := docker.DockerOptions{
Version: "25.7", // Specific ClickHouse version
ConfigDir: "/path/to/config", // Custom config directory
ContainerName: "test-clickhouse", // Custom container name
TCPPort: 9001, // Custom TCP port
HTTPPort: 8124, // Custom HTTP port
Memory: "2g", // Memory limit
NetworkMode: "bridge", // Network mode
}
dm := docker.NewWithOptions(opts)
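Custom ports and container names are mostly useful when several containers must run side by side, for example in parallel test packages or the multi-container cluster test shown later; otherwise the defaults are fine.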
Project Integration
// Integration with Housekeeper project
proj := project.New(project.ProjectParams{
Dir: "/path/to/project",
Formatter: format.New(format.Defaults),
})
// Initialize project (creates config directory)
if err := proj.Initialize(); err != nil {
log.Fatal(err)
}
// Create Docker manager with project configuration
dm := proj.NewDockerManager()
// Container automatically uses project's ClickHouse configuration
if err := dm.Start(ctx); err != nil {
log.Fatal(err)
}
Configuration Mounting
Project Configuration Structure
project/
├── housekeeper.yaml
└── db/
├── config.d/
│ ├── _clickhouse.xml # Generated by Housekeeper
│ ├── cluster.xml # Custom cluster config
│ ├── users.xml # Custom user config
│ └── logging.xml # Custom logging config
├── main.sql
└── migrations/
Automatic Config Generation
Housekeeper automatically generates db/config.d/_clickhouse.xml:
<clickhouse>
<!-- Cluster configuration -->
<remote_servers>
<my_cluster>
<shard>
<replica>
<host>localhost</host>
<port>9000</port>
</replica>
</shard>
</my_cluster>
</remote_servers>
<!-- Keeper/Zookeeper for ReplicatedMergeTree -->
<keeper_server>
<tcp_port>9181</tcp_port>
<server_id>1</server_id>
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<coordination_settings>
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<raft_logs_level>information</raft_logs_level>
</coordination_settings>
<raft_configuration>
<server>
<id>1</id>
<hostname>localhost</hostname>
<port>9234</port>
</server>
</raft_configuration>
</keeper_server>
<!-- Macros for ReplicatedMergeTree -->
<macros>
<cluster>my_cluster</cluster>
<shard>01</shard>
<replica>replica1</replica>
</macros>
</clickhouse>
Volume Mounting
The Docker integration automatically mounts the project's configuration directory:
# Container is started with:
docker run -d \
--name housekeeper-clickhouse \
-p 9000:9000 \
-p 8123:8123 \
-v /project/db/config.d:/etc/clickhouse-server/config.d \
clickhouse/clickhouse-server:25.7
This enables:
- Cluster Support: Full distributed DDL capabilities
- ReplicatedMergeTree: Replicated table engines work correctly
- Custom Settings: Your specific ClickHouse configuration
- Production Parity: Container behaves like your target environment
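Because the mounted configuration defines my_cluster, distributed DDL can be exercised directly against the test container. A small sketch, using the Exec helper from the testing examples below:
// Sketch: ON CLUSTER DDL resolves against the mounted cluster config.
if err := dm.Exec(ctx, "CREATE DATABASE demo ON CLUSTER my_cluster ENGINE = Atomic"); err != nil {
	log.Fatal(err)
}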
Migration Testing Workflow
Complete Migration Test
func TestMigrationWorkflow(t *testing.T) {
// Initialize project
proj := project.New(project.ProjectParams{
Dir: t.TempDir(),
Formatter: format.New(format.Defaults),
})
err := proj.Initialize()
require.NoError(t, err)
// Write test schema
schema := `
CREATE DATABASE test_db ENGINE = Atomic;
CREATE TABLE test_db.users (
id UInt64,
name String,
created_at DateTime DEFAULT now()
) ENGINE = MergeTree() ORDER BY id;
`
err = os.WriteFile(filepath.Join(proj.Dir, "db/main.sql"), []byte(schema), 0644)
require.NoError(t, err)
// Start ClickHouse container
dm := proj.NewDockerManager()
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
err = dm.Start(ctx)
require.NoError(t, err)
defer dm.Stop(ctx)
// Apply migration
err = proj.ApplyMigrations(ctx, dm)
require.NoError(t, err)
// Verify schema was applied
	dsn, err := dm.GetDSN()
	require.NoError(t, err)
	client, err := clickhouse.NewClient(ctx, dsn)
	require.NoError(t, err)
defer client.Close()
tables, err := client.GetTables(ctx)
require.NoError(t, err)
// Verify table exists
found := false
for _, table := range tables.Statements {
if table.CreateTable != nil && table.CreateTable.Name == "users" {
found = true
break
}
}
require.True(t, found, "users table should exist")
// Test data operations
err = dm.Exec(ctx, "INSERT INTO test_db.users (id, name) VALUES (1, 'Alice')")
require.NoError(t, err)
result, err := dm.Query(ctx, "SELECT count() FROM test_db.users")
require.NoError(t, err)
require.Equal(t, "1", strings.TrimSpace(result))
}
Schema Evolution Testing
func TestSchemaEvolution(t *testing.T) {
proj := setupProject(t)
dm := proj.NewDockerManager()
ctx := context.Background()
// Start container
err := dm.Start(ctx)
require.NoError(t, err)
defer dm.Stop(ctx)
// Apply initial schema
initialSchema := `
CREATE DATABASE analytics ENGINE = Atomic;
CREATE TABLE analytics.events (
id UUID DEFAULT generateUUIDv4(),
timestamp DateTime,
event_type String
) ENGINE = MergeTree() ORDER BY timestamp;
`
err = writeSchema(proj, initialSchema)
require.NoError(t, err)
err = proj.ApplyMigrations(ctx, dm)
require.NoError(t, err)
// Insert test data
err = dm.Exec(ctx, `
INSERT INTO analytics.events (timestamp, event_type) VALUES
('2024-01-01 12:00:00', 'page_view'),
('2024-01-01 12:01:00', 'click')
`)
require.NoError(t, err)
// Evolve schema - add new column
evolvedSchema := `
CREATE DATABASE analytics ENGINE = Atomic;
CREATE TABLE analytics.events (
id UUID DEFAULT generateUUIDv4(),
timestamp DateTime,
event_type String,
user_id UInt64 DEFAULT 0 -- New column
) ENGINE = MergeTree() ORDER BY timestamp;
`
err = writeSchema(proj, evolvedSchema)
require.NoError(t, err)
// Apply evolution migration
err = proj.ApplyMigrations(ctx, dm)
require.NoError(t, err)
// Verify data integrity after schema change
result, err := dm.Query(ctx, "SELECT count(), max(user_id) FROM analytics.events")
require.NoError(t, err)
parts := strings.Fields(strings.TrimSpace(result))
require.Equal(t, "2", parts[0], "Should have 2 events")
require.Equal(t, "0", parts[1], "New column should have default value")
// Test new column functionality
err = dm.Exec(ctx, `
INSERT INTO analytics.events (timestamp, event_type, user_id)
VALUES ('2024-01-01 12:02:00', 'purchase', 123)
`)
require.NoError(t, err)
result, err = dm.Query(ctx, "SELECT user_id FROM analytics.events WHERE event_type = 'purchase'")
require.NoError(t, err)
require.Equal(t, "123", strings.TrimSpace(result))
}
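Note that the test never writes an ALTER statement by hand: ApplyMigrations diffs the desired schema against the running server, and for the change above the generated migration amounts to something like ALTER TABLE analytics.events ADD COLUMN user_id UInt64 DEFAULT 0.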
Advanced Features
Multi-Container Testing
func TestClusterMigration(t *testing.T) {
// Start multiple ClickHouse containers for cluster testing
containers := []docker.Manager{}
for i := 0; i < 3; i++ {
opts := docker.DockerOptions{
Version: "25.7",
ContainerName: fmt.Sprintf("ch-node-%d", i+1),
TCPPort: 9000 + i,
HTTPPort: 8123 + i,
}
dm := docker.NewWithOptions(opts)
containers = append(containers, dm)
err := dm.Start(context.Background())
require.NoError(t, err)
defer dm.Stop(context.Background())
}
// Test cluster-aware migration
schema := `
CREATE DATABASE cluster_db ON CLUSTER test_cluster ENGINE = Atomic;
CREATE TABLE cluster_db.distributed_events ON CLUSTER test_cluster (
id UInt64,
data String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY id;
`
// Apply to each node
for _, dm := range containers {
		err := dm.Exec(context.Background(), schema)
require.NoError(t, err)
}
}
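For ON CLUSTER DDL to actually fan out, the nodes must also share a network, a cluster definition, and a Keeper; a user-defined Docker network is one way to wire that up (the network name below is hypothetical):
// Hypothetical: join each node to a shared network created beforehand with
// `docker network create ch-net`, so hosts in the cluster config resolve.
opts.NetworkMode = "ch-net"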
Performance Testing
func TestMigrationPerformance(t *testing.T) {
proj := setupProject(t)
dm := proj.NewDockerManager()
ctx := context.Background()
err := dm.Start(ctx)
require.NoError(t, err)
defer dm.Stop(ctx)
// Create large table for performance testing
largeTableSchema := `
CREATE DATABASE perf_test ENGINE = Atomic;
CREATE TABLE perf_test.large_table (
id UInt64,
data String,
timestamp DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY id;
`
err = writeSchema(proj, largeTableSchema)
require.NoError(t, err)
// Measure migration time
start := time.Now()
err = proj.ApplyMigrations(ctx, dm)
migrationTime := time.Since(start)
require.NoError(t, err)
t.Logf("Migration took %v", migrationTime)
// Insert test data
start = time.Now()
err = dm.Exec(ctx, `
INSERT INTO perf_test.large_table
SELECT number, toString(number), now() - INTERVAL number HOUR
FROM numbers(1000000)
`)
insertTime := time.Since(start)
require.NoError(t, err)
t.Logf("Data insertion took %v", insertTime)
// Test schema evolution on large table
evolvedSchema := `
CREATE DATABASE perf_test ENGINE = Atomic;
CREATE TABLE perf_test.large_table (
id UInt64,
data String,
timestamp DateTime,
category LowCardinality(String) DEFAULT 'default'
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY id;
`
err = writeSchema(proj, evolvedSchema)
require.NoError(t, err)
start = time.Now()
err = proj.ApplyMigrations(ctx, dm)
evolutionTime := time.Since(start)
require.NoError(t, err)
t.Logf("Schema evolution took %v", evolutionTime)
// Verify data integrity
result, err := dm.Query(ctx, "SELECT count() FROM perf_test.large_table")
require.NoError(t, err)
require.Equal(t, "1000000", strings.TrimSpace(result))
}
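Adding a column with a DEFAULT is a metadata-only operation in ClickHouse (existing parts are not rewritten; the default is applied on read), which is why the evolution step stays fast even with a million rows already in the table.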
ReplicatedMergeTree Testing
func TestReplicatedMergeTree(t *testing.T) {
proj := setupProject(t)
dm := proj.NewDockerManager()
ctx := context.Background()
err := dm.Start(ctx)
require.NoError(t, err)
defer dm.Stop(ctx)
// Test ReplicatedMergeTree with keeper
replicatedSchema := `
CREATE DATABASE replicated_db ENGINE = Atomic;
CREATE TABLE replicated_db.replicated_table (
id UInt64,
data String,
created_at DateTime DEFAULT now()
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/shard1/replicated_table', 'replica1')
ORDER BY id;
`
err = writeSchema(proj, replicatedSchema)
require.NoError(t, err)
err = proj.ApplyMigrations(ctx, dm)
require.NoError(t, err)
// Test replication functionality
err = dm.Exec(ctx, `
INSERT INTO replicated_db.replicated_table (id, data) VALUES
(1, 'test data 1'),
(2, 'test data 2')
`)
require.NoError(t, err)
// Verify data
result, err := dm.Query(ctx, "SELECT count() FROM replicated_db.replicated_table")
require.NoError(t, err)
require.Equal(t, "2", strings.TrimSpace(result))
// Test that table appears in system.replicas
result, err = dm.Query(ctx, `
SELECT count() FROM system.replicas
WHERE table = 'replicated_table'
`)
require.NoError(t, err)
require.Equal(t, "1", strings.TrimSpace(result))
}
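This works inside a single container, with no external ZooKeeper, because the generated _clickhouse.xml shown earlier embeds a keeper_server that handles the replication coordination.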
Error Handling and Troubleshooting
Common Issues and Solutions
Container Start Failures
func handleContainerStartError(err error) {
switch {
case strings.Contains(err.Error(), "port is already allocated"):
log.Println("Port conflict - try different port or stop existing container")
case strings.Contains(err.Error(), "no such image"):
log.Println("Image not found - pulling ClickHouse image...")
// Auto-pull image
case strings.Contains(err.Error(), "timeout"):
log.Println("Container start timeout - may need more time or resources")
default:
log.Printf("Unexpected error: %v", err)
}
}
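Wired up at the call site, this turns a raw Docker error into an actionable hint (sketch):
if err := dm.Start(ctx); err != nil {
	handleContainerStartError(err)
}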
ClickHouse Ready Check
func waitForClickHouseReady(ctx context.Context, dsn string) error {
	timeout := time.After(60 * time.Second)
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-timeout:
			return fmt.Errorf("timeout waiting for ClickHouse to be ready")
		case <-ticker.C:
			// Re-dial on every tick: the server may refuse connections
			// entirely until it has finished starting up.
			client, err := clickhouse.NewClient(ctx, dsn)
			if err != nil {
				continue
			}
			err = client.Ping(ctx)
			client.Close()
			if err == nil {
				return nil
			}
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
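An alternative that avoids opening a native connection at all is to poll ClickHouse's built-in HTTP health endpoint, which returns Ok. once the server is up. A sketch using only the standard library (net/http):
// Poll the HTTP interface; GET /ping returns "Ok." when the server is ready.
func waitForHTTPReady(ctx context.Context, httpDSN string) error {
	deadline := time.After(60 * time.Second)
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-deadline:
			return fmt.Errorf("timeout waiting for ClickHouse HTTP interface")
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			resp, err := http.Get(httpDSN + "/ping")
			if err != nil {
				continue
			}
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
	}
}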
Migration Failures
func handleMigrationError(err error, migration string) error {
// Log container state for debugging
logs, logErr := docker.GetContainerLogs("housekeeper-clickhouse")
if logErr == nil {
log.Printf("ClickHouse logs:\n%s", logs)
}
// Provide specific error guidance
switch {
case strings.Contains(err.Error(), "Table already exists"):
return fmt.Errorf("table already exists - migration may have been partially applied: %w", err)
case strings.Contains(err.Error(), "Syntax error"):
return fmt.Errorf("SQL syntax error in migration %s: %w", migration, err)
case strings.Contains(err.Error(), "Memory limit"):
return fmt.Errorf("memory limit exceeded - consider increasing container memory: %w", err)
default:
return fmt.Errorf("migration failed: %w", err)
}
}
CI/CD Integration
GitHub Actions Workflow
# .github/workflows/migration-test.yml
name: Migration Testing
on:
pull_request:
paths:
- 'db/**'
jobs:
test-migrations:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Go
uses: actions/setup-go@v4
with:
go-version: '1.21'
- name: Install Housekeeper
run: go install github.com/pseudomuto/housekeeper@latest
- name: Test Migration Generation
run: |
# Test that migrations can be generated
housekeeper diff --dry-run
- name: Test Migration Application
run: |
# Start ClickHouse container
docker run -d \
--name test-clickhouse \
-p 9000:9000 \
-e CLICKHOUSE_USER=default \
-e CLICKHOUSE_PASSWORD= \
clickhouse/clickhouse-server:25.7
# Wait for ready
sleep 10
# Start development server (applies migrations automatically)
housekeeper dev up
# Verify schema
housekeeper schema dump --url localhost:9000 > applied_schema.sql
housekeeper schema compile > expected_schema.sql
# Compare schemas (allowing for minor formatting differences)
if ! diff -w applied_schema.sql expected_schema.sql; then
echo "Schema mismatch detected"
exit 1
fi
- name: Cleanup
if: always()
run: |
docker stop test-clickhouse || true
docker rm test-clickhouse || true
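The fixed sleep 10 is the simplest thing that works, but it can be flaky on slow runners; polling ClickHouse's HTTP health endpoint in a loop (curl http://localhost:8123/ping returns Ok. once the server is up) is a more reliable readiness gate.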
Docker Compose Testing
# docker-compose.test.yml
version: '3.8'
services:
clickhouse:
image: clickhouse/clickhouse-server:25.7
ports:
- "9000:9000"
- "8123:8123"
volumes:
- ./db/config.d:/etc/clickhouse-server/config.d
environment:
CLICKHOUSE_USER: default
CLICKHOUSE_PASSWORD: ""
healthcheck:
test: ["CMD", "clickhouse-client", "--query", "SELECT 1"]
interval: 5s
timeout: 3s
retries: 5
migration-test:
build: .
depends_on:
clickhouse:
condition: service_healthy
command: |
bash -c "
housekeeper dev up &&
housekeeper schema dump --url clickhouse:9000 > /tmp/applied.sql &&
housekeeper schema compile > /tmp/expected.sql &&
diff -w /tmp/applied.sql /tmp/expected.sql
"
volumes:
- .:/workspace
working_dir: /workspace
Best Practices
Resource Management
// Always use context with timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
// Always cleanup containers
defer func() {
if err := dm.Stop(ctx); err != nil {
log.Printf("Warning: failed to stop container: %v", err)
}
}()
// Use appropriate resource limits
opts := docker.DockerOptions{
Memory: "2g", // Limit memory usage
CPUs: "1.0", // Limit CPU usage
ShmSize: "128m", // Shared memory size
}
Test Organization
// Use test helpers for common patterns
func setupTestProject(t *testing.T) (*project.Project, docker.Manager) {
proj := project.New(project.ProjectParams{
Dir: t.TempDir(),
Formatter: format.New(format.Defaults),
})
err := proj.Initialize()
require.NoError(t, err)
dm := proj.NewDockerManager()
return proj, dm
}
// Use subtests for organized testing
func TestMigrationSuite(t *testing.T) {
proj, dm := setupTestProject(t)
ctx := context.Background()
err := dm.Start(ctx)
require.NoError(t, err)
defer dm.Stop(ctx)
t.Run("BasicSchema", func(t *testing.T) {
// Test basic schema application
})
t.Run("SchemaEvolution", func(t *testing.T) {
// Test schema changes
})
t.Run("DataMigration", func(t *testing.T) {
// Test data migration scenarios
})
}
The Docker integration makes it easy to test ClickHouse schemas and migrations in a consistent, reproducible environment that closely matches production deployments.
Next Steps
- Overview - High-level system architecture
- Parser Architecture - Understand DDL parsing
- Migration Generation - Learn about migration algorithms
- Best Practices - Production deployment patterns