
Docker Integration

Learn how Housekeeper's Docker integration provides seamless ClickHouse container management for migration testing and validation.

Overview

Housekeeper includes comprehensive Docker integration that allows you to:

- Spin up temporary ClickHouse instances for testing
- Apply migrations against real ClickHouse servers
- Validate schema changes before production deployment
- Run integration tests with full ClickHouse feature support

Architecture

┌─────────────────────┐    ┌─────────────────────┐
│   Housekeeper       │    │   Docker Engine     │
│   Migration Tool    │    │                     │
└─────────┬───────────┘    └─────────┬───────────┘
          │                          │
          │ 1. Start Container       │
          ├─────────────────────────>│
          │                          │
          │ 2. Wait for Ready        │
          │<─────────────────────────┤
          │                          │
          │ 3. Apply Migrations      │
          ├─────────────────────────>│
          │                          │
          │ 4. Extract Schema        │
          │<─────────────────────────┤
          │                          │
          │ 5. Cleanup               │
          ├─────────────────────────>│

┌─────────────────────────────────────────────────┐
│              Container Lifecycle                │
├─────────────────────────────────────────────────┤
│ 1. Pull Image (clickhouse/clickhouse-server)   │
│ 2. Mount Config Volume (db/config.d)           │
│ 3. Start Container with Health Check           │
│ 4. Wait for ClickHouse Ready                   │
│ 5. Execute SQL Commands/Files                  │
│ 6. Extract Results                             │
│ 7. Stop and Remove Container                   │
└─────────────────────────────────────────────────┘
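The same lifecycle can be reproduced by hand with the Docker CLI, which is useful when debugging what the manager does. The container name and paths here are illustrative, not defaults baked into Housekeeper:

```shell
# Steps 1-3: pull image, mount config, start with a health check
docker run -d \
  --name housekeeper-debug \
  -p 9000:9000 -p 8123:8123 \
  -v "$(pwd)/db/config.d:/etc/clickhouse-server/config.d" \
  --health-cmd "clickhouse-client --query 'SELECT 1'" \
  --health-interval 2s \
  clickhouse/clickhouse-server:25.7

# Step 4: wait until the health check reports healthy
until [ "$(docker inspect -f '{{.State.Health.Status}}' housekeeper-debug)" = "healthy" ]; do
  sleep 1
done

# Steps 5-6: execute SQL and read the result
docker exec housekeeper-debug clickhouse-client --query "SELECT version()"

# Step 7: stop and remove the container
docker rm -f housekeeper-debug
```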

Container Management

Basic Container Operations

// Create new Docker manager
dm := docker.New()

// Start ClickHouse container
ctx := context.Background()
if err := dm.Start(ctx); err != nil {
    log.Fatal("Failed to start ClickHouse:", err)
}
defer dm.Stop(ctx)

// Get connection details
dsn, err := dm.GetDSN() // TCP: localhost:9000
if err != nil {
    log.Fatal(err)
}

httpDSN, err := dm.GetHTTPDSN() // HTTP: http://localhost:8123
if err != nil {
    log.Fatal(err)
}

Advanced Configuration

// Custom Docker options
opts := docker.DockerOptions{
    Version:        "25.7",           // Specific ClickHouse version
    ConfigDir:      "/path/to/config", // Custom config directory
    ContainerName:  "test-clickhouse", // Custom container name
    TCPPort:        9001,             // Custom TCP port
    HTTPPort:       8124,             // Custom HTTP port
    Memory:         "2g",             // Memory limit
    NetworkMode:    "bridge",         // Network mode
}

dm := docker.NewWithOptions(opts)

Project Integration

// Integration with Housekeeper project
proj := project.New(project.ProjectParams{
    Dir:       "/path/to/project",
    Formatter: format.New(format.Defaults),
})

// Initialize project (creates config directory)
if err := proj.Initialize(); err != nil {
    log.Fatal(err)
}

// Create Docker manager with project configuration
dm := proj.NewDockerManager()

// Container automatically uses project's ClickHouse configuration
if err := dm.Start(ctx); err != nil {
    log.Fatal(err)
}

Configuration Mounting

Project Configuration Structure

project/
├── housekeeper.yaml
└── db/
    ├── config.d/
    │   ├── _clickhouse.xml     # Generated by Housekeeper
    │   ├── cluster.xml         # Custom cluster config
    │   ├── users.xml           # Custom user config
    │   └── logging.xml         # Custom logging config
    ├── main.sql
    └── migrations/

Automatic Config Generation

Housekeeper automatically generates db/config.d/_clickhouse.xml:

<clickhouse>
    <!-- Cluster configuration -->
    <remote_servers>
        <my_cluster>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>9000</port>
                </replica>
            </shard>
        </my_cluster>
    </remote_servers>

    <!-- Keeper/Zookeeper for ReplicatedMergeTree -->
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>information</raft_logs_level>
        </coordination_settings>

        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>localhost</hostname>
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>

    <!-- Macros for ReplicatedMergeTree -->
    <macros>
        <cluster>my_cluster</cluster>
        <shard>01</shard>
        <replica>replica1</replica>
    </macros>
</clickhouse>

Volume Mounting

The Docker integration automatically mounts configuration:

# Container is started with:
docker run -d \
  --name housekeeper-clickhouse \
  -p 9000:9000 \
  -p 8123:8123 \
  -v /project/db/config.d:/etc/clickhouse-server/config.d \
  clickhouse/clickhouse-server:25.7

This enables:

- Cluster Support: Full distributed DDL capabilities
- ReplicatedMergeTree: Replicated table engines work correctly
- Custom Settings: Your specific ClickHouse configuration
- Production Parity: Container behaves like your target environment
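With the config mounted, you can spot-check that the cluster definition and macros were picked up by querying ClickHouse's system tables (the container name below assumes the default housekeeper-clickhouse):

```shell
# Confirm the cluster from remote_servers is visible
docker exec housekeeper-clickhouse clickhouse-client \
  --query "SELECT cluster, shard_num, replica_num, host_name FROM system.clusters WHERE cluster = 'my_cluster'"

# Confirm the macros used by ReplicatedMergeTree path substitution
docker exec housekeeper-clickhouse clickhouse-client \
  --query "SELECT macro, substitution FROM system.macros"
```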

Migration Testing Workflow

Complete Migration Test

func TestMigrationWorkflow(t *testing.T) {
    // Initialize project
    proj := project.New(project.ProjectParams{
        Dir:       t.TempDir(),
        Formatter: format.New(format.Defaults),
    })

    err := proj.Initialize()
    require.NoError(t, err)

    // Write test schema
    schema := `
        CREATE DATABASE test_db ENGINE = Atomic;
        CREATE TABLE test_db.users (
            id UInt64,
            name String,
            created_at DateTime DEFAULT now()
        ) ENGINE = MergeTree() ORDER BY id;
    `

    err = os.WriteFile(filepath.Join(proj.Dir, "db/main.sql"), []byte(schema), 0644)
    require.NoError(t, err)

    // Start ClickHouse container
    dm := proj.NewDockerManager()

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
    defer cancel()

    err = dm.Start(ctx)
    require.NoError(t, err)
    defer dm.Stop(ctx)

    // Apply migration
    err = proj.ApplyMigrations(ctx, dm)
    require.NoError(t, err)

    // Verify schema was applied
    dsn, err := dm.GetDSN()
    require.NoError(t, err)

    client, err := clickhouse.NewClient(ctx, dsn)
    require.NoError(t, err)
    defer client.Close()

    tables, err := client.GetTables(ctx)
    require.NoError(t, err)

    // Verify table exists
    found := false
    for _, table := range tables.Statements {
        if table.CreateTable != nil && table.CreateTable.Name == "users" {
            found = true
            break
        }
    }
    require.True(t, found, "users table should exist")

    // Test data operations
    err = dm.Exec(ctx, "INSERT INTO test_db.users (id, name) VALUES (1, 'Alice')")
    require.NoError(t, err)

    result, err := dm.Query(ctx, "SELECT count() FROM test_db.users")
    require.NoError(t, err)
    require.Equal(t, "1", strings.TrimSpace(result))
}

Schema Evolution Testing

func TestSchemaEvolution(t *testing.T) {
    proj := setupProject(t)
    dm := proj.NewDockerManager()
    ctx := context.Background()

    // Start container
    err := dm.Start(ctx)
    require.NoError(t, err)
    defer dm.Stop(ctx)

    // Apply initial schema
    initialSchema := `
        CREATE DATABASE analytics ENGINE = Atomic;
        CREATE TABLE analytics.events (
            id UUID DEFAULT generateUUIDv4(),
            timestamp DateTime,
            event_type String
        ) ENGINE = MergeTree() ORDER BY timestamp;
    `

    err = writeSchema(proj, initialSchema)
    require.NoError(t, err)

    err = proj.ApplyMigrations(ctx, dm)
    require.NoError(t, err)

    // Insert test data
    err = dm.Exec(ctx, `
        INSERT INTO analytics.events (timestamp, event_type) VALUES
        ('2024-01-01 12:00:00', 'page_view'),
        ('2024-01-01 12:01:00', 'click')
    `)
    require.NoError(t, err)

    // Evolve schema - add new column
    evolvedSchema := `
        CREATE DATABASE analytics ENGINE = Atomic;
        CREATE TABLE analytics.events (
            id UUID DEFAULT generateUUIDv4(),
            timestamp DateTime,
            event_type String,
            user_id UInt64 DEFAULT 0  -- New column
        ) ENGINE = MergeTree() ORDER BY timestamp;
    `

    err = writeSchema(proj, evolvedSchema)
    require.NoError(t, err)

    // Apply evolution migration
    err = proj.ApplyMigrations(ctx, dm)
    require.NoError(t, err)

    // Verify data integrity after schema change
    result, err := dm.Query(ctx, "SELECT count(), max(user_id) FROM analytics.events")
    require.NoError(t, err)

    parts := strings.Fields(strings.TrimSpace(result))
    require.Equal(t, "2", parts[0], "Should have 2 events")
    require.Equal(t, "0", parts[1], "New column should have default value")

    // Test new column functionality
    err = dm.Exec(ctx, `
        INSERT INTO analytics.events (timestamp, event_type, user_id) 
        VALUES ('2024-01-01 12:02:00', 'purchase', 123)
    `)
    require.NoError(t, err)

    result, err = dm.Query(ctx, "SELECT user_id FROM analytics.events WHERE event_type = 'purchase'")
    require.NoError(t, err)
    require.Equal(t, "123", strings.TrimSpace(result))
}
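The tests above pick apart raw query output with strings.Fields; when several assertions need rows and columns, a small helper keeps that parsing in one place. This is a sketch that assumes, as the tests do, that Query returns whitespace-separated text:

```go
package main

import (
	"fmt"
	"strings"
)

// parseRows splits raw clickhouse-client output into rows and columns.
// Each non-empty line becomes one row; columns are whitespace-separated.
func parseRows(raw string) [][]string {
	var rows [][]string
	for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
		if line = strings.TrimSpace(line); line != "" {
			rows = append(rows, strings.Fields(line))
		}
	}
	return rows
}

func main() {
	// Two rows of two columns, as returned by the count()/max() query above.
	rows := parseRows("2\t0\n3\t123\n")
	fmt.Println(len(rows), rows[0][0], rows[1][1]) // 2 2 123
}
```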

Advanced Features

Multi-Container Testing

func TestClusterMigration(t *testing.T) {
    // Start multiple ClickHouse containers for cluster testing
    containers := []docker.Manager{}

    for i := 0; i < 3; i++ {
        opts := docker.DockerOptions{
            Version:       "25.7",
            ContainerName: fmt.Sprintf("ch-node-%d", i+1),
            TCPPort:       9000 + i,
            HTTPPort:      8123 + i,
        }

        dm := docker.NewWithOptions(opts)
        containers = append(containers, dm)

        err := dm.Start(context.Background())
        require.NoError(t, err)
        defer dm.Stop(context.Background())
    }

    // Test cluster-aware migration
    schema := `
        CREATE DATABASE cluster_db ON CLUSTER test_cluster ENGINE = Atomic;
        CREATE TABLE cluster_db.distributed_events ON CLUSTER test_cluster (
            id UInt64,
            data String
        ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
        ORDER BY id;
    `

    // Apply to each node
    for _, dm := range containers {
        err := dm.ExecSQL(context.Background(), schema)
        require.NoError(t, err)
    }
}

Performance Testing

func TestMigrationPerformance(t *testing.T) {
    proj := setupProject(t)
    dm := proj.NewDockerManager()
    ctx := context.Background()

    err := dm.Start(ctx)
    require.NoError(t, err)
    defer dm.Stop(ctx)

    // Create large table for performance testing
    largeTableSchema := `
        CREATE DATABASE perf_test ENGINE = Atomic;
        CREATE TABLE perf_test.large_table (
            id UInt64,
            data String,
            timestamp DateTime
        ) ENGINE = MergeTree() 
        PARTITION BY toYYYYMM(timestamp)
        ORDER BY id;
    `

    err = writeSchema(proj, largeTableSchema)
    require.NoError(t, err)

    // Measure migration time
    start := time.Now()
    err = proj.ApplyMigrations(ctx, dm)
    migrationTime := time.Since(start)

    require.NoError(t, err)
    t.Logf("Migration took %v", migrationTime)

    // Insert test data
    start = time.Now()
    err = dm.Exec(ctx, `
        INSERT INTO perf_test.large_table 
        SELECT number, toString(number), now() - INTERVAL number HOUR
        FROM numbers(1000000)
    `)
    insertTime := time.Since(start)

    require.NoError(t, err)
    t.Logf("Data insertion took %v", insertTime)

    // Test schema evolution on large table
    evolvedSchema := `
        CREATE DATABASE perf_test ENGINE = Atomic;
        CREATE TABLE perf_test.large_table (
            id UInt64,
            data String,
            timestamp DateTime,
            category LowCardinality(String) DEFAULT 'default'
        ) ENGINE = MergeTree() 
        PARTITION BY toYYYYMM(timestamp)
        ORDER BY id;
    `

    err = writeSchema(proj, evolvedSchema)
    require.NoError(t, err)

    start = time.Now()
    err = proj.ApplyMigrations(ctx, dm)
    evolutionTime := time.Since(start)

    require.NoError(t, err)
    t.Logf("Schema evolution took %v", evolutionTime)

    // Verify data integrity
    result, err := dm.Query(ctx, "SELECT count() FROM perf_test.large_table")
    require.NoError(t, err)
    require.Equal(t, "1000000", strings.TrimSpace(result))
}

ReplicatedMergeTree Testing

func TestReplicatedMergeTree(t *testing.T) {
    proj := setupProject(t)
    dm := proj.NewDockerManager()
    ctx := context.Background()

    err := dm.Start(ctx)
    require.NoError(t, err)
    defer dm.Stop(ctx)

    // Test ReplicatedMergeTree with keeper
    replicatedSchema := `
        CREATE DATABASE replicated_db ENGINE = Atomic;
        CREATE TABLE replicated_db.replicated_table (
            id UInt64,
            data String,
            created_at DateTime DEFAULT now()
        ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/shard1/replicated_table', 'replica1')
        ORDER BY id;
    `

    err = writeSchema(proj, replicatedSchema)
    require.NoError(t, err)

    err = proj.ApplyMigrations(ctx, dm)
    require.NoError(t, err)

    // Test replication functionality
    err = dm.Exec(ctx, `
        INSERT INTO replicated_db.replicated_table (id, data) VALUES
        (1, 'test data 1'),
        (2, 'test data 2')
    `)
    require.NoError(t, err)

    // Verify data
    result, err := dm.Query(ctx, "SELECT count() FROM replicated_db.replicated_table")
    require.NoError(t, err)
    require.Equal(t, "2", strings.TrimSpace(result))

    // Test that table appears in system.replicas
    result, err = dm.Query(ctx, `
        SELECT count() FROM system.replicas 
        WHERE table = 'replicated_table'
    `)
    require.NoError(t, err)
    require.Equal(t, "1", strings.TrimSpace(result))
}

Error Handling and Troubleshooting

Common Issues and Solutions

Container Start Failures

func handleContainerStartError(err error) {
    switch {
    case strings.Contains(err.Error(), "port is already allocated"):
        log.Println("Port conflict - try different port or stop existing container")

    case strings.Contains(err.Error(), "no such image"):
        log.Println("Image not found - pulling ClickHouse image...")
        // Auto-pull image

    case strings.Contains(err.Error(), "timeout"):
        log.Println("Container start timeout - may need more time or resources")

    default:
        log.Printf("Unexpected error: %v", err)
    }
}

ClickHouse Ready Check

func waitForClickHouseReady(ctx context.Context, dsn string) error {
    client, err := clickhouse.NewClient(ctx, dsn)
    if err != nil {
        return err
    }
    defer client.Close()

    timeout := time.After(60 * time.Second)
    ticker := time.NewTicker(1 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-timeout:
            return fmt.Errorf("timeout waiting for ClickHouse to be ready")

        case <-ticker.C:
            err := client.Ping(ctx)
            if err == nil {
                return nil
            }

        case <-ctx.Done():
            return ctx.Err()
        }
    }
}
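In shell-based setups (CI scripts, Makefiles), the same readiness check can run against ClickHouse's HTTP /ping endpoint, which returns Ok. once the server is accepting queries:

```shell
# Poll the HTTP ping endpoint with a 60-second budget
for i in $(seq 1 60); do
  if curl -sf http://localhost:8123/ping > /dev/null; then
    echo "ClickHouse is ready"
    exit 0
  fi
  sleep 1
done
echo "timed out waiting for ClickHouse" >&2
exit 1
```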

Migration Failures

func handleMigrationError(err error, migration string) error {
    // Log container state for debugging
    logs, logErr := docker.GetContainerLogs("housekeeper-clickhouse")
    if logErr == nil {
        log.Printf("ClickHouse logs:\n%s", logs)
    }

    // Provide specific error guidance
    switch {
    case strings.Contains(err.Error(), "Table already exists"):
        return fmt.Errorf("table already exists - migration may have been partially applied: %w", err)

    case strings.Contains(err.Error(), "Syntax error"):
        return fmt.Errorf("SQL syntax error in migration %s: %w", migration, err)

    case strings.Contains(err.Error(), "Memory limit"):
        return fmt.Errorf("memory limit exceeded - consider increasing container memory: %w", err)

    default:
        return fmt.Errorf("migration failed: %w", err)
    }
}

CI/CD Integration

GitHub Actions Workflow

# .github/workflows/migration-test.yml
name: Migration Testing

on:
  pull_request:
    paths:
      - 'db/**'

jobs:
  test-migrations:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Setup Go
      uses: actions/setup-go@v4
      with:
        go-version: '1.21'

    - name: Install Housekeeper
      run: go install github.com/pseudomuto/housekeeper@latest

    - name: Test Migration Generation
      run: |
        # Test that migrations can be generated
        housekeeper diff --dry-run

    - name: Test Migration Application
      run: |
        # Start ClickHouse container
        docker run -d \
          --name test-clickhouse \
          -p 9000:9000 \
          -e CLICKHOUSE_USER=default \
          -e CLICKHOUSE_PASSWORD= \
          clickhouse/clickhouse-server:25.7

        # Wait for ClickHouse to accept queries instead of sleeping blindly
        until curl -sf http://localhost:8123/ping > /dev/null; do
          sleep 1
        done

        # Start development server (applies migrations automatically)
        housekeeper dev up

        # Verify schema
        housekeeper schema dump --url localhost:9000 > applied_schema.sql
        housekeeper schema compile > expected_schema.sql

        # Compare schemas (allowing for minor formatting differences)
        if ! diff -w applied_schema.sql expected_schema.sql; then
          echo "Schema mismatch detected"
          exit 1
        fi

    - name: Cleanup
      if: always()
      run: |
        docker stop test-clickhouse || true
        docker rm test-clickhouse || true

Docker Compose Testing

# docker-compose.test.yml
version: '3.8'

services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.7
    ports:
      - "9000:9000"
      - "8123:8123"
    volumes:
      - ./db/config.d:/etc/clickhouse-server/config.d
    environment:
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: ""
    healthcheck:
      test: ["CMD", "clickhouse-client", "--query", "SELECT 1"]
      interval: 5s
      timeout: 3s
      retries: 5

  migration-test:
    build: .
    depends_on:
      clickhouse:
        condition: service_healthy
    command: |
      bash -c "
        housekeeper dev up &&
        housekeeper schema dump --url clickhouse:9000 > /tmp/applied.sql &&
        housekeeper schema compile > /tmp/expected.sql &&
        diff -w /tmp/applied.sql /tmp/expected.sql
      "
    volumes:
      - .:/workspace
    working_dir: /workspace

Best Practices

Resource Management

// Always use context with timeout
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

// Always cleanup containers
defer func() {
    if err := dm.Stop(ctx); err != nil {
        log.Printf("Warning: failed to stop container: %v", err)
    }
}()

// Use appropriate resource limits
opts := docker.DockerOptions{
    Memory:     "2g",           // Limit memory usage
    CPUs:       "1.0",          // Limit CPU usage
    ShmSize:    "128m",         // Shared memory size
}

Test Organization

// Use test helpers for common patterns
func setupTestProject(t *testing.T) (*project.Project, docker.Manager) {
    proj := project.New(project.ProjectParams{
        Dir:       t.TempDir(),
        Formatter: format.New(format.Defaults),
    })

    err := proj.Initialize()
    require.NoError(t, err)

    dm := proj.NewDockerManager()

    return proj, dm
}

// Use subtests for organized testing
func TestMigrationSuite(t *testing.T) {
    proj, dm := setupTestProject(t)

    ctx := context.Background()
    err := dm.Start(ctx)
    require.NoError(t, err)
    defer dm.Stop(ctx)

    t.Run("BasicSchema", func(t *testing.T) {
        // Test basic schema application
    })

    t.Run("SchemaEvolution", func(t *testing.T) {
        // Test schema changes
    })

    t.Run("DataMigration", func(t *testing.T) {
        // Test data migration scenarios
    })
}

The Docker integration makes it easy to test ClickHouse schemas and migrations in a consistent, reproducible environment that closely matches production deployments.

Next Steps