Quick Start¶

This guide will get you up and running with Housekeeper in under 10 minutes.

Step 1: Initialize a New Project¶

Create a new directory for your ClickHouse project and initialize it:

# Create and enter project directory
mkdir my-clickhouse-project
cd my-clickhouse-project

# Initialize the project structure
housekeeper init

This creates the following structure:

my-clickhouse-project/
├── housekeeper.yaml        # Project configuration
├── db/
│   ├── main.sql            # Main schema entrypoint
│   ├── config.d/           # ClickHouse configuration files
│   │   └── _clickhouse.xml # Generated cluster configuration
│   ├── migrations/         # Generated migration files (empty initially)
│   └── schemas/            # Organized schema files (optional)

Step 2: Define Your Schema¶

Edit the db/main.sql file to define your ClickHouse schema:

-- Create analytics database
CREATE DATABASE analytics ENGINE = Atomic COMMENT 'Analytics database';

-- Create users table
CREATE TABLE analytics.users (
    id UInt64,
    email String,
    name String,
    created_at DateTime DEFAULT now(),
    metadata Map(String, String) DEFAULT map()
) ENGINE = ReplacingMergeTree()
ORDER BY id
SETTINGS index_granularity = 8192;

-- Create events table  
CREATE TABLE analytics.events (
    id UUID DEFAULT generateUUIDv4(),
    user_id UInt64,
    event_type LowCardinality(String),
    timestamp DateTime DEFAULT now(),
    properties Map(String, String) DEFAULT map()
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, event_type)
TTL timestamp + INTERVAL 90 DAY
SETTINGS index_granularity = 8192;

-- Create daily summary view
CREATE MATERIALIZED VIEW analytics.daily_summary
ENGINE = MergeTree() ORDER BY date
AS SELECT 
    toDate(timestamp) as date,
    event_type,
    count() as event_count,
    uniq(user_id) as unique_users
FROM analytics.events
GROUP BY date, event_type;

Step 3: Compile and Validate¶

Compile your schema to ensure it's valid:

# Compile schema and check for syntax errors
housekeeper schema compile

# View the compiled output
housekeeper schema compile --out compiled_schema.sql
cat compiled_schema.sql

Modular Schemas

For larger projects, you can split your schema into multiple files using import directives. See Project Setup for details.

Step 4: Start Development Server and Generate Migration¶

Housekeeper uses a development workflow where you start a local ClickHouse server, apply existing migrations, then compare your schema:

# Start local ClickHouse development server
housekeeper dev up

This starts a Docker container with ClickHouse and applies any existing migrations. Since this is your first time, there are no migrations yet.

Now generate a migration by comparing your schema:

# Generate migration by comparing schema with current database state
housekeeper diff

Since this is your first migration, Housekeeper will create a migration file containing all your schema definitions. The file will be created in db/migrations/ with a timestamp prefix like 20240101120000.sql.

Example migration content:

-- Schema migration generated at 2024-01-01 12:00:00 UTC

-- Create database 'analytics'
CREATE DATABASE analytics ENGINE = Atomic COMMENT 'Analytics database';

-- Create table 'analytics.users'
CREATE TABLE analytics.users (
    id UInt64,
    email String,
    name String,
    created_at DateTime DEFAULT now(),
    metadata Map(String, String) DEFAULT map()
) ENGINE = ReplacingMergeTree()
ORDER BY id
SETTINGS index_granularity = 8192;

-- Create table 'analytics.events'
CREATE TABLE analytics.events (
    id UUID DEFAULT generateUUIDv4(),
    user_id UInt64,
    event_type LowCardinality(String),
    timestamp DateTime DEFAULT now(),
    properties Map(String, String) DEFAULT map()
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, event_type)
TTL timestamp + INTERVAL 90 DAY
SETTINGS index_granularity = 8192;

-- Create materialized view 'analytics.daily_summary'
CREATE MATERIALIZED VIEW analytics.daily_summary
ENGINE = MergeTree() ORDER BY date
AS SELECT 
    toDate(timestamp) as date,
    event_type,
    count() as event_count,
    uniq(user_id) as unique_users
FROM analytics.events
GROUP BY date, event_type;

Step 5: Development Workflow¶

The development workflow in Housekeeper is:

Start dev server - Starts ClickHouse with existing migrations applied
Make schema changes - Edit your schema files
Generate migration - Compare schema with database state
Restart dev server - Apply new migration and continue

# Stop the current dev server
housekeeper dev down

# Start fresh with all migrations (including the new one)
housekeeper dev up

Your new migration is automatically applied when the dev server starts.

Step 6: Make Changes and Generate Another Migration¶

Let's add a new table to demonstrate the migration workflow:

Add this to your db/main.sql:

-- Add after the events table
CREATE TABLE analytics.products (
    id UInt64,
    name String,
    category_id UInt32,
    price Decimal64(2),
    created_at DateTime DEFAULT now()
) ENGINE = ReplacingMergeTree()
ORDER BY id;

Generate a new migration:

# Generate migration for the new changes
housekeeper diff

Housekeeper will detect that you've added a new table and generate a migration containing just the new CREATE TABLE statement.

Next Steps¶

Congratulations! You've successfully:

✅ Created a Housekeeper project
✅ Defined a ClickHouse schema
✅ Generated your first migration
✅ Applied changes to ClickHouse
✅ Created an incremental migration

Learn More¶

Project Setup - Organize complex schemas with imports and modular structure
Schema Management - Learn best practices for schema design
Migration Process - Understand how migrations work under the hood
Configuration - Customize Housekeeper for your environment

Working with Existing Databases¶

If you have an existing ClickHouse database, use the bootstrap workflow:

# Step 1: Extract schema from existing database
housekeeper bootstrap --url localhost:9000

# Step 2: Create initial snapshot (no migrations required)
housekeeper snapshot --bootstrap --description "Initial baseline"

# Step 3: Now you can generate migrations normally
housekeeper diff

The --bootstrap flag creates a snapshot from your project schema instead of existing migrations, solving the chicken-and-egg problem when starting with an existing database.

Common Next Steps¶

Set up CI/CD: Integrate Housekeeper into your deployment pipeline
Cluster Configuration: Configure for distributed ClickHouse deployments
Migration Consolidation: Use housekeeper snapshot to consolidate old migrations
Team Workflow: Establish schema change processes with your team

Getting Help¶

Check out the Examples for more schema patterns
Read the Troubleshooting Guide for common issues
Open an issue if you need help