Configuration¶
Learn how to configure Housekeeper for your environment and customize its behavior.
Configuration File¶
Housekeeper uses a YAML configuration file (housekeeper.yaml
) to define project settings:
# Basic configuration
clickhouse:
version: "25.7" # ClickHouse version for Docker
config_dir: "db/config.d" # ClickHouse configuration directory
cluster: "cluster" # Default cluster name
ignore_databases: [] # Databases to exclude from schema operations
entrypoint: db/main.sql # Main schema file
dir: db/migrations # Migration output directory
Configuration Sections¶
ClickHouse Settings¶
Configure ClickHouse-specific options:
clickhouse:
# Docker container version
version: "25.7" # Specific version for development containers
# Configuration directory for cluster setup
config_dir: "db/config.d" # Relative to project root
# Default cluster name for ON CLUSTER operations
cluster: "cluster" # Used for distributed DDL statements
# Databases to exclude from schema operations
ignore_databases: # Useful for test/staging databases
- testing_db
- temp_analytics
Schema Configuration¶
Configure schema-related settings:
# Schema entrypoint and organization
entrypoint: db/main.sql # Main schema file with imports
dir: db/migrations # Migration output directory
The configuration is intentionally simple - Housekeeper follows convention over configuration principles.
Ignoring Databases¶
The ignore_databases
configuration allows you to exclude specific databases from schema operations like diff
and dump
. This is particularly useful for:
- Testing databases: Keep test databases separate from production schemas
- Temporary databases: Exclude temporary or experimental databases
- System databases: Additional system databases beyond the defaults
clickhouse:
ignore_databases:
- testing_db # Development testing database
- staging_temp # Temporary staging experiments
- analytics_v1 # Old version being phased out
Ignored databases will be completely excluded from:
- Schema dumps (housekeeper schema dump
)
- Migration generation (housekeeper diff
)
- Bootstrap operations (housekeeper bootstrap
)
You can also specify ignored databases via the command line for one-off operations:
# Exclude databases when dumping schema
housekeeper schema dump --url localhost:9000 \
--ignore-databases testing_db \
--ignore-databases temp_db
Note: System databases (default
, system
, information_schema
, INFORMATION_SCHEMA
) are always excluded automatically.
Environment-Specific Configuration¶
Development Configuration¶
# development.yaml
clickhouse:
version: "25.7"
config_dir: "db/config.d"
cluster: "dev_cluster"
entrypoint: db/main.sql
dir: db/migrations
Staging Configuration¶
# staging.yaml
clickhouse:
version: "25.7"
config_dir: "db/config.d"
cluster: "staging_cluster"
entrypoint: db/main.sql
dir: db/migrations
Production Configuration¶
# production.yaml
clickhouse:
version: "25.7" # Pin specific version
config_dir: "db/config.d"
cluster: "production_cluster"
entrypoint: db/main.sql
dir: db/migrations
Environment Variables¶
Use environment variables for sensitive configuration:
Connection Credentials¶
The recommended way to configure database connections is using the HOUSEKEEPER_DATABASE_URL
environment variable:
# Recommended: Single connection URL (used by all commands)
export HOUSEKEEPER_DATABASE_URL="localhost:9000"
# Or with full DSN including authentication
export HOUSEKEEPER_DATABASE_URL="clickhouse://myuser:secretpassword@localhost:9000/mydb"
# Or with TCP protocol and parameters
export HOUSEKEEPER_DATABASE_URL="tcp://localhost:9000?username=myuser&password=secretpassword&database=mydb"
Once set, all commands will use this connection automatically:
# No need to specify --url flag when environment variable is set
housekeeper migrate
housekeeper status
housekeeper schema dump
housekeeper bootstrap
Alternative: Command-Line Flags¶
You can also specify the connection directly using the --url
flag:
housekeeper migrate --url localhost:9000
housekeeper status --url "clickhouse://user:pass@host:9000/db"
Configuration in YAML¶
Environment variables are used by the CLI tools for connection parameters, but are not part of the YAML configuration file. The HOUSEKEEPER_DATABASE_URL
environment variable or --url
flag provides the connection details when running commands.
Environment-Specific Variables¶
# Development
export HOUSEKEEPER_ENV=development
export CH_HOST=localhost
export CH_PASSWORD=devpassword
# Staging
export HOUSEKEEPER_ENV=staging
export CH_HOST=staging-clickhouse.internal
export CH_PASSWORD=stagingpassword
# Production
export HOUSEKEEPER_ENV=production
export CH_HOST=prod-clickhouse.example.com
export CH_PASSWORD=productionpassword
Command-Line Overrides¶
Override configuration values via command-line flags:
# Override connection settings
housekeeper diff \
--host clickhouse-prod.example.com \
--port 9440 \
--database analytics \
--username admin \
--cluster production
# Start development server with specific project directory
housekeeper dev up -d production
# Use specific project directory (which contains housekeeper.yaml)
housekeeper diff -d production
ClickHouse Configuration¶
Cluster Configuration¶
Configure ClickHouse clusters in db/config.d/_clickhouse.xml
:
<clickhouse>
<!-- Cluster configuration -->
<remote_servers>
<my_cluster>
<shard>
<replica>
<host>clickhouse-1.example.com</host>
<port>9000</port>
</replica>
<replica>
<host>clickhouse-2.example.com</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>clickhouse-3.example.com</host>
<port>9000</port>
</replica>
<replica>
<host>clickhouse-4.example.com</host>
<port>9000</port>
</replica>
</shard>
</my_cluster>
</remote_servers>
<!-- Zookeeper configuration for ReplicatedMergeTree -->
<zookeeper>
<node>
<host>zookeeper-1.example.com</host>
<port>2181</port>
</node>
<node>
<host>zookeeper-2.example.com</host>
<port>2181</port>
</node>
<node>
<host>zookeeper-3.example.com</host>
<port>2181</port>
</node>
</zookeeper>
<!-- Macros for replica names -->
<macros>
<cluster>my_cluster</cluster>
<shard>01</shard>
<replica>replica1</replica>
</macros>
</clickhouse>
User Configuration¶
Configure users and permissions:
<clickhouse>
<users>
<!-- Migration user with DDL permissions -->
<migration_user>
<password>your_secure_password</password>
<profile>default</profile>
<quota>default</quota>
<allow_databases>
<database>analytics</database>
<database>reporting</database>
</allow_databases>
<access_management>1</access_management>
</migration_user>
<!-- Read-only user for applications -->
<app_user>
<password>app_password</password>
<profile>readonly</profile>
<quota>default</quota>
<allow_databases>
<database>analytics</database>
</allow_databases>
</app_user>
</users>
</clickhouse>
Validation¶
Validate your configuration:
# Validate schema syntax
housekeeper schema compile
# Test database connection by dumping schema
housekeeper schema dump --url localhost:9000
# Show available commands and options
housekeeper --help
Configuration Best Practices¶
Security¶
- Use Environment Variables: Never commit passwords to version control
- Principle of Least Privilege: Use dedicated migration users with minimal permissions
- TLS in Production: Always use secure connections in production
- Certificate Validation: Don't skip TLS certificate verification in production
Environment Management¶
- Separate Configurations: Use different config files for each environment
- Version Control: Commit configuration templates, not actual credentials
- Documentation: Document configuration changes and their purpose
- Validation: Always validate configuration before deployment
Performance¶
- Connection Pooling: Configure appropriate connection pool sizes
- Timeouts: Set reasonable timeouts for your environment
- Batch Sizes: Optimize batch sizes for your migration patterns
- Resource Limits: Set appropriate memory and connection limits
Maintenance¶
- Regular Reviews: Review and update configurations regularly
- Monitoring: Monitor configuration-related metrics
- Backup: Backup configuration files along with schemas
- Change Management: Track configuration changes like code changes
Next Steps¶
- Schema Management - Learn about schema design patterns
- Migration Process - Understand how migrations work
- Cluster Management - Configure distributed ClickHouse
- Troubleshooting - Solve configuration issues