Troubleshooting¶
This guide helps you diagnose and resolve common issues when using Housekeeper with ClickHouse.
Installation Issues¶
Binary Download Problems¶
Issue: Cannot download or execute Housekeeper binary
Solutions:
# Check if binary is executable
chmod +x housekeeper
# Verify the binary runs and prints its version
housekeeper --version
# Download specific version
curl -L https://github.com/pseudomuto/housekeeper/releases/download/v1.0.0/housekeeper_linux_amd64 -o housekeeper
Go Installation Issues¶
Issue: Build from source fails
Solutions:
# Verify Go version (requires 1.21+)
go version
# Clean module cache
go clean -modcache
# Rebuild with verbose output
go build -v ./cmd/housekeeper
Connection Issues¶
DSN Format Problems¶
Issue: Cannot connect to ClickHouse
Common Formats:
# Start development server and generate diff
housekeeper dev up
housekeeper diff
# For schema extraction, use schema dump with different connection strings:
# Basic host:port
housekeeper schema dump --url "localhost:9000"
# With authentication
housekeeper schema dump --url "clickhouse://user:password@host:9000/database"
# TCP protocol
housekeeper schema dump --url "tcp://host:9000?username=user&password=pass"
# Secure connection
housekeeper schema dump --url "clickhouse://user:pass@host:9440/db?secure=true"
Network Connectivity¶
Issue: Connection timeouts or refused connections
Diagnosis:
# Test basic connectivity
telnet clickhouse-host 9000
# Check ClickHouse status
curl http://clickhouse-host:8123/ping
# Test authentication
curl -u user:password http://clickhouse-host:8123/ping
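When startup is slow it can be hard to tell a hung server from one that is still warming up. A minimal retry loop around the HTTP ping endpoint, assuming curl is available (the wait_for_clickhouse helper and its arguments are illustrative, not part of Housekeeper):

```shell
#!/usr/bin/env bash
# Poll the ClickHouse HTTP ping endpoint until it answers or retries run out.
# Usage: wait_for_clickhouse <url> [retries]
wait_for_clickhouse() {
  local url="$1" retries="${2:-10}" i
  for i in $(seq 1 "$retries"); do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    sleep 1
  done
  echo "unreachable" >&2
  return 1
}

# Example: wait_for_clickhouse http://clickhouse-host:8123/ping 30
```

A non-zero exit status means the server never answered within the retry budget, which usually points at networking or a crashed container rather than slow startup.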
SSL/TLS Issues¶
Issue: SSL connection failures
Solutions:
# Skip SSL verification (not recommended for production)
housekeeper schema dump --url "clickhouse://user:pass@host:9440/db?secure=true&skip_verify=true"
# Use proper certificates
housekeeper schema dump --url "clickhouse://user:pass@host:9440/db?secure=true&ca_cert=/path/to/ca.pem"
Parsing Errors¶
Syntax Errors¶
Issue: SQL parsing fails
Solutions:
- Check ClickHouse version compatibility
- Validate SQL syntax (for example, by running the file through housekeeper fmt)
- Review common syntax issues such as unbalanced parentheses or missing semicolons
Unsupported Features¶
Issue: Parser doesn't recognize specific ClickHouse syntax
Workarounds:
- Check supported features in the project documentation
- File an issue on GitHub for missing features
- Use alternative syntax if available
Import Resolution¶
Issue: Schema imports not found
Solutions:
-- ❌ Incorrect - wrong path
-- housekeeper:import ../missing/file.sql
-- ✅ Correct - relative to current file
-- housekeeper:import ./tables/users.sql
-- ✅ Correct - absolute path from project root
-- housekeeper:import schemas/tables/users.sql
Migration Issues¶
No Differences Detected¶
Issue: Expected changes not generating migrations
Diagnosis:
# Debug schema parsing
housekeeper schema --input ./db/main.sql --debug
# Compare parsed schemas
housekeeper fmt --input ./current.sql > current_formatted.sql
housekeeper fmt --input ./target.sql > target_formatted.sql
diff current_formatted.sql target_formatted.sql
Common Causes:
- Whitespace differences: use the fmt command to normalize
- Comment differences: comments don't affect migrations
- Order differences: statement order matters for comparison
Validation Errors¶
Issue: Migration validation fails
Example Errors:
Error: unsupported operation: dictionary ALTER operations not supported
Error: cluster configuration changes not supported
Error: engine type changes not supported
Solutions:
- Dictionary changes: use CREATE OR REPLACE instead of ALTER
- Cluster changes: keep cluster configuration consistent
- Engine changes: use DROP + CREATE manually
Migration File Issues¶
Issue: Generated migration files have problems
Solutions:
# Validate migration syntax
housekeeper fmt --input ./db/migrations/20240806143022.sql
# Test migration on staging
clickhouse-client --queries-file ./db/migrations/20240806143022.sql
# Check migration integrity
housekeeper status --migrations ./db/migrations/
Partial Migration Failures & Recovery¶
Housekeeper automatically handles partial migration failures, but you may need to troubleshoot issues with the recovery process.
Understanding Partial Failures¶
What happens when a migration fails partway through?
-- Migration: 20240101120000_setup_analytics.sql
CREATE DATABASE analytics; -- ✅ Statement 1: Success
CREATE TABLE analytics.events (...); -- ✅ Statement 2: Success
CREATE TABLE analytics.users (...); -- ❌ Statement 3: Failed
CREATE VIEW analytics.summary (...); -- ⏸ Statement 4: Not executed
Housekeeper automatically:
1. Records successful statements (1-2) in the revision table
2. Records the error at statement 3
3. Stores statement hashes for integrity validation
4. Sets up automatic resume for the next migration run
Automatic Recovery Process¶
Issue: You want to understand what will be resumed
Diagnosis:
# Check status to see partial migrations
housekeeper status --url localhost:9000 --verbose
# See what would be resumed with dry-run
housekeeper migrate --url localhost:9000 --dry-run
# Examine revision table directly
echo "SELECT version, applied, total, error, executed_at FROM housekeeper.revisions WHERE error IS NOT NULL ORDER BY executed_at DESC;" | clickhouse-client
Example Output:
Found 1 partially applied migration(s) that will be resumed:
⚠️ 20240101120000_setup_analytics: 2/4 statements applied
Last error: Table 'users' already exists
Will resume with 2 remaining statement(s)
Troubleshooting Resume Issues¶
Issue: Automatic resume fails with validation errors
Common Errors:
- Hash Mismatch (Migration File Modified):
Solutions:
# ❌ Don't modify migration files after partial execution
# ✅ Either restore original file or create new migration
# Option 1: Restore original migration file
git checkout db/migrations/20240101120000_setup_analytics.sql
# Option 2: Create new migration with remaining changes
housekeeper diff # Generates new migration with remaining changes
- Statement Count Mismatch:
Solutions:
# Migration file had statements added/removed after partial execution
# Restore original file or clean up partial state
# Option 1: Restore original file
git checkout db/migrations/20240101120000_setup_analytics.sql
# Option 2: Manually clean up partial state (advanced)
echo "DELETE FROM housekeeper.revisions WHERE version = '20240101120000_setup_analytics';" | clickhouse-client
- Dependency Issues During Resume:
Solution: Check if dependent objects were created successfully
-- Check which objects exist
SELECT name FROM system.tables WHERE database = 'analytics';
-- Verify expected objects are present before resuming
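Before touching the revision table, it is worth confirming whether a hash mismatch really comes from an edited migration file. A minimal sketch, assuming migrations are tracked in git (check_unmodified is an illustrative helper, not a Housekeeper command):

```shell
#!/usr/bin/env bash
# Report whether a migration file still matches its last committed version.
# Usage: check_unmodified <path-to-migration-file>
check_unmodified() {
  if git diff --quiet HEAD -- "$1" 2>/dev/null; then
    echo "unchanged"
  else
    echo "modified"
  fi
}

# Example: check_unmodified db/migrations/20240101120000_setup_analytics.sql
```

If the file reports "modified", restoring the committed version (Option 1 above) is usually safer than editing the revision table.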
Manual Recovery (Advanced)¶
Issue: Automatic recovery is not working and you need manual intervention
⚠️ Warning: Only use manual recovery if automatic resume fails
# 1. Examine current state
housekeeper status --url localhost:9000 --verbose
housekeeper schema dump --url localhost:9000
# 2. Check revision table
echo "SELECT * FROM housekeeper.revisions WHERE version = '20240101120000_setup_analytics';" | clickhouse-client --vertical
# 3. Options for manual recovery:
# Option A: Delete partial revision and restart migration
echo "DELETE FROM housekeeper.revisions WHERE version = '20240101120000_setup_analytics';" | clickhouse-client
housekeeper migrate --url localhost:9000 # Restarts from beginning
# Option B: Fix the underlying issue and resume automatically
# (Fix whatever caused the original failure, then run)
housekeeper migrate --url localhost:9000 # Will resume automatically
# Option C: Mark migration as completed manually (if statements were applied outside Housekeeper)
# Note: plain UPDATE is not supported by ClickHouse; use an ALTER TABLE mutation
echo "ALTER TABLE housekeeper.revisions UPDATE applied = total, error = NULL WHERE version = '20240101120000_setup_analytics';" | clickhouse-client
Prevention Best Practices¶
- Test migrations in development first
- Don't modify migration files after they've been partially executed
- Monitor ClickHouse logs during migration execution
- Ensure sufficient resources (disk space, memory) before running large migrations
- Use --dry-run to preview what will be executed
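The resource check can be scripted as a pre-flight step. A minimal sketch using coreutils df (require_free_kb is an illustrative helper; the data path and threshold are assumptions to adjust for your deployment):

```shell
#!/usr/bin/env bash
# Fail fast when a volume has less free space (in KiB) than required.
# Usage: require_free_kb <path> <required-kib>
require_free_kb() {
  local path="$1" need_kb="$2" avail_kb
  # -P forces POSIX output so the "Available" column is always field 4
  avail_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "insufficient space: ${avail_kb} KiB available, ${need_kb} KiB required" >&2
    return 1
  fi
  echo "ok: ${avail_kb} KiB available"
}

# Example: require 1 GiB free on the ClickHouse data volume
# require_free_kb /var/lib/clickhouse 1048576
```

Running this before housekeeper migrate turns a mid-migration disk-full failure into an early, clean abort.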
Recovery Verification¶
After successful recovery:
# Verify all migrations are completed
housekeeper status --url localhost:9000
# Check that schema matches expectations
housekeeper diff # Should show "No differences found"
# Verify specific objects exist
housekeeper schema dump --url localhost:9000 | grep "CREATE TABLE"
Docker Integration Issues¶
Container Startup Problems¶
Issue: ClickHouse container fails to start
Solutions:
// Increase startup timeout
container := docker.NewWithOptions(docker.DockerOptions{
    Version:        "25.7",
    StartupTimeout: 60 * time.Second,
})

// Check container logs (and surface the error instead of ignoring it)
logs, err := container.GetLogs()
if err != nil {
    log.Fatal(err)
}
fmt.Println(logs)
Volume Mount Issues¶
Issue: Configuration not loaded in container
Solutions:
// Ensure config directory exists
if err := os.MkdirAll("./config.d", 0755); err != nil {
    log.Fatal(err)
}

// Mount with proper permissions
container := docker.NewWithOptions(docker.DockerOptions{
    ConfigDir: "./config.d", // Must contain _clickhouse.xml
})
Port Conflicts¶
Issue: Port already in use errors
Solutions:
# Find processes using ClickHouse ports
lsof -i :9000
lsof -i :8123
# Kill existing ClickHouse processes
pkill clickhouse-server
# Use Docker port mapping
docker run -p 9001:9000 -p 8124:8123 clickhouse/clickhouse-server
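Rather than killing processes, you can pick an unused host port for the Docker mapping. A sketch that probes localhost via bash's /dev/tcp (find_free_port is an illustrative helper, not part of Housekeeper):

```shell
#!/usr/bin/env bash
# Print the first TCP port in a range with no local listener, probing via /dev/tcp.
# A failed connect means nothing is listening, so the port is free.
# Usage: find_free_port <start> <end>
find_free_port() {
  local port
  for port in $(seq "$1" "$2"); do
    if ! (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      echo "$port"
      return 0
    fi
  done
  return 1
}

HOST_PORT=$(find_free_port 9001 9100)
echo "docker run -p ${HOST_PORT}:9000 -p 8124:8123 clickhouse/clickhouse-server"
```

This avoids disturbing whatever already owns port 9000, which may be a production ClickHouse instance you do not want to kill.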
Performance Issues¶
Slow Parsing¶
Issue: Large schema files take too long to parse
Solutions:
- Split large files into smaller, focused schema files
- Remove unnecessary comments and whitespace
- Use streaming for very large files
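Splitting can be scripted. A sketch that writes one file per statement, assuming statements end with a semicolon at end of line (split_schema is an illustrative helper; multi-line statements stay together, but a semicolon at the end of a line inside a string literal would split incorrectly):

```shell
#!/usr/bin/env bash
# Split a schema file into numbered per-statement files under an output directory.
# Usage: split_schema <input.sql> <output-dir>
split_schema() {
  local input="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    { buf = buf $0 "\n" }
    /;[ \t]*$/ {                        # a line ending in ";" closes the statement
      file = sprintf("%s/%04d.sql", dir, ++n)
      printf "%s", buf > file
      close(file)
      buf = ""
    }
  ' "$input"
}

# Example: split_schema db/main.sql db/split
```

The resulting per-statement files can then be stitched back together with housekeeper:import directives.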
Memory Usage¶
Issue: High memory consumption during parsing
Solutions:
- Process files individually
- Increase system memory
- Use pagination for very large schemas
ClickHouse-Specific Issues¶
Permission Errors¶
Issue: Insufficient privileges for DDL operations
Solutions:
-- Grant necessary permissions
GRANT CREATE ON *.* TO user;
GRANT DROP ON *.* TO user;
GRANT ALTER ON *.* TO user;
-- For cluster operations
GRANT CLUSTER ON *.* TO user;
Cluster Issues¶
Issue: Distributed DDL operations fail
Diagnosis:
-- Check cluster configuration
SELECT * FROM system.clusters WHERE cluster = 'production';
-- Monitor distributed DDL queue
SELECT * FROM system.distributed_ddl_queue;
-- Check replication status
SELECT * FROM system.replicas;
Solutions:
- Increase distributed DDL timeouts
- Check ZooKeeper (or ClickHouse Keeper) connectivity
- Verify network connectivity between nodes
Disk Space Issues¶
Issue: Operations fail due to insufficient disk space
Solutions:
-- Check disk usage
SELECT * FROM system.disks;
-- Clean up old data
OPTIMIZE TABLE table_name FINAL;
-- Drop unused tables/databases
DROP TABLE IF EXISTS unused_table;
Debug Commands¶
Schema Analysis¶
# Parse and format schema
housekeeper fmt schema.sql
# Compile and show schema structure
housekeeper schema compile
# Generate migration (runs against development server)
housekeeper diff
Connection Testing¶
# Test basic connection
housekeeper schema dump --url localhost:9000
# Test cluster connection with cluster injection
housekeeper schema dump --url localhost:9000 --cluster test_cluster
Migration Debugging¶
# Validate migration files
housekeeper status --migrations ./db/migrations/
# Rehash migration files
housekeeper rehash --migrations ./db/migrations/
# Check migration syntax without executing it (clickhouse-format only parses)
clickhouse-format -n < migration.sql
Getting Help¶
Log Collection¶
When reporting issues, include:
- Housekeeper version: housekeeper --version
- ClickHouse version: SELECT version()
- Full command and error output
- Relevant schema files (sanitized)
- System information: OS, architecture
Common Solutions¶
- Update to latest version
- Check GitHub issues for similar problems
- Verify ClickHouse compatibility
- Test with minimal reproduction case
Filing Issues¶
When filing GitHub issues:
- Use issue templates
- Provide minimal reproduction
- Include version information
- Describe expected vs actual behavior
- Share relevant logs and configurations
Community Resources¶
- GitHub Discussions: General questions and usage help
- GitHub Issues: Bug reports and feature requests
- Documentation: Complete API and usage guides
- Examples: Reference implementations and patterns