Partitioning and Indexing Strategies

**Referenced Files in This Document** - [[protocols-db.md]](file/ui-web/.agent/rules/protocols-db.md) - [[schema.sql (bi-tenant)]](file/bi-tenant/docs/database/schema.sql) - [[schema.sql (bi-sys)]](file/bi-sys/docs/database/schema.sql) - [[xcbi.sql]](file/bi-chat/bi-chat/src/db/xcbi.sql) - [[table-templates.md]](file/bi-basic/.agent/skills/bi-database-design/references/table-templates.md) - [[Infrastructure.md]](file/ui-web-docs/pages/zh/xcbi-dev/.md) - [[db_client.py]](file/mcp-server-starrocks/src/mcp-server-starrocks/db-client.py)

Introduction
Project Structure
Core Components
Architecture Overview
Detailed Component Analysis
Dependency Analysis
Performance Considerations
Troubleshooting Guide
Conclusion

Introduction

This document explains StarRocks partitioning and indexing strategies used in the BI Analysis Platform. It focuses on:

Hash partitioning and bucket allocation via DISTRIBUTED BY HASH for balanced query distribution
Bitmap index usage for low-cardinality filtering columns such as status, deleted_at, and tenant_id
Primary key and unique index patterns for data integrity
Index selection criteria aligned with query patterns and trade-offs between maintenance overhead and query performance
Partition sizing recommendations, bucket number optimization, and maintenance procedures
Impact of partitioning on data loading performance and query execution plans

Project Structure

The repository contains multiple modules that define StarRocks schemas and operational guidelines:

Database design rules and recommendations for StarRocks
Concrete StarRocks table definitions across modules (tenant, system, chat analytics)
Infrastructure notes on StarRocks usage and index strategies
Tools for profiling StarRocks queries

Diagram sources

Section sources

Core Components

Partitioning and distribution
- Hash distribution by high-cardinality keys ensures even data spread across backends
- Dynamic partitions enable automated monthly partition creation and retention policies
Indexing
- Bitmap indexes on low-cardinality filters (status, deleted_at, tenant_id) accelerate predicate evaluation
- Persistent index enabled for primary-key tables to improve scan performance
Query profiling
- Query dump and profile collection support performance diagnostics and plan inspection

Key implementation references:

Hash distribution and dynamic partitioning in tenant and system modules
Bitmap indexes on status and deleted_at fields
Persistent index property on primary-key tables
Query profiling utilities

Section sources

Architecture Overview

The BI platform leverages StarRocks for analytical workloads with:

Fact tables using duplicate key for event logs with dynamic partitions
Dimension tables using primary key with persistent index and bitmap indexes
Distributed hashing on high-cardinality keys to balance load
Query profiling to validate execution plans and optimize further

Diagram sources

Detailed Component Analysis

Hash Partitioning and Bucket Allocation

Distribution strategy
- Tables distribute data using hash on high-cardinality keys (e.g., tenant_id, user_id, log_id) to achieve uniform write/read distribution
- Bucket count set to moderate values to balance parallelism and metadata overhead
Dynamic partitions
- Monthly partitioning with automatic creation and retention windows reduces manual maintenance and supports time-based pruning
Recommendations
- Align bucket count with cluster size and expected row counts per tablet
- Prefer colocate join for frequently joined large tables by aligning distribution keys and bucket counts

Diagram sources

Section sources

Bitmap Index Implementation

Purpose
- Accelerate filtering on low-cardinality columns (status, deleted_at, tenant_id) commonly used in WHERE clauses
Placement
- Created on dimension and fact tables where these filters are frequent
Trade-offs
- Bitmap indexes reduce scan time but increase storage and update overhead during writes
- Recommended for static or slowly changing attributes; avoid on very high-cardinality columns

Diagram sources

Section sources

Primary Key Design Patterns and Unique Index Strategies

Primary key tables
- Use primary key model for business entities requiring updatable/deletable records
- Define ORDER BY key as the primary key to leverage prefix index and sort efficiency
- Enable persistent index to improve read performance on scans
Unique constraints
- Enforce uniqueness on identifiers (e.g., tenant_id, user_id) to maintain referential integrity
Recommendations
- Keep primary key columns non-null and constrained
- Place frequently filtered columns early in sort key to maximize prefix index hit rate

Diagram sources

Section sources

Index Selection Criteria Based on Query Patterns

Choose bitmap indexes for:
- Low-cardinality filters (status, is_deleted, tenant_id)
- Frequent equality or IN predicates
Avoid bitmap indexes for:
- Very high-cardinality columns (e.g., IDs used solely for joins)
Bloom filter considerations:
- Enable bloom filters on high-cardinality columns used in equality predicates to reduce probe cost
Persistent index:
- Beneficial for primary-key tables with wide scans and frequent reads

Diagram sources

Section sources

Partition Sizing Recommendations and Bucket Number Optimization

Partition sizing
- Use monthly partitions for time-series logs; adjust window size based on retention needs and ingestion volume
- Ensure partition pruning is effective by pushing time filters to leverage range partitions
Bucket optimization
- Target 1–10 GB per tablet; compute buckets ≈ total_size / 5 GB
- For small tables (<100M rows), set buckets to 3–5 × number of BE nodes or use auto bucketing
- Avoid too few buckets (metadata pressure) or too many buckets (reduced parallelism)
Co-location joins
- Align distribution keys and bucket counts for frequently joined large tables to eliminate network shuffle

Diagram sources

[protocols-db.md]

Section sources

[protocols-db.md]

Index Maintenance Procedures

Monitoring
- Track write amplification caused by bitmap and persistent index updates
- Observe partition pruning effectiveness and bucket skew
Tuning
- Adjust bucket counts and tablet sizes based on growth trends
- Rebuild or reorganize partitions periodically for skewed distributions
Operational notes
- Use dynamic partitions to automate lifecycle management
- Apply retention policies to drop old partitions regularly

[No sources needed since this section provides general guidance]

Impact on Data Loading Performance and Query Execution Plans

Loading
- Hash distribution balances write throughput; ensure adequate bucket count to prevent hotspots
- Dynamic partitions simplify retention and reduce administrative overhead
Query plans
- Bitmap indexes reduce row groups scanned for low-cardinality filters
- Persistent index improves scan performance on primary-key tables
- Use query profiling to inspect execution plans and validate index usage

Diagram sources

[db_client.py]

Section sources

Dependency Analysis

Table definitions depend on:
- Design rules for partitioning, distribution, and index selection
- Module-specific schemas that demonstrate practical usage
Query profiling depends on:
- StarRocks client utilities to capture execution dumps and profiles

Diagram sources

Section sources

Performance Considerations

Prefer monthly dynamic partitions for time-series logs to simplify lifecycle management
Use bitmap indexes on low-cardinality filters; avoid on high-cardinality columns
Enable persistent index for primary-key tables with frequent scans
Monitor and tune bucket counts to maintain balanced parallelism and tablet sizes
Leverage query profiling to validate plan efficiency and index usage

[No sources needed since this section provides general guidance]

Troubleshooting Guide

Slow queries on low-cardinality filters
- Verify bitmap indexes exist and are being used
- Confirm partition pruning is active by pushing time filters
Skewed distribution or hotspots
- Increase bucket count or adjust distribution key
- Reassess partition boundaries and retention windows
Excessive write overhead
- Limit bitmap indexes on highly active columns
- Consider bloom filters for equality predicates on high-cardinality columns
Query plan inspection
- Use profiler utilities to capture execution dumps and measure durations

Section sources

Conclusion

The BI Analysis Platform employs StarRocks partitioning and indexing best practices:

Hash distribution with dynamic partitions for balanced, scalable ingestion and pruning
Bitmap indexes on low-cardinality filters to accelerate common queries
Persistent index and primary key modeling for integrity and read performance
Practical tuning guidelines for bucket sizing, co-location joins, and maintenance These strategies collectively improve query performance while controlling maintenance overhead and ensuring predictable scaling.

brownfield

Multi Agent System Architecture

Data Flow And Processing Architecture

Microservices Architecture And Design Patterns

Core Services

Bi Analysis Analytics Engine

Bi Basic Foundation Data Services

Bi Server Business Orchestration Center

Management Services

Bi Sys System Management Service

Bi Tenant Multi Tenant Management Service

Shared Infrastructure

External Data Integration

Jushuitan Erp Integration

Leke Erp Integration

Kafka Data Synchronization Pipeline

Starrocks Olap Database

Admin Panel Ui Web Admin

Tenant Console Ui Web

Partitioning and Indexing Strategies

Table of Contents

Introduction

Project Structure

Core Components

Architecture Overview

Detailed Component Analysis

Hash Partitioning and Bucket Allocation

Bitmap Index Implementation

Primary Key Design Patterns and Unique Index Strategies

Index Selection Criteria Based on Query Patterns

Partition Sizing Recommendations and Bucket Number Optimization

Index Maintenance Procedures

Impact on Data Loading Performance and Query Execution Plans

Dependency Analysis

Performance Considerations

Troubleshooting Guide

Conclusion

Bi Analysis Analytics Engine

Bi Basic Foundation Data Services

Bi Server Business Orchestration Center

Bi Sys System Management Service

Bi Tenant Multi Tenant Management Service

Jushuitan Erp Integration

Leke Erp Integration

Partitioning and Indexing Strategies ​

Table of Contents ​

Introduction ​

Project Structure ​

Core Components ​

Architecture Overview ​

Detailed Component Analysis ​

Hash Partitioning and Bucket Allocation ​

Bitmap Index Implementation ​

Primary Key Design Patterns and Unique Index Strategies ​

Index Selection Criteria Based on Query Patterns ​

Partition Sizing Recommendations and Bucket Number Optimization ​

Index Maintenance Procedures ​

Impact on Data Loading Performance and Query Execution Plans ​

Dependency Analysis ​

Performance Considerations ​

Troubleshooting Guide ​

Conclusion ​

Partitioning and Indexing Strategies

Table of Contents

Introduction

Project Structure

Core Components

Architecture Overview

Detailed Component Analysis

Hash Partitioning and Bucket Allocation

Bitmap Index Implementation

Primary Key Design Patterns and Unique Index Strategies

Index Selection Criteria Based on Query Patterns

Partition Sizing Recommendations and Bucket Number Optimization

Index Maintenance Procedures

Impact on Data Loading Performance and Query Execution Plans

Dependency Analysis

Performance Considerations

Troubleshooting Guide

Conclusion