StarRocks OLAP Database Architecture
Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
- Appendices
Introduction
This document describes the StarRocks OLAP database architecture within the project, focusing on ingestion mechanisms, table design strategies, partitioning and distribution, query optimization, and operational practices. It synthesizes real implementations present in the repository (StreamLoad client, MCP server integration, configuration, and design guidelines) to provide a practical guide for building and operating StarRocks-based analytics systems.
Project Structure
The StarRocks-related components are organized across:
- StreamLoad client library for real-time ingestion
- Configuration and environment integration for database connections
- MCP server for controlled agent access to StarRocks
- Design guidelines and entity documentation for table modeling and distribution
Diagram sources
- [client.go]
- [option.go]
- [response.go]
- [bi-common.yaml]
- [init.py]
- [architecture-mcp-server-starrocks.md]
- [protocols-db.md]
- [数据库设计.md]
Section sources
- [client.go]
- [bi-common.yaml]
- [init.py]
- [architecture-mcp-server-starrocks.md]
- [protocols-db.md]
- [数据库设计.md]
Core Components
- StreamLoad client: Provides typed configuration, request construction, and response parsing for HTTP-based ingestion into StarRocks.
- Load options: Encapsulate ingestion parameters such as format, column mapping, partition targeting, strict mode, timeouts, and JSON handling.
- Response model: Standardized ingestion outcome with success/failure detection and timing metrics.
- Configuration: Centralized connection settings and pool tuning for StarRocks.
- MCP server: A Model Context Protocol (MCP) server that gives AI agents controlled access to StarRocks, with column-level protections.
- Design rules: Enforce StarRocks-specific modeling (Primary Key vs Aggregate/Duplicate Key), partitioning, bucketing, and sort keys.
Section sources
- [client.go]
- [option.go]
- [response.go]
- [bi-common.yaml]
- [architecture-mcp-server-starrocks.md]
- [protocols-db.md]
Architecture Overview
The ingestion pipeline integrates HTTP StreamLoad with application clients, while the MCP server mediates secure, auditable access to StarRocks for agents. Configuration is shared across services and environments.
Detailed Component Analysis
StreamLoad Client
The client encapsulates:
- Configuration: host, port, credentials, timeout, retry settings, and an optional BE (Backend node) proxy for local development.
- Request building: supports CSV/JSON, label generation, column mapping, and arbitrary options.
- Redirect handling: the FE (Frontend node) answers a StreamLoad request with an HTTP 307 redirect to a BE. When a proxy is configured, a custom HTTP client replaces the BE host in the redirect target.
- Response parsing: standardized ingestion metrics and success/failure checks.
Load Options
Options enable flexible ingestion:
- Label, columns, separators, row delimiter
- Max filter ratio and strict mode
- Partition targeting and timeout
- JSON outer array stripping and JSON paths
Response Handling
The response model standardizes ingestion outcomes and provides convenience checks for success and failure.
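Here is a sketch of such a response model. It assumes the client decodes the JSON body that StarRocks returns from Stream Load.
- The field names follow that body (TxnId, Status, NumberLoadedRows, ErrorURL and so on).
- The type name LoadResult and the helper parseLoadResult are hypothetical.
- Treating "Publish Timeout" as success follows StarRocks' documented meaning: the transaction committed, but visibility may lag. Whether response.go does the same is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LoadResult mirrors part of the JSON body returned by Stream Load
// (hypothetical type name; field names follow the StarRocks response).
type LoadResult struct {
	TxnID              int64  `json:"TxnId"`
	Label              string `json:"Label"`
	Status             string `json:"Status"`
	ExistingJobStatus  string `json:"ExistingJobStatus"`
	Message            string `json:"Message"`
	NumberTotalRows    int64  `json:"NumberTotalRows"`
	NumberLoadedRows   int64  `json:"NumberLoadedRows"`
	NumberFilteredRows int64  `json:"NumberFilteredRows"`
	LoadTimeMs         int64  `json:"LoadTimeMs"`
	ErrorURL           string `json:"ErrorURL"`
}

// IsSuccess reports whether the transaction committed. "Publish Timeout"
// means committed but possibly not yet visible, so it counts as success.
func (r *LoadResult) IsSuccess() bool {
	return r.Status == "Success" || r.Status == "Publish Timeout"
}

// IsFailure reports a definite failure; ErrorURL, when set, lists bad rows.
func (r *LoadResult) IsFailure() bool { return r.Status == "Fail" }

func parseLoadResult(body []byte) (*LoadResult, error) {
	var r LoadResult
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("decode stream load response: %w", err)
	}
	return &r, nil
}

func main() {
	// Illustrative body, not captured from a real cluster.
	body := []byte(`{"TxnId":1001,"Label":"orders_0001","Status":"Success","Message":"OK",
		"NumberTotalRows":100,"NumberLoadedRows":98,"NumberFilteredRows":2,"LoadTimeMs":35}`)
	r, err := parseLoadResult(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("success=%v loaded=%d filtered=%d in %dms\n",
		r.IsSuccess(), r.NumberLoadedRows, r.NumberFilteredRows, r.LoadTimeMs)
}
```

"Label Already Exists" is deliberately outside both checks. The caller can then decide from ExistingJobStatus whether an earlier attempt already loaded the batch.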
Configuration and Environment Integration
Connection settings and pool tuning are centralized and validated across services:
- YAML-based configuration with driver, host, port, credentials, and pool parameters.
- Proto-based configuration structures expose StarRocks optimization flags and pool settings.
- Tests confirm conversion and environment-driven setup.
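The pool settings map naturally onto Go's database/sql, because StarRocks accepts MySQL-protocol clients on the FE query port (9030 by default).

This sketch makes a few assumptions:
- It builds a go-sql-driver/mysql style DSN.
- StarRocksConfig and its fields are hypothetical stand-ins for the YAML and proto structures, not their actual definitions.
- applyPool is not called in main, because opening a connection needs a registered driver.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"net"
	"strconv"
	"time"
)

// StarRocksConfig is a hypothetical stand-in for the YAML/proto settings.
type StarRocksConfig struct {
	Host            string
	Port            int // FE MySQL-protocol query port (9030 by default)
	User            string
	Password        string
	Database        string
	MaxOpenConns    int
	MaxIdleConns    int
	ConnMaxLifetime time.Duration
}

// Validate rejects settings that would fail or misbehave at connect time.
func (c StarRocksConfig) Validate() error {
	switch {
	case c.Host == "":
		return errors.New("starrocks: host is required")
	case c.Port <= 0 || c.Port > 65535:
		return fmt.Errorf("starrocks: invalid port %d", c.Port)
	case c.MaxOpenConns > 0 && c.MaxIdleConns > c.MaxOpenConns:
		return errors.New("starrocks: max idle connections exceed max open connections")
	}
	return nil
}

// DSN renders a go-sql-driver/mysql style DSN (assumed driver).
func (c StarRocksConfig) DSN() string {
	addr := net.JoinHostPort(c.Host, strconv.Itoa(c.Port))
	return fmt.Sprintf("%s:%s@tcp(%s)/%s?parseTime=true", c.User, c.Password, addr, c.Database)
}

// applyPool copies the pool limits onto an opened *sql.DB.
func applyPool(db *sql.DB, c StarRocksConfig) {
	db.SetMaxOpenConns(c.MaxOpenConns)
	db.SetMaxIdleConns(c.MaxIdleConns)
	db.SetConnMaxLifetime(c.ConnMaxLifetime)
}

func main() {
	cfg := StarRocksConfig{Host: "127.0.0.1", Port: 9030, User: "root", Database: "bi",
		MaxOpenConns: 20, MaxIdleConns: 5, ConnMaxLifetime: 30 * time.Minute}
	if err := cfg.Validate(); err != nil {
		panic(err)
	}
	fmt.Println(cfg.DSN())
	// With a MySQL driver registered (assumed: _ "github.com/go-sql-driver/mysql"):
	//   db, err := sql.Open("mysql", cfg.DSN())
	//   if err == nil { applyPool(db, cfg) }
}
```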
Diagram sources
- [bi-common.yaml]
- [setup_test.go]
- [env_test.go]
- [config_test.go]
- [conf.pb.go (bi-basic)]
- [conf.pb.go (bi-notify)]
Section sources
- [bi-common.yaml]
- [setup_test.go]
- [env_test.go]
- [config_test.go]
- [conf.pb.go (bi-basic)]
- [conf.pb.go (bi-notify)]
MCP Server for Controlled Access
The MCP server exposes StarRocks tools to external agents with security and isolation:
- Tools: list tables, describe table, read query
- Column-level protection via masking/aggregation rules
- Auditability, since all agent access passes through MCP tool calls
Table Modeling and Design Guidelines
The project enforces StarRocks modeling best practices:
- Primary Key model for transactional metadata requiring upsert/delete
- Aggregate Key model for pre-aggregated reporting
- Duplicate Key model for high-volume detail records
- Explicit partitioning for large tables and colocated joins for frequently joined tables
- Sort keys and prefix index considerations for filtering and compression
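Some of these rules can be checked mechanically before any DDL is issued. The sketch encodes two StarRocks constraints:
- In a Primary Key table, the partition and bucketing columns must be part of the primary key.
- Tables in one colocation group must agree on bucket count and on the number of bucketing columns.

TableSpec and the check functions are hypothetical. The colocation check is simplified: StarRocks also requires matching bucketing column types and replica counts.

```go
package main

import "fmt"

type KeyModel string

const (
	PrimaryKey   KeyModel = "PRIMARY KEY"
	AggregateKey KeyModel = "AGGREGATE KEY"
	DuplicateKey KeyModel = "DUPLICATE KEY"
)

// TableSpec is a hypothetical, simplified description of a planned table.
type TableSpec struct {
	Name          string
	Model         KeyModel
	KeyCols       []string
	PartitionCols []string
	BucketCols    []string
	Buckets       int
	ColocateGroup string
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// checkTable: in a Primary Key table, partition and bucketing columns must be key columns.
func checkTable(t TableSpec) []string {
	var problems []string
	if t.Model != PrimaryKey {
		return problems
	}
	for _, c := range append(append([]string{}, t.PartitionCols...), t.BucketCols...) {
		if !contains(t.KeyCols, c) {
			problems = append(problems, fmt.Sprintf("%s: column %q must be part of the primary key", t.Name, c))
		}
	}
	return problems
}

// checkColocation: tables in one colocation group must share the bucket count
// and the number of bucketing columns (types and replica counts not checked here).
func checkColocation(tables []TableSpec) []string {
	var problems []string
	first := map[string]TableSpec{}
	for _, t := range tables {
		if t.ColocateGroup == "" {
			continue
		}
		ref, seen := first[t.ColocateGroup]
		if !seen {
			first[t.ColocateGroup] = t
			continue
		}
		if t.Buckets != ref.Buckets || len(t.BucketCols) != len(ref.BucketCols) {
			problems = append(problems, fmt.Sprintf("%s: colocation group %q expects %d buckets on %d column(s), as in %s",
				t.Name, t.ColocateGroup, ref.Buckets, len(ref.BucketCols), ref.Name))
		}
	}
	return problems
}

func main() {
	orders := TableSpec{Name: "orders", Model: PrimaryKey, KeyCols: []string{"order_id"},
		PartitionCols: []string{"dt"}, BucketCols: []string{"order_id"}, Buckets: 8, ColocateGroup: "orders_g"}
	items := TableSpec{Name: "order_items", Model: DuplicateKey, KeyCols: []string{"order_id"},
		BucketCols: []string{"order_id"}, Buckets: 16, ColocateGroup: "orders_g"}
	for _, p := range append(checkTable(orders), checkColocation([]TableSpec{orders, items})...) {
		fmt.Println(p)
	}
}
```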
Dependency Analysis
Key dependencies and relationships:
- StreamLoad client depends on HTTP transport and configuration
- Response parsing decouples ingestion logic from result interpretation
- Configuration is consumed by setup routines and validated in tests
- MCP server depends on connection configuration and design rules for safe access
Diagram sources
- [client.go]
- [response.go]
- [bi-common.yaml]
- [setup_test.go]
- [architecture-mcp-server-starrocks.md]
- [protocols-db.md]
Section sources
- [client.go]
- [response.go]
- [bi-common.yaml]
- [setup_test.go]
- [architecture-mcp-server-starrocks.md]
- [protocols-db.md]
Performance Considerations
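- Ingestion batching: each StreamLoad request is one load transaction. Very small batches therefore create many transactions and data versions, while very large batches increase memory pressure and the risk of hitting the load timeout; aim for a balance between the two.
- Strict mode and filter ratios: tune to balance data quality and throughput.
- Partitioning and bucket sizing: align with scan volume reduction and parallelism targets.
- Colocated joins: eliminate network shuffle for frequently joined large tables when the join keys match the tables' bucketing columns.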
- Sort keys and prefix index: place selective, shorter columns early to improve filtering and compression.
Troubleshooting Guide
Common issues and remedies:
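- Authentication failures: verify the credentials and the port. StreamLoad uses the FE HTTP port (8030 by default), not the MySQL query port (9030 by default).
- Redirect failures or unresolvable BE addresses: configure the BE proxy for local development.
- Import failures: inspect the response Status and Message, then fetch the ErrorURL from the response for row-level details.
- Timeout exceeded: increase the timeout option for large batches, or split them into smaller ones.
- Duplicate label errors (Status "Label Already Exists"): generate a unique label per import task, and reuse a label only when retrying the same batch.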
Conclusion
The repository demonstrates a production-ready StarRocks architecture integrating HTTP-based ingestion, robust configuration management, secure agent access, and strong design guidelines. By adhering to the outlined modeling, partitioning, and operational practices, teams can achieve reliable, high-performance analytics at scale.
Appendices
Appendix A: StreamLoad End-to-End Flow
Appendix B: MCP Server Tooling
- list_tables: enumerate accessible tables
- describe_table: retrieve schema with metadata
- read_query: execute controlled SQL queries
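Here is a sketch of how the three tools could map onto StarRocks statements. It includes a coarse read-only gate for read_query. All function names are hypothetical, and the real server may add metadata or filtering. String checks like these are only a backstop: the primary control should be a StarRocks user that holds no privileges beyond SELECT.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

var identPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// toolSQL maps the listing tools onto plain StarRocks statements
// (hypothetical mapping).
func toolSQL(tool, table string) (string, error) {
	switch tool {
	case "list_tables":
		return "SHOW TABLES", nil
	case "describe_table":
		if !identPattern.MatchString(table) {
			return "", fmt.Errorf("invalid table name %q", table)
		}
		return "DESCRIBE `" + table + "`", nil
	default:
		return "", fmt.Errorf("unknown tool %q", tool)
	}
}

// checkReadOnly is a coarse gate for read_query: one statement, and it must
// start with a read-only keyword. It is a backstop, not the access control.
func checkReadOnly(query string) error {
	q := strings.TrimSuffix(strings.TrimSpace(query), ";")
	if strings.Contains(q, ";") {
		return errors.New("multiple statements are not allowed")
	}
	fields := strings.Fields(q)
	if len(fields) == 0 {
		return errors.New("empty query")
	}
	switch strings.ToUpper(fields[0]) {
	case "SELECT", "WITH", "SHOW", "DESC", "DESCRIBE", "EXPLAIN":
		return nil
	}
	return fmt.Errorf("statement %q is not allowed", fields[0])
}

func main() {
	stmt, _ := toolSQL("describe_table", "orders")
	fmt.Println(stmt)
	fmt.Println(checkReadOnly("SELECT dt, SUM(amount) FROM orders GROUP BY dt;"))
	fmt.Println(checkReadOnly("DELETE FROM orders"))
}
```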
Appendix C: Entity Synchronization Script
A script exists to synchronize StarRocks entities, supporting schema alignment and operational workflows.
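The script itself is not reproduced in this document. To illustrate schema alignment only, the following sketch compares entity columns against live columns (for example, parsed from DESCRIBE output) and plans ALTER TABLE ... ADD COLUMN statements. Column and planAlter are hypothetical. Extra live columns are left alone and type differences are only reported, since changing either can destroy data.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical (name, type) pair, e.g. parsed from DESCRIBE output.
type Column struct {
	Name string
	Type string
}

// planAlter returns ADD COLUMN statements for entity columns missing from the
// live table, plus warnings for type mismatches. Extra live columns and type
// changes are not acted on, because fixing either can destroy data.
func planAlter(table string, desired, live []Column) (stmts, warnings []string) {
	liveTypes := make(map[string]string, len(live))
	for _, c := range live {
		liveTypes[strings.ToLower(c.Name)] = c.Type
	}
	for _, c := range desired {
		t, ok := liveTypes[strings.ToLower(c.Name)]
		switch {
		case !ok:
			stmts = append(stmts, fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, c.Name, c.Type))
		case !strings.EqualFold(t, c.Type):
			warnings = append(warnings, fmt.Sprintf("%s.%s: live type %s, entity type %s", table, c.Name, t, c.Type))
		}
	}
	return stmts, warnings
}

func main() {
	desired := []Column{{"id", "BIGINT"}, {"amount", "DECIMAL(18,2)"}, {"note", "VARCHAR(64)"}}
	live := []Column{{"id", "bigint"}, {"amount", "decimal(10,2)"}}
	stmts, warnings := planAlter("orders", desired, live)
	fmt.Println(strings.Join(stmts, "\n"))
	fmt.Println(strings.Join(warnings, "\n"))
}
```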