Database Schema and Data Model
Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
Introduction
This document provides comprehensive database schema and data model documentation for the bi-analysis StarRocks OLAP system. It covers table structures, field definitions, data types, primary keys, indexes, constraints, and the metric data model. It also explains dimension grouping, hierarchical relationships, design principles, partitioning strategies, and data lifecycle management. The document connects configuration tables with the underlying analytical data model and clarifies the relationship between configuration metadata and physical data sources.
Project Structure
The database design for bi-analysis consists of four core configuration tables that define data sources, dimensions, metrics, and field metadata. These tables are designed for StarRocks with Primary Key tables, BITMAP indexes, and soft-delete semantics.
Diagram sources
Section sources
Core Components
This section documents the core configuration tables and their roles in the bi-analysis data model.
- anls_table: Defines data source tables used for analysis, including associations with dimensions and date types.
- anls_dimension: Defines analytical dimensions and supports mutual exclusion rules via c_mutex_keys.
- anls_field: Defines metric fields, formulas, functions, and their relationships to data sources and dimensions.
- anls_field_meta: Provides metadata for available fields in source tables, enabling dynamic metric creation and validation.
- anls_date: Provides a complete date dimension table for time-series analysis.
- anls_template: Stores reusable analysis templates per tenant.
- anls_user_home_layout: Stores user-specific dashboard layouts.
- anls_field_visible_template: Stores personalized field visibility and ordering configurations.
Section sources
Architecture Overview
The bi-analysis database design centers on configuration-driven metric definition and flexible dimension modeling. The configuration tables guide how metrics are constructed, validated, and presented, while the date dimension table supports time-based analytics.
Diagram sources
Detailed Component Analysis
Data Source Configuration Table (anls_table)
Purpose:
- Define data source tables used for analysis.
- Associate dimensions and date types to tables.
Key fields and constraints:
- Primary key: id
- Tenant scoping via tenant_id
- Unique constraint on c_table_name enforced by index
- Soft delete via deleted_at
- Status flag and sort order
Indexes:
- Unique index on c_table_name
- BITMAP index on status
- BITMAP index on deleted_at
Partitioning and distribution:
- Distributed by HASH(id) with 8 buckets
- Replication set to 1
Operational notes:
- Used to link metrics to source tables and dimensions.
Section sources
Dimension Configuration Table (anls_dimension)
Purpose:
- Define analytical dimensions and enforce mutual exclusivity.
Key fields and constraints:
- Primary key: id
- Unique key: c_key
- Mutual exclusion: c_mutex_keys stores comma-separated keys to exclude when selecting dimensions
- Soft delete via deleted_at
- Status flag and sort order
Indexes:
- Unique index on c_key
- BITMAP index on status
- BITMAP index on deleted_at
Partitioning and distribution:
- Distributed by HASH(id) with 8 buckets
- Replication set to 1
Front-end logic:
- The design document outlines a filtering algorithm to compute mutually exclusive dimensions based on selected keys.
Section sources
Metric Field Configuration Table (anls_field)
Purpose:
- Define metric fields, formulas, functions, and their relationships to data sources and dimensions.
Key fields and constraints:
- Primary key: id
- Tenant scoping via tenant_id (0 indicates system-wide)
- Unique key: c_field
- Supports three types: field, formula, function
- Units: number or percent
- Expression and filters stored as JSON
- Date range controls via offsets and fixed dates
- Soft delete via deleted_at
- Status flag and sort order
Indexes:
- Unique index on c_field
- BITMAP index on tenant_id
- BITMAP index on c_group
- BITMAP index on c_type
- BITMAP index on status
- BITMAP index on deleted_at
Partitioning and distribution:
- Distributed by HASH(id) with 8 buckets
- Replication set to 1
Relationships:
- References data source tables via c_table_name and c_table_field
- References dimensions via c_dimension
- Supports derived metrics via c_field_p and c_expression
Initialization:
- The init.sql script seeds system-level metrics across multiple groups (e.g., flow, finance, ad, sku, consult, region, wujie, amortiz, other).
Section sources
Field Metadata Table (anls_field_meta)
Purpose:
- Provide metadata for available fields in source tables to enable dynamic metric creation and validation.
Key fields and constraints:
- Primary key: id
- Composite unique key: (c_group, c_table_field)
- Grouping: c_group identifies data source type (order, refund, sd, apportion, bill_period)
- Data typing: c_data_type (amount, count, rate, flag, enum)
- Date support: c_date_type as JSON array
- Metric capability: is_metric and default aggregation via c_func
- Filter capability: is_filter and supported operators via c_operators
- Enumerations: c_enum_values as JSON
- Related filters: c_related_filters for metric-level filter sets
- Soft delete via deleted_at
- Status flag and sort order
Indexes:
- Unique composite index on (c_group, c_table_field)
- BITMAP index on c_group
- BITMAP index on is_metric
- BITMAP index on is_filter
- BITMAP index on status
- BITMAP index on deleted_at
Partitioning and distribution:
- Distributed by HASH(id) with 4 buckets
- Replication set to 1
Relationships:
- Validates fields referenced by anls_field
- Supports dynamic UI generation for metric creation
Section sources
Date Dimension Table (anls_date)
Purpose:
- Provide a complete date dimension for time-series analysis.
Key fields and constraints:
- Primary key: t_date (DATE)
- Computed attributes: ISO week, year-month, quarter, day of week/month, week start/end, formatted week display
- Pre-populated for years 2020–2050
Partitioning and distribution:
- Distributed by HASH(t_date) with 4 buckets
- Replication set to 1
Initialization:
- The schema initializes the date dimension using a sequence generator and date arithmetic.
Section sources
Additional Configuration Tables
These tables support template-based analysis, user dashboards, and field visibility personalization.
- anls_template: Tenant-scoped templates for analysis layouts and metric selections.
- anls_user_home_layout: User-specific dashboard layouts.
- anls_field_visible_template: Personalized field visibility and ordering.
Indexes and distribution:
- All use distributed HASH(id) with 8 buckets and replication set to 1.
- BITMAP indexes on tenant_id, status, and deleted_at where applicable.
Section sources
Dependency Analysis
The configuration tables form a layered dependency model that governs metric creation and validation.
Diagram sources
Section sources
Performance Considerations
- Distribution and bucket sizing:
- Primary key tables use HASH(id) distribution with 4–8 buckets depending on cardinality needs.
- Date dimension uses HASH(t_date) with fewer buckets due to low-cardinality date keys.
- Indexing strategy:
- BITMAP indexes on status and deleted_at enable fast filtering for soft-deleted records and active/inactive rows.
- Unique indexes on identifiers (c_table_name, c_key, c_field) prevent duplicates and accelerate joins.
- Replication:
- All tables set replication_num to 1, suitable for read-heavy analytical workloads.
- Data lifecycle:
- Soft delete pattern via deleted_at allows safe archival and recovery without physical deletion.
- Partitioning:
- No explicit partitioning directives are present; time-series queries leverage the date dimension and BITMAP filters.
[No sources needed since this section provides general guidance]
Troubleshooting Guide
Common issues and resolutions:
- Duplicate metric field names:
- Symptom: Insert fails on unique index for c_field.
- Resolution: Ensure c_field uniqueness per tenant; system-level metrics use tenant_id=0.
- Invalid dimension selection:
- Symptom: Front-end shows no available dimensions after selection.
- Resolution: Verify c_mutex_keys on selected dimensions; mutual exclusion prevents overlapping selections.
- Missing source field metadata:
- Symptom: Metric creation fails validation.
- Resolution: Confirm c_group and c_table_field exist in anls_field_meta; ensure is_metric/is_filter flags match intended usage.
- Soft-deleted records interfering with queries:
- Symptom: Queries return unexpected inactive rows.
- Resolution: Apply deleted_at=0 filter in queries; BITMAP indexes on deleted_at support efficient filtering.
Section sources
Conclusion
The bi-analysis StarRocks database design leverages configuration-driven metadata to power flexible, tenant-aware metric creation and time-series analysis. The schema emphasizes soft deletes, BITMAP indexing, and primary key tables aligned with StarRocks best practices. The date dimension table and configuration tables collectively support robust analytical workflows, from dimension grouping and mutual exclusion to dynamic metric composition and personalization.