From da139d3182adb3fdb0528fe54abbcf8b6f33e406 Mon Sep 17 00:00:00 2001
From: ralphmalph62
Date: Fri, 13 Feb 2026 18:48:15 -0500
Subject: [PATCH 1/4] Update Architecture.md

---
 docs/ja/introduction/Architecture.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ja/introduction/Architecture.md b/docs/ja/introduction/Architecture.md
index f570aafd..1b857e9b 100644
--- a/docs/ja/introduction/Architecture.md
+++ b/docs/ja/introduction/Architecture.md
@@ -6,7 +6,7 @@ displayed_sidebar: docs
 import QSOverview from '../_assets/commonMarkdown/quickstart-overview-tip.mdx'
 
 # Architecture
-foo
+
 StarRocks は堅牢なアーキテクチャを備えています。システム全体は「フロントエンド」と「バックエンド」の2種類のコンポーネントのみで構成されています。フロントエンドノードは **FE** と呼ばれます。バックエンドノードには **BE** と **CN** (コンピュートノード) の2種類があります。データにローカルストレージを使用する場合に BE がデプロイされ、データがオブジェクトストレージまたは HDFS に保存される場合に CN がデプロイされます。StarRocks は外部コンポーネントに依存せず、デプロイとメンテナンスを簡素化します。ノードはサービス停止なしで水平にスケールできます。さらに、StarRocks はメタデータとサービスデータのレプリカメカニズムを備えており、データ信頼性を高め、単一障害点 (SPOF) を効率的に防止します。
 
 StarRocks は MySQL 通信プロトコルと互換性があり、標準 SQL をサポートしています。ユーザーは MySQL クライアントから StarRocks に接続し、瞬時に貴重なインサイトを得ることができます。

From 134fe7f7fd621002b401f215c54504b7de738006 Mon Sep 17 00:00:00 2001
From: DanRoscigno
Date: Fri, 13 Feb 2026 18:49:11 -0500
Subject: [PATCH 2/4] trigger checkbox update

Signed-off-by: DanRoscigno
---
 docs/zh/introduction/Architecture.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/zh/introduction/Architecture.md b/docs/zh/introduction/Architecture.md
index 46bfabbe..2f719d0f 100644
--- a/docs/zh/introduction/Architecture.md
+++ b/docs/zh/introduction/Architecture.md
@@ -5,6 +5,7 @@
 import QSOverview from '../_assets/commonMarkdown/quickstart-overview-tip.mdx'
 
 好的
+
 StarRocks 具有出色的架构。整个系统只包含两种类型的组件：“前端节点”和“后端节点”。前端节点称为 **FE**。后端节点分为两种类型：**BE** 和 **CN**（计算节点）。当数据使用本地存储时，部署 BE；当数据存储在对象存储或 HDFS 上时，部署 CN。StarRocks 不依赖任何外部组件，这简化了部署和维护。节点可以水平扩展而无需停机。此外，StarRocks 对元数据和服务数据采用副本机制，提高了数据可靠性，有效防止了单点故障 (SPOF)。
 
 StarRocks 兼容 MySQL 通信协议并支持标准 SQL。用户可以通过 MySQL 客户端连接到 StarRocks,即时获取有价值的洞察。

From 7cdd334ab63e9b13b6409bfd367558d4c5adcb48 Mon Sep 17 00:00:00 2001
From: "docs-automation[bot]"
Date: Fri, 13 Feb 2026 23:50:36 +0000
Subject: [PATCH 3/4] docs: automated translation via Gemini

---
 docs/en/introduction/Architecture.md | 61 ++++++++++++++--------------
 1 file changed, 30 insertions(+), 31 deletions(-)

diff --git a/docs/en/introduction/Architecture.md b/docs/en/introduction/Architecture.md
index 09428b7e..7eb9b061 100644
--- a/docs/en/introduction/Architecture.md
+++ b/docs/en/introduction/Architecture.md
@@ -3,81 +3,80 @@ import QSOverview from '../_assets/commonMarkdown/quickstart-overview-tip.mdx'
 
 # Architecture
 
-OK
+Good.
 
-StarRocks has a wonderful architecture. The entire system consists of only two types of components: "frontends" and "backends". Frontend nodes are called **FE**. Backend nodes are divided into two types: **BE** and **CN** (compute node). When data uses local storage, BEs are deployed; when data is stored on object storage or HDFS, CNs are deployed. StarRocks does not rely on any external components, which simplifies deployment and maintenance. Nodes can be scaled horizontally without downtime. In addition, StarRocks has a replica mechanism for metadata and service data, which improves data reliability and effectively prevents single points of failure (SPOFs).
+StarRocks has an excellent architecture. The entire system contains only two types of components: "frontend nodes" and "backend nodes". Frontend nodes are called **FE**. Backend nodes are divided into two types: **BE** and **CN** (compute nodes). When data uses local storage, BEs are deployed; when data is stored on object storage or HDFS, CNs are deployed. StarRocks does not rely on any external components, which simplifies deployment and maintenance. Nodes can be horizontally scaled without downtime. In addition, StarRocks adopts a replica mechanism for metadata and service data, which improves data reliability and effectively prevents single points of failure (SPOF).
 
-StarRocks is compatible with the MySQL communication protocol and supports standard SQL. Users can connect to StarRocks via a MySQL client to gain instant and valuable insights.
+StarRocks is compatible with the MySQL communication protocol and supports standard SQL. Users can connect to StarRocks via a MySQL client to gain valuable insights instantly.
 
 ## Architecture Choices
 
-StarRocks supports the shared-nothing mode (where each BE owns a portion of the data on its local storage) and the shared-data mode (where all data is stored on object storage or HDFS, and each CN only has a cache on its local storage). You can decide where to store your data based on your needs.
+StarRocks supports a compute-storage integrated mode (where each BE owns a portion of data on its local storage) and a compute-storage separated mode (where all data is stored on object storage or HDFS, and each CN only has a cache on its local storage). You can decide the data storage location based on your needs.
 
-![架构选择](../_assets/architecture_choices.png)
+![Architecture Choices](../_assets/architecture_choices.png)
 
-### Shared-nothing Mode
+### Compute-storage Integrated Mode
 
 Local storage provides better query latency for real-time queries.
 
-As a typical Massively Parallel Processing (MPP) database, StarRocks supports the shared-nothing architecture. In this architecture, BEs are responsible for data storage and compute. Directly accessing local data on BE nodes enables local compute, avoiding data transfer and data copying, and providing ultra-fast query and data analytics performance. This architecture supports multi-replica data storage, enhancing the cluster's ability to handle high-concurrency queries and ensuring data reliability. It is ideal for scenarios that require optimal query performance.
+As a typical massively parallel processing (MPP) database, StarRocks supports a compute-storage integrated architecture. In this architecture, BEs are responsible for data storage and computation. Directly accessing local data on BE nodes enables local computation, avoiding data transfer and replication, and providing ultra-fast query and data analysis performance. This architecture supports multi-replica data storage, enhancing the cluster's ability to handle high-concurrency queries and ensuring data reliability. It is very suitable for scenarios requiring optimal query performance.
 
-![存算一体架构](../_assets/shared-nothing.png)
+![Compute-storage Integrated Architecture](../_assets/shared-nothing.png)
 
 #### Nodes
+In the compute-storage integrated architecture, StarRocks consists of two types of nodes: FE and BE.
 
-In the shared-nothing architecture, StarRocks consists of two types of nodes: FE and BE.
-
-- FE is responsible for metadata management and building execution plans.
-- BE executes query plans and stores data. BEs leverage local storage to accelerate queries and use a multi-replica mechanism to ensure data high availability.
+- FEs are responsible for metadata management and building execution plans.
+- BEs execute query plans and store data. BEs utilize local storage to accelerate queries and use a multi-replica mechanism to ensure high data availability.
 
 ##### FE
 
-FE is responsible for metadata management, client connection management, query planning, and query scheduling. Each FE uses BDB JE (Berkeley DB Java Edition) to store and maintain a complete replica of metadata in its memory, ensuring service consistency among all FEs. FEs can operate as Leader, Follower, and Observer. If the Leader node crashes, Followers will elect a Leader based on the Raft protocol.
+FEs are responsible for metadata management, client connection management, query planning, and query scheduling. Each FE uses BDB JE (Berkeley DB Java Edition) to store and maintain a complete metadata replica in its memory, ensuring service consistency among all FEs. FEs can run as Leader, Follower, and Observer. If the Leader node crashes, Followers will elect a Leader based on the Raft protocol.
 
-| **FE Role** | **Metadata Management** | **Leader Election**                |
+| **FE Role** | **Metadata Management** | **Leader Election** |
 | ----------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ---------------------------------- |
-| Leader | The Leader FE reads and writes metadata. Follower and Observer FEs can only read metadata. They route metadata write requests to the Leader FE. The Leader FE updates metadata, then uses the Raft protocol to synchronize metadata changes to Follower and Observer FEs. Data writes are considered successful only after metadata changes are synchronized to more than half of the Follower FEs. | The Leader FE is, technically, also a Follower node, elected by Follower FEs. To perform Leader election, more than half of the Follower FEs in the cluster must be active. When the Leader FE fails, Follower FEs will initiate a new round of Leader election. |
+| Leader | Leader FEs read and write metadata. Follower and Observer FEs can only read metadata. They route metadata write requests to the Leader FE. The Leader FE updates the metadata and then uses the Raft protocol to synchronize metadata changes to Follower and Observer FEs. Data writes are considered successful only after metadata changes are synchronized to more than half of the Follower FEs. | The Leader FE is technically also a Follower node, elected by Follower FEs. To perform a Leader election, more than half of the Follower FEs in the cluster must be active. When the Leader FE fails, the Follower FEs will initiate a new round of Leader election. |
 | Follower | Followers can only read metadata. They synchronize and replay logs from the Leader FE to update metadata. | Followers participate in Leader election, which requires more than half of the Followers in the cluster to be active. |
-| Observer | Synchronizes and replays logs from the Leader FE to update metadata. | Observers are mainly used to improve the query concurrency of the cluster. Observers do not participate in Leader election and therefore do not increase the Leader election pressure on the cluster. |
+| Observer | Synchronize and replay logs from the Leader FE to update metadata. | Observers are mainly used to improve the query concurrency of the cluster. Observers do not participate in Leader election, thus they do not increase the Leader election pressure on the cluster. |
 
 ##### BE
 
 BEs are responsible for data storage and SQL execution.
 
-- Data Storage: BEs have equivalent data storage capabilities. FE distributes data to BEs according to predefined rules. BEs transform ingested data, write data in the required format, and generate indexes for the data.
+- Data Storage: BEs have equivalent data storage capabilities. FEs distribute data to BEs according to predefined rules. BEs transform ingested data, write data in the required format, and generate indexes for the data.
 
-- SQL Execution: FE parses each SQL query into a logical execution plan based on the query's semantics, and then converts the logical plan into a physical execution plan that can be executed on BEs. The BEs storing the target data execute the query. This eliminates the need for data transfer and copying, thereby achieving high query performance.
+- SQL Execution: FEs parse each SQL query into a logical execution plan according to the query's semantics, and then convert the logical plan into a physical execution plan that can be executed on BEs. The BEs storing the target data execute the queries. This eliminates the need for data transfer and replication, thereby achieving high query performance.
 
-### Shared-data Mode
+### Compute-storage Separated Mode
 
-Object storage and HDFS offer advantages in terms of cost, reliability, and scalability. In addition to storage scalability, due to the separation of storage and compute, CN nodes can be added and removed on demand without re-balancing data.
+Object storage and HDFS offer advantages in terms of cost, reliability, and scalability. In addition to storage scalability, due to the separation of storage and computation, CN nodes can be added and removed on demand without rebalancing data.
 
-In the shared-data architecture, BEs are replaced by "compute nodes (CNs)", which are only responsible for data compute tasks and caching hot data. Data is stored in low-cost, reliable remote storage systems, such as Amazon S3, Google Cloud Storage, Azure Blob Storage, MinIO, etc. When the cache hits, query performance is comparable to the shared-nothing architecture. CN nodes can be added or removed on demand within seconds. This architecture reduces storage costs, ensures better resource isolation, and offers high elasticity and scalability.
+In the compute-storage separated architecture, BEs are replaced by "Compute Nodes (CNs)", which are solely responsible for data computation tasks and caching hot data. Data is stored in low-cost, reliable remote storage systems (such as Amazon S3, Google Cloud Storage, Azure Blob Storage, MinIO, etc.). When a cache hit occurs, query performance is comparable to the compute-storage integrated architecture. CN nodes can be added or removed on demand within seconds. This architecture reduces storage costs, ensures better resource isolation, and provides high elasticity and scalability.
 
-The shared-data architecture, like the shared-nothing architecture, maintains a simple design. It consists of only two types of nodes: FE and CN. The only difference is that users need to provision a backend object storage.
+The compute-storage separated architecture, like the compute-storage integrated architecture, maintains a simple design. It only contains two types of nodes: FE and CN. The only difference is that users need to provide a backend object storage.
 
-![存算分离架构](../_assets/shared-data.png)
+![Compute-storage Separated Architecture](../_assets/shared-data.png)
 
 #### Nodes
 
-The coordinator nodes in a shared-data architecture provide the same functionality as the FEs in a shared-nothing architecture.
+The coordinator nodes in the compute-storage separated architecture provide the same functions as FEs in the compute-storage integrated architecture.
 
-BEs are replaced by CNs (compute nodes), and storage functionality is offloaded to object storage or HDFS. CNs are stateless compute nodes that perform all BE functions except data storage.
+BEs are replaced by CNs (Compute Nodes), and storage functions are offloaded to object storage or HDFS. CNs are stateless compute nodes that perform all BE functions except data storage.
 
 #### Storage
 
-StarRocks shared-data clusters support two storage solutions: object storage (such as AWS S3, Google GCS, Azure Blob Storage, or MinIO) and HDFS.
+StarRocks compute-storage separated clusters support two storage solutions: object storage (such as AWS S3, Google GCS, Azure Blob Storage, or MinIO) and HDFS.
 
-In shared-data clusters, the data file format remains consistent with shared-nothing clusters (which feature coupled storage and compute). Data is organized into segment files, and various indexing technologies are reused in Cloud-native tables, which are tables specifically used in shared-data clusters.
+In compute-storage separated clusters, the data file format remains consistent with compute-storage integrated clusters (which have tightly coupled storage and computation). Data is organized into segment files, and various indexing technologies are reused in Cloud-native tables, which are tables specifically designed for compute-storage separated clusters.
 
 #### Cache
 
-StarRocks shared-data clusters decouple data storage and compute, allowing them to scale independently, thereby reducing costs and enhancing elasticity. However, this architecture may affect query performance.
+StarRocks compute-storage separated clusters decouple data storage from computation, allowing them to scale independently, thereby reducing costs and enhancing elasticity. However, this architecture may affect query performance.
 
-To mitigate the impact, StarRocks has established a multi-tiered data access system covering memory, local disk, and remote storage to better meet various business needs.
+To mitigate this impact, StarRocks has established a multi-tiered data access system covering memory, local disk, and remote storage, to better meet various business needs.
 
-Hot data queries directly scan the cache, then scan the local disk; while cold data needs to be loaded from object storage into the local cache to accelerate subsequent queries. By keeping hot data close to the compute unit, StarRocks achieves truly high-performance compute and cost-effective storage. In addition, cold data access has been optimized through data prefetching strategies, effectively eliminating query performance limitations.
+Hot data queries directly scan the cache, then scan local disks; while cold data needs to be loaded from object storage into local cache to accelerate subsequent queries. By keeping hot data close to the compute units, StarRocks achieves truly high-performance computation and cost-effective storage. Furthermore, cold data access is optimized through data prefetching strategies, effectively eliminating query performance limitations.
 
-Caching can be enabled when creating a table. If caching is enabled, data will be written simultaneously to both local disk and the backend object storage. During queries, CN nodes first read data from the local disk. If data is not found, it will be retrieved from the backend object storage and simultaneously cached to the local disk.
+Caching can be enabled when creating tables. If caching is enabled, data will be written to both local disk and backend object storage simultaneously. During queries, CN nodes first read data from the local disk. If the data is not found, it will be retrieved from the backend object storage and simultaneously cached to the local disk.

From b8dda8460e8ef7ceab9a0f831b4efdf6741da549 Mon Sep 17 00:00:00 2001
From: DanRoscigno
Date: Thu, 26 Feb 2026 08:07:30 -0500
Subject: [PATCH 4/4] set ref

Signed-off-by: DanRoscigno
---
 .github/workflows/ci-doc-translater.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/ci-doc-translater.yml b/.github/workflows/ci-doc-translater.yml
index d5aa69ef..c20bca7d 100644
--- a/.github/workflows/ci-doc-translater.yml
+++ b/.github/workflows/ci-doc-translater.yml
@@ -85,7 +85,7 @@ jobs:
         uses: actions/checkout@v4
         with:
           repository: StarRocks/markdown-translator
-          ref: main
+          ref: 'v1.0.5'
           path: trusted_tools
 
       - name: Get Changed Files