Star-Tree

Star-tree cubing is a pre-aggregation technique for achieving low-latency iceberg queries. An iceberg query computes an aggregate function over an attribute (or set of attributes) in order to find the aggregate values above a specified threshold. With this technique, the user can create a cube with the required aggregations and dimensions. When aggregation queries are then executed, the cube is used to answer the query instead of the original table. The actual performance gain is achieved during the TableScan operation, because cubes are pre-computed and pre-aggregated.

For this reason, the cubing technique is highly effective when the GROUP BY cardinality results in fewer rows than the original table.
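
As a rough illustration of the idea (not of how a cube is stored internally), the sketch below shows an iceberg query and the kind of pre-aggregated result a cube conceptually holds. It assumes the nation table used in the Examples section; the threshold of 3 and the column aliases are arbitrary choices made for illustration.

```sql
-- An iceberg query: aggregate per group, then keep only the groups
-- whose aggregate exceeds a threshold (3 is an arbitrary example value).
SELECT regionkey, count(*) AS nation_count
FROM nation
GROUP BY regionkey
HAVING count(*) > 3;

-- Conceptually, a cube grouped on regionkey holds the result of a query
-- like this one: one row per distinct regionkey rather than one row per nation.
SELECT regionkey, count(*) AS nation_count, max(nationkey) AS max_nationkey
FROM nation
GROUP BY regionkey;
```

The fewer distinct values the grouping columns have relative to the row count of the source table, the smaller the second result is, and the less there is to scan.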

Supported functions

COUNT, COUNT DISTINCT, MIN, MAX, SUM, AVG

Enabling and Disabling Star-tree

To enable:

SET SESSION enable_star_tree_index=true;

To disable:

SET SESSION enable_star_tree_index=false;

Configuration Properties

| Property Name | Default Value | Required | Description |
| --- | --- | --- | --- |
| optimizer.enable-star-tree-index | false | No | Enables the star-tree index |
| cube.metadata-cache-size | 5 | No | The maximum number of star-tree metadata entries that can be loaded into the cache before eviction happens |
| cube.metadata-cache-ttl | 1h | No | The maximum time to live of star-tree metadata loaded into the cache before eviction happens |

Examples

Creating a star-tree cube:

CREATE CUBE nation_cube 
ON nation 
WITH (AGGREGATIONS=(count(*), count(distinct regionkey), avg(nationkey), max(regionkey)),
GROUP=(nationkey),
format='orc', partitioned_by=ARRAY['nationkey']);

Next, to add data to the cube:

INSERT INTO CUBE nation_cube WHERE nationkey > 5;

To use the new cube, query the original table using aggregations that were included in the cube:

SELECT count(*) FROM nation WHERE nationkey > 5 GROUP BY nationkey;
SELECT nationkey, avg(nationkey), max(regionkey) FROM nation WHERE nationkey > 5 GROUP BY nationkey;

Optimizer Changes

The star-tree aggregation rule is an iterative optimizer rule that optimizes the logical plan by replacing the original aggregation sub-tree and the original table scan with a pre-aggregation table scan. The rule uses the TupleDomain construct to determine whether the predicates provided in the query can be supported by a cube. The actual rows are not queried to check whether a cube is applicable.
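
One way to observe this rule at work is to compare query plans with EXPLAIN. The sketch below assumes that nation_cube from the Examples section exists and has been loaded with nationkey > 5. Plan output is deliberately not shown, since it varies by version and connector; the comments describe only what one would expect to see in the table scan.

```sql
-- Enable the star-tree rule for this session.
SET SESSION enable_star_tree_index=true;

-- The predicate and grouping match what was loaded into nation_cube,
-- so the plan is expected to scan the cube instead of nation.
EXPLAIN SELECT nationkey, max(regionkey) FROM nation WHERE nationkey > 5 GROUP BY nationkey;

-- The predicate covers rows that were never loaded into the cube,
-- so the plan is expected to fall back to scanning nation.
EXPLAIN SELECT nationkey, max(regionkey) FROM nation WHERE nationkey <= 5 GROUP BY nationkey;
```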

Dependencies

The star-tree index relies on the Hetu metastore to store cube-related metadata. See Hetu Metastore for more information.

Limitations

  1. A star-tree cube is only effective when the GROUP BY cardinality is considerably lower than the number of rows in the source table.
  2. A significant amount of user effort is required to maintain cubes for large datasets.
  3. Only incremental inserts into a cube are supported. Specific rows cannot be deleted from a cube.
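
To make the third limitation concrete, the sketch below loads a cube in two steps using the INSERT INTO CUBE statement from the Examples section. It assumes that a second range on the same column can be added in a separate statement; whether a given combination of predicates is accepted may depend on the version.

```sql
-- Initial load: rows with nationkey > 5 (same statement as in the Examples section).
INSERT INTO CUBE nation_cube WHERE nationkey > 5;

-- Later incremental load: add the remaining range.
-- Rows already in the cube cannot be removed individually.
INSERT INTO CUBE nation_cube WHERE nationkey <= 5;
```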