The Spatiotemporal Big Data Store (STBDS) is fairly complex, and it can be difficult to determine what setting values you need without inspecting the underlying STBDS indices in Elasticsearch. In general, the guidance below is in line with the ES recommendations:
https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html
The Default Number of Shards can exceed the number of nodes, particularly in situations where:
- The spatiotemporal cluster is expected to expand in the future. If you start with 3 STBDS nodes but will add more machines over the life of the project to support additional load, consider going with 7 primary shards. That would support scaling up to 7 machines without requiring re-indexing or re-creating the STBDS data source.
- You have sufficient CPU cores and disk bandwidth on each machine. For example, a 3-node cluster with 8 cores per node should be fine with an index that has 7 primary shards. Source:
https://www.datadoghq.com/blog/elasticsearch-performance-scaling-problems/
- You are trying to control shard size. If daily rollover produces very large shards but hourly rollover produces very small ones (<10GB), it may be better to increase the primary shard count to reach the recommended 20-40GB per shard.
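As a rough way to evaluate that trade-off, the expected shard size for one rollover period can be estimated from the ingest volume, record size, and primary shard count. A minimal sketch (the function name and example volume are hypothetical, not from STBDS):

```python
def estimated_shard_size_gb(records_per_period: int,
                            avg_record_kb: float,
                            primary_shards: int) -> float:
    """Approximate size of each primary shard for one rollover period.

    Uses decimal units (1 GB = 1,000,000 KB), matching the rough
    arithmetic used elsewhere in this article.
    """
    total_gb = records_per_period * avg_record_kb / 1_000_000
    return total_gb / primary_shards

# Hypothetical: 10 million records per day at ~4 KB each, 3 primary shards
# -> roughly 13.3 GB per shard, below the 20-40 GB target.
print(estimated_shard_size_gb(10_000_000, 4, 3))
```

If the result lands well outside 20-40GB per shard, that is a signal to revisit either the rollover period or the primary shard count.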
If this number is under 10 million records per rolling period, set the rolling period to that period*.
Otherwise, divide by 30 (for example, moving from a monthly to a daily period) and continue to the next step.
* Assuming 3 primary shards and ~4KB per record: 10 million records ≈ 40GB, and 40GB / 3 ≈ 13.3GB per shard,
which falls within ES recommendations.
Record size needs to be estimated empirically, but 2-4KB is common.
See: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
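The period-selection loop above can be sketched as follows. The monthly → daily → hourly cascade and the divisors (~30 days per month, 24 hours per day) are assumptions for illustration; the article itself only states the 10-million-record threshold and the divide-by-30 step:

```python
# Candidate rolling periods, largest first, with the number of such
# periods per month (assumed: ~30 days/month, 24 hours/day).
PERIODS = [("monthly", 1), ("daily", 30), ("hourly", 30 * 24)]

RECORD_THRESHOLD = 10_000_000  # target records per rolling period

def choose_rolling_period(records_per_month: int) -> str:
    """Pick the largest rolling period that stays under the threshold."""
    for name, periods_per_month in PERIODS:
        if records_per_month / periods_per_month <= RECORD_THRESHOLD:
            return name
    return "hourly"  # smallest period in this sketch; revisit shard count

# e.g. 200 million records/month -> ~6.7 million/day -> daily rollover
print(choose_rolling_period(200_000_000))
```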
To determine how many shards are created per index interval:
total_shards_per_rolling_period = number_of_shards + (replication_factor*number_of_shards)
To determine the “steady state” shard count:
total_shards = total_shards_per_rolling_period * retention_period
shards_per_machine = total_shards / N
primary_shards_per_machine = shards_per_machine / (replication_factor + 1)
replica_shards_per_machine = (shards_per_machine * replication_factor) / (replication_factor + 1)
e.g.
index is configured with 10 shards, daily rolling period, replication factor of 2, 30 days retention
N (number of machines) = 5
number_of_shards = 10
replication_factor = 2
rolling_period = 1 day
retention_period = 30 days
total_shards_per_rolling_period = 10 + (10*2) = 30 shards per day
total_shards = 30 shards per day * 30 days = 900 shards
shards_per_machine = 900 / 5 = 180 (60 primary, 120 replicas)
primary_shards_per_machine = 180 / (2 + 1) = 60 primary shards
replica_shards_per_machine = (180 * 2) / (2 + 1) = 120 replica shards
60 primary shards per machine isn’t too bad. If necessary, the shard count can be lowered by reducing the replication factor.
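The steady-state formulas and the worked example above can be expressed directly in code; this is just the article's arithmetic, with hypothetical function and variable names:

```python
def steady_state_shards(number_of_shards: int,
                        replication_factor: int,
                        retention_periods: int,
                        machines: int):
    """Shard counts per the formulas above (retention in rolling periods)."""
    per_period = number_of_shards + replication_factor * number_of_shards
    total_shards = per_period * retention_periods
    shards_per_machine = total_shards / machines
    primary_per_machine = shards_per_machine / (replication_factor + 1)
    replica_per_machine = (shards_per_machine * replication_factor) / (replication_factor + 1)
    return total_shards, shards_per_machine, primary_per_machine, replica_per_machine

# The worked example: 10 shards, RF 2, 30-day retention, 5 machines
# -> (900, 180.0, 60.0, 120.0)
print(steady_state_shards(10, 2, 30, 5))
```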
Please note that in 10.6+, STBDS will “shrink” smaller rollover indexes/shards to 1 shard to avoid an index explosion.
Typically you don’t need to change the STBDS output flush interval from its default (1,000 milliseconds) unless you are running into write issues
(rejected execution exceptions, the Feature Service document count lagging behind the GeoEvent STBDS output count, etc.).
The refresh_interval is a separate setting on the STBDS data source, and it defaults to 1 second.
Generally the STBDS output flush interval should be less than or equal to refresh_interval.
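That relationship is a simple inequality; as a sanity check (the function name here is hypothetical):

```python
def flush_interval_ok(output_flush_ms: int, refresh_interval_ms: int) -> bool:
    """STBDS output flush interval should be <= the data source refresh_interval."""
    return output_flush_ms <= refresh_interval_ms

# Defaults: 1,000 ms flush interval vs. 1 s (1,000 ms) refresh_interval -> OK
print(flush_interval_ok(1000, 1000))
```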
Additionally, consider looking at other aspects of the system that may affect query performance.