Mapping the Edge: Elasticsearch Trends from Bayview’s Qualitative Benchmarks

Elasticsearch clusters are evolving faster than many teams can track. This guide cuts through the noise by focusing on qualitative benchmarks — real-world patterns we have observed across projects dealing with search, observability, and semantic retrieval. We examine the shift from monolithic clusters to tiered architectures, the growing role of vector search alongside traditional BM25, and the operational trade-offs that come with each choice. You will learn how to evaluate index lifecycle policies, decide between hot-warm-cold and searchable snapshots, and avoid common pitfalls like oversharding or neglecting field mappings. We also compare three common approaches: keeping everything on hot nodes, using cold tiers with forced merges, and adopting a hybrid search setup with dense vectors. The piece includes a decision framework based on query latency, storage cost, and indexing throughput, plus a mini-FAQ on shard sizing, replica counts, and snapshot strategies. Whether you are migrating from an older version or designing a new cluster from scratch, these benchmarks offer a practical lens for mapping your next move.

Who Needs to Choose and Why Now

Every Elasticsearch deployment eventually hits a decision point. Maybe your nightly merge times are creeping up. Maybe your search latency went from 50 milliseconds to 300 after a data spike. Or maybe you are planning to add vector embeddings to an existing index and suddenly realize the cluster topology you have won't support it. This section is for the engineers and architects who are responsible for that next step — not the ones who can afford to wait until the cluster starts throwing circuit-breaking exceptions.

The timing matters because the Elasticsearch ecosystem has shifted in two major ways over the past few releases. First, tiered storage (hot-warm-cold-frozen) is no longer an optional feature; it is the default recommendation for any index that holds more than a few days of data. Second, vector search with dense embeddings has moved from experimental to production-ready, but it comes with its own memory and indexing cost profile. Teams that ignore these shifts often find themselves stuck with clusters that are either overprovisioned (paying for hot nodes to store cold data) or underperforming (queries timing out because shard counts were set years ago and never revisited).

Signs You Are at a Decision Point

Look for these signals: your disk usage is growing faster than your query volume; you have started skipping index rollovers because the process takes too long; or your team is manually moving indices between nodes because the ILM policy does not fit your access patterns. Another common sign is when a new use case — say, adding a recommendation engine or log analytics for a new service — triggers a debate about whether to scale vertically or horizontally. That debate is healthy, but it needs data. That is what we aim to provide here.

The qualitative benchmarks we reference come from observing dozens of cluster configurations in production-like environments. We did not run a formal lab study with controlled variables; instead, we collected patterns from community discussions, ElasticON talks, and our own experiments with test clusters. The numbers are directional, not absolute. Use them as starting points for your own testing.

The Landscape: Three Common Approaches

When teams decide to restructure their Elasticsearch deployment, they usually start from one of three positions. Each has a different set of assumptions about data volume, query patterns, and operational maturity. Understanding which camp you fall into helps narrow the options quickly.

Approach 1: Hot-Only with Forced Merges

This is the simplest setup. All data lives on hot nodes (SSD-backed, high memory). Indices are rolled over frequently (every few hours or daily), and old segments are force-merged to a single segment per shard. The advantage is predictable query performance — no tiering latency, no snapshot restore delays. The downside is cost: you are paying SSD prices for data that is rarely queried after the first week. Teams that choose this path typically have high query rates on recent data and can afford to archive older data outside Elasticsearch (e.g., in Parquet files on S3).

Approach 2: Hot-Warm-Cold with Searchable Snapshots

This is the current recommended pattern for time-series data. Hot nodes handle indexing and recent queries. Warm nodes use cheaper spinning disks or lower-cost SSDs for data that is queried occasionally. Cold nodes (or frozen tier) rely on searchable snapshots stored in object storage — S3, GCS, or Azure Blob. The trade-off is clear: lower storage cost at the expense of higher query latency on older data (seconds instead of milliseconds). Teams that adopt this approach must be comfortable with ILM policies and snapshot lifecycle management. The biggest pitfall we see is setting the rollover period too short, creating thousands of small indices that overwhelm the cluster state.

Approach 3: Hybrid Search with Dense Vectors

This is the newest pattern, driven by semantic search and RAG (retrieval-augmented generation) pipelines. In this setup, a portion of the cluster is dedicated to dense vector indices, typically using the `dense_vector` field type with an HNSW graph for approximate nearest-neighbor search (IVF-style indexes are an alternative in some versions and in other vector stores). The vectors are stored alongside traditional BM25 fields, allowing hybrid queries that combine keyword and semantic relevance. The challenge is that vector indices consume significantly more memory per document than inverted indices. A single float32 embedding of 768 dimensions takes about 3 KB per document (768 values × 4 bytes), so 100 million documents need roughly 300 GB for the raw vectors alone. Teams that go this route often need to separate vector-heavy indices onto nodes with higher RAM-to-disk ratios, and they must tune the HNSW `ef_construction` and `m` parameters carefully to balance indexing speed and search accuracy.
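As a rough illustration of the setup described above, the sketch below builds a mapping and a hybrid search request as plain Python dictionaries. The field names `message` and `embedding`, the `cosine` similarity, and the `m` and `ef_construction` values are assumptions chosen for illustration; nothing here talks to a cluster.

```python
DIMS = 768  # embedding width used in the example above


def bytes_per_vector(dims, bytes_per_value=4):
    """Raw size of one float32 embedding, before any graph overhead."""
    return dims * bytes_per_value


# Hypothetical mapping: one BM25 text field next to one dense_vector field.
mapping = {
    "properties": {
        "message": {"type": "text"},
        "embedding": {
            "type": "dense_vector",
            "dims": DIMS,
            "index": True,
            "similarity": "cosine",
            # Assumed tuning values; adjust after measuring recall and indexing speed.
            "index_options": {"type": "hnsw", "m": 16, "ef_construction": 100},
        },
    }
}


def hybrid_search_body(text, query_vector, k=10):
    """Combine a keyword match with an approximate kNN clause in one request body."""
    return {
        "query": {"match": {"message": text}},
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,  # assumed candidate pool, 10x the result size
        },
        "size": k,
    }


body = hybrid_search_body("connection timeout", [0.0] * DIMS)
```

Raising `m` or `ef_construction` generally trades slower indexing and more memory for better recall, which is why the text recommends tuning them against your own data.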

How to Compare the Options: Criteria That Matter

Choosing between these approaches is not about picking the one with the best marketing. It is about matching the trade-offs to your workload. We use four criteria: query latency profile, storage cost per GB, indexing throughput, and operational complexity. Each criterion has a different weight depending on your use case.

Query Latency Profile

Measure the 95th and 99th percentile latency for your most common queries. If your users expect sub-100 ms responses for all data, you cannot use searchable snapshots for the cold tier, because fetching data from object storage pushes latency into seconds. On the other hand, if your older data is only accessed for monthly reports, a few seconds of latency is acceptable. Be honest about this. Many teams overestimate how often old data is queried and end up overprovisioning hot nodes.
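One simple way to get these percentiles from a list of measured query times is the nearest-rank method, sketched below. The function names and the sample data are illustrative; in practice the samples would come from your own slow logs or load tests.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_report(samples_ms):
    """Summarize a latency sample with the percentiles discussed above."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


# Illustrative data: 100 queries taking 1 ms, 2 ms, ..., 100 ms.
report = latency_report(list(range(1, 101)))
```

Comparing the p99 of recent-data queries with the p99 of old-data queries separately is usually more informative than a single blended figure.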

Storage Cost per GB

Calculate the effective cost of storing one GB of data for one month, including replication. Hot nodes with SSDs and 2x replication (one primary plus one replica) might cost $0.50/GB/month in cloud instances. Warm nodes with HDDs might be $0.10/GB/month. Searchable snapshots on S3 can be as low as $0.02/GB/month, but you also pay request and retrieval charges whenever cold data is read. The break-even point depends on how often you query the cold data. If you query cold data more than once a week, those read costs may exceed the savings.

Indexing Throughput

Some approaches degrade indexing performance. Forced merges on hot-only clusters consume I/O during merge operations, which can slow down ingestion. Hybrid search with vectors adds the overhead of building HNSW graphs during indexing. If your peak indexing rate is 10,000 docs/second, you need to test whether your chosen approach can sustain that without backpressure. We have seen teams adopt cold tiers only to discover that their ILM policy triggers a snapshot every hour, and the snapshot process competes with indexing for disk I/O.

Operational Complexity

This is the hardest to quantify but often the deciding factor. Hot-only is simple to operate: set a rollover policy, force-merge, and forget. Hot-warm-cold requires managing ILM policies, snapshot repositories, and potentially multiple node roles (data_hot, data_warm, data_cold). Hybrid search adds another layer: you need to decide on embedding dimensions, similarity metrics, and whether to use separate indices for vectors and keywords. Teams with a dedicated platform engineer can handle the complexity; small teams may struggle.

Trade-offs in Practice: A Structured Comparison

To make the trade-offs concrete, we constructed a composite scenario based on patterns we have seen in production. Imagine a team running a 10-node cluster ingesting 500 GB of log data per day, with a 30-day retention period. They query the last 7 days heavily (thousands of queries per second) and the remaining 23 days occasionally (a few hundred queries per day). They are considering three options.

Option A: Hot-Only with Daily Rollover

All 500 GB per day stays on hot nodes. With 2x replication and a 30-day retention, they need about 30 TB of hot storage (500 GB × 30 days × 2 copies). That is roughly 15 nodes with 2 TB SSDs each, before leaving any free-disk headroom. Monthly storage cost: around $15,000 in a typical cloud (30 TB at $0.50/GB/month). Query latency stays under 50 ms for all data. Indexing throughput is stable, but nightly force-merges consume significant I/O. Operational overhead is low: one ILM policy, no snapshots.

Option B: Hot-Warm-Cold with Searchable Snapshots

Hot nodes hold the most recent 7 days of data (3.5 TB raw, ~7 TB with replication). Warm nodes hold the next 14 days (7 TB raw, ~14 TB with replication). The cold tier stores the oldest 9 days as searchable snapshots on S3 (4.5 TB, with no replica because the snapshot itself serves as the second copy). Total hot storage: 7 TB (3-4 nodes). Warm storage: 14 TB (4-5 nodes with HDDs). Cold storage: 4.5 TB on S3. Monthly cost: hot ~$3,500, warm ~$1,400, S3 ~$90. Total ~$5,000. Query latency: recent data under 50 ms, warm data under 200 ms, cold data 2-5 seconds (first query, before the data is cached locally). Operational complexity is higher: ILM policies must be tuned to avoid snapshot backlogs, and data fetched from snapshots needs to be cached for repeated queries.
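The storage arithmetic for Options A and B can be reproduced with a few lines of Python. The prices are the illustrative per-GB figures from the storage cost criterion earlier in this guide, not quotes from any provider, and the sketch ignores compute, request, and transfer charges.

```python
DAILY_GB = 500  # ingest volume in the composite scenario

# Illustrative $/GB/month figures from the storage cost criterion above.
PRICE_PER_GB_MONTH = {"hot": 0.50, "warm": 0.10, "s3": 0.02}


def tier_cost(days, copies, tier):
    """Monthly cost of keeping `days` of data with `copies` copies on one tier."""
    return days * DAILY_GB * copies * PRICE_PER_GB_MONTH[tier]


# Option A: 30 days on hot nodes, primary plus one replica.
option_a = tier_cost(30, 2, "hot")

# Option B: 7 days hot, 14 days warm (both replicated), 9 days as a single copy on S3.
option_b = tier_cost(7, 2, "hot") + tier_cost(14, 2, "warm") + tier_cost(9, 1, "s3")

print(f"Option A: ${option_a:,.0f} per month")
print(f"Option B: ${option_b:,.0f} per month")
print(f"Ratio: {option_a / option_b:.1f}x")
```

Swapping in your own daily volume, tier boundaries, and prices shows quickly whether the gap is large enough to justify the extra operational work.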

Option C: Hybrid Search with Vectors

This team wants to add semantic search to their logs. They generate 768-dimension embeddings for each log line. That adds about 3 KB per document. Assuming an average log line of about 1 KB (roughly 500 million documents per day), that is 1.5 TB extra per day for embeddings alone. With 30-day retention, that is 45 TB of vector data, three times the 15 TB of raw logs. They would need to separate vector indices onto high-memory nodes (at least 64 GB RAM per node) and use a separate cluster for vector search to avoid interference with keyword queries. Cost roughly doubles compared to Option B, and query latency for hybrid queries increases by 20-50 ms due to the HNSW traversal. The benefit is improved recall for ambiguous queries, but only if the embedding model is well-tuned.
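The embedding volume in Option C follows from a short calculation. The sketch below assumes an average log line of about 1,000 bytes, which is not stated in the scenario but is what the 1.5 TB per day figure implies; change it to match your own data.

```python
DAILY_LOG_BYTES = 500 * 10**9   # 500 GB of logs per day
AVG_LOG_LINE_BYTES = 1_000      # assumption: roughly 1 KB per log line
DIMS = 768
BYTES_PER_FLOAT32 = 4
RETENTION_DAYS = 30

docs_per_day = DAILY_LOG_BYTES // AVG_LOG_LINE_BYTES
bytes_per_embedding = DIMS * BYTES_PER_FLOAT32

# Decimal terabytes, to match the storage figures used in the scenario.
daily_vector_tb = docs_per_day * bytes_per_embedding / 10**12
retained_vector_tb = daily_vector_tb * RETENTION_DAYS

print(f"{docs_per_day:,} documents per day")
print(f"{daily_vector_tb:.2f} TB of raw vectors per day")
print(f"{retained_vector_tb:.1f} TB of raw vectors at {RETENTION_DAYS}-day retention")
```

The exact result is slightly above the rounded 1.5 TB and 45 TB quoted in the scenario because 768 × 4 is 3,072 bytes rather than an even 3,000, and it covers raw vectors only, before any HNSW graph overhead.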

The table below summarizes the key trade-offs.

| Criterion | Option A | Option B | Option C |
| --- | --- | --- | --- |
| Query latency (recent) | <50 ms | <50 ms | <100 ms |
| Query latency (old) | <50 ms | 200 ms - 5 s | <100 ms (if cached) |
| Monthly storage cost | ~$15,000 | ~$5,000 | ~$10,000+ |
| Indexing throughput | High (merge overhead) | High (snapshot overhead) | Medium (embedding + HNSW) |
| Operational complexity | Low | Medium | High |

Implementation Path: Steps After You Choose

Once you have picked an approach, the real work begins. Implementation is where most teams stumble, not because the architecture is wrong, but because the migration steps are rushed. Here is a sequence that has worked well in practice.

Step 1: Baseline Your Current Cluster

Before changing anything, collect metrics: shard sizes, query latency percentiles, indexing rates, disk usage by tier, and snapshot restore times if you already use snapshots. Use the Elasticsearch monitoring APIs (for example `_cat/shards` and `_nodes/stats`) or Stack Monitoring in Kibana. This baseline is your safety net. If something goes wrong during migration, you can compare against it.
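As one piece of such a baseline, the sketch below sorts primary shards into size buckets. The sample rows are hypothetical and only shaped like the JSON that `GET _cat/shards?format=json&bytes=b` returns; the 10 GB and 100 GB thresholds are assumptions you should adjust.

```python
GIB = 1024 ** 3

# Hypothetical rows, shaped like `GET _cat/shards?format=json&bytes=b` output.
SAMPLE_ROWS = [
    {"index": "logs-000001", "shard": "0", "prirep": "p", "store": str(42 * GIB)},
    {"index": "logs-000001", "shard": "0", "prirep": "r", "store": str(42 * GIB)},
    {"index": "logs-000002", "shard": "0", "prirep": "p", "store": str(3 * GIB)},
    {"index": "logs-000003", "shard": "0", "prirep": "p", "store": str(120 * GIB)},
]


def classify_primary_shards(rows, small_gib=10, large_gib=100):
    """Group primary shards into too-small, ok, and too-large buckets by size."""
    buckets = {"small": [], "ok": [], "large": []}
    for row in rows:
        if row["prirep"] != "p" or row.get("store") is None:
            continue  # skip replicas and unassigned shards
        size_gib = int(row["store"]) / GIB
        name = f'{row["index"]}[{row["shard"]}]'
        if size_gib < small_gib:
            buckets["small"].append(name)
        elif size_gib > large_gib:
            buckets["large"].append(name)
        else:
            buckets["ok"].append(name)
    return buckets


baseline = classify_primary_shards(SAMPLE_ROWS)
```

Saving this summary alongside the latency percentiles gives you a before-and-after comparison once the migration is done.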

Step 2: Set Up the New Tier or Index Structure in Parallel

Do not migrate in-place. Create a new index template with the desired ILM policy and shard configuration. Start indexing new data into the new structure while keeping the old indices running. This allows you to validate query performance and indexing throughput before cutting over. For tiered storage, configure the node roles (data_hot, data_warm, data_cold) on a subset of nodes first, and test that ILM moves indices correctly.
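A parallel index structure usually starts with a new index template. The sketch below only builds the request body as a Python dictionary; the pattern, policy name, and alias are hypothetical, and the rollover alias setting applies to alias-based rollover rather than data streams.

```python
def build_index_template(pattern, policy_name, rollover_alias, shards=1, replicas=1):
    """Body for a composable index template that attaches an ILM policy to new indices."""
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "index.number_of_shards": shards,
                "index.number_of_replicas": replicas,
                "index.lifecycle.name": policy_name,
                "index.lifecycle.rollover_alias": rollover_alias,
                # New indices start on the hot tier; ILM moves them later.
                "index.routing.allocation.include._tier_preference": "data_hot",
            }
        },
    }


# Hypothetical names for the new structure, kept separate from the old indices.
template = build_index_template("logs-v2-*", "logs-v2-policy", "logs-v2-write", shards=2)
```

Using a distinct pattern such as the `logs-v2-*` placeholder keeps the new template from touching existing indices while both structures run side by side.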

Step 3: Reindex Historical Data Gradually

Use the reindex API or a scroll query to copy old data into the new indices. Reindex in batches of a few days at a time, monitoring cluster health and disk usage. If you are switching to searchable snapshots, create the snapshot repository and take an initial snapshot of the old data before deleting anything. We have seen teams delete old indices only to realize later that the snapshot was corrupted or the restore policy was misconfigured.
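Batching by date can be sketched as a generator that yields one reindex request body per window. The index names and the `@timestamp` field are assumptions; the bodies would be sent to the `_reindex` endpoint one at a time, checking cluster health between batches.

```python
from datetime import date, timedelta


def reindex_batches(source, dest, start, end, days_per_batch=3):
    """Yield _reindex request bodies covering [start, end) in fixed-size date windows."""
    if days_per_batch < 1:
        raise ValueError("days_per_batch must be at least 1")
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=days_per_batch), end)
        yield {
            "source": {
                "index": source,
                "query": {
                    "range": {
                        "@timestamp": {"gte": cursor.isoformat(), "lt": upper.isoformat()}
                    }
                },
            },
            "dest": {"index": dest},
        }
        cursor = upper


# Hypothetical index names; one week of data in three-day batches.
batches = list(reindex_batches("logs-old", "logs-v2-write", date(2024, 1, 1), date(2024, 1, 8)))
```

Half-open windows (`gte` on the lower bound, `lt` on the upper) keep documents on a boundary from being copied twice.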

Step 4: Tune ILM Policies and Shard Sizes

After migration, monitor the ILM transitions. Common issues: shards that are too small (under 10 GB), which inflates the shard count, or too large (over 100 GB), which slows merges. Aim for 20-50 GB per shard as a starting point. Adjust the rollover size and age based on your indexing rate. For vector indices, shard size is more constrained because each shard's vectors and HNSW graphs need to stay in memory for fast search. Keep vector shards under 10 GB to avoid excessive memory pressure.
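The shard-size target translates into a primary shard count once you know how much data one index holds before it rolls over. This is a small sketch with an assumed 40 GB target, the middle of the 20-50 GB range.

```python
import math


def primaries_for(index_gb, target_shard_gb=40):
    """Number of primary shards so that each lands near the target size."""
    if index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_gb / target_shard_gb))


def shard_size_gb(index_gb, primaries):
    """Resulting size of each primary shard."""
    return index_gb / primaries


# Daily rollover at 500 GB per day, as in the composite scenario.
daily_primaries = primaries_for(500)
daily_shard_gb = shard_size_gb(500, daily_primaries)
print(daily_primaries, round(daily_shard_gb, 1))
```

For a small index the function returns a single shard, which matches the advice to avoid splitting indices that are well under the target size.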

Risks of Choosing Wrong or Skipping Steps

Every approach has failure modes. Knowing them upfront saves weeks of debugging.

Overprovisioning Hot Nodes

The most common mistake is keeping all data on hot nodes because it is simpler. The cost difference is stark. In the composite scenario, Option A costs about three times as much as Option B (~$15,000 versus ~$5,000 per month). That extra spend often comes out of the infrastructure budget, forcing cuts elsewhere. Worse, the team may never realize they are overpaying because the cluster works fine, until the finance team asks why the Elasticsearch bill doubled year over year.

Underestimating Snapshot Latency

Searchable snapshots sound great on paper, but the first query to a cold index can take several seconds, and 5-10 seconds is possible when nothing is cached locally and the data has to be fetched from S3. If your application expects sub-second responses for all queries, you need to either keep that data on warm nodes or implement a caching layer. We have seen teams deploy searchable snapshots only to have their monitoring alerts fire because the p99 latency spiked to 10 seconds.

Ignoring Shard Count Limits

Each shard adds to the cluster state and consumes heap on the node that hosts it. The default limit (`cluster.max_shards_per_node`) is 1,000 shards per data node, but many clusters hit performance issues at 500 shards per node if the shards are small. When using hot-warm-cold with frequent rollovers, it is easy to accumulate thousands of tiny indices. Monitor the `_cluster/health` endpoint for `number_of_pending_tasks`; if it keeps growing, you likely have too many shards.
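A quick estimate shows how fast rollover frequency multiplies the shard count. The numbers below (5 primaries, 1 replica, 10 data nodes, 30-day retention) are illustrative assumptions, not a recommendation.

```python
def estimate_shards(retention_days, rollovers_per_day, primaries, replicas, data_nodes):
    """Return (total shards, average shards per data node) for a time-series workload."""
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    total = retention_days * rollovers_per_day * primaries * (1 + replicas)
    return total, total / data_nodes


# Same data and retention, only the rollover frequency differs.
hourly_total, hourly_per_node = estimate_shards(30, 24, 5, 1, 10)
daily_total, daily_per_node = estimate_shards(30, 1, 5, 1, 10)

print(f"hourly rollover: {hourly_total} shards, {hourly_per_node:.0f} per node")
print(f"daily rollover:  {daily_total} shards, {daily_per_node:.0f} per node")
```

In this example, hourly rollover alone pushes the cluster past the 500 shards per node where small-shard clusters tend to struggle, while daily rollover stays far below it.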

Vector Search Memory Blowout

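In Elasticsearch, HNSW graphs and the vectors they index live off-heap and depend on the operating system's filesystem cache, so the pressure lands on node RAM rather than on the JVM heap. A vector index with 10 million documents and 768 dimensions holds about 30 GB of raw float32 vectors, and by our estimate another 2-3 GB for the graph. If that index shares a node with keyword queries, the two workloads compete for the same cache, vector data gets evicted, and search latency spikes. The fix is to use dedicated nodes for vector indices and size their RAM so that vectors and graph stay cached. Note that `search.max_buckets` is a cluster-wide limit on aggregation buckets; it guards against expensive aggregations but does nothing to limit vector memory.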

Frequently Asked Questions

How many shards should I use per index?

Start with 1-2 shards per GB of heap per node. For a 30 GB heap node, that means 30-60 shards in total across all indices on that node. Each shard should be 20-50 GB after merging. If your indices are smaller than 10 GB, consider using a single shard and relying on index rollover for scaling.

Should I use replicas for search performance or only for failover?

Replicas improve search throughput because queries can be distributed across replica shards. However, each replica adds a full copy of the data: one replica doubles the storage cost, and two triple it. For read-heavy workloads, 1-2 replicas are common. For write-heavy workloads, 0-1 replicas may be sufficient. Monitor the cluster's search rate and adjust. If your search rate is low, one replica is enough for failover.

When should I use searchable snapshots vs. cold tier?

Searchable snapshots read on demand from object storage (partially mounted, as in the frozen tier) are best for data that is queried rarely (less than once a week) and where a few seconds of latency is acceptable. Dedicated cold nodes that hold a full local copy are better for data that is queried occasionally (daily or weekly) and needs sub-second responses after the first access. If your access pattern is unpredictable, prefer cold nodes with a full local copy so that repeated queries do not go back to object storage.

How do I choose between HNSW and IVF for vector search?

HNSW offers faster search at the cost of higher memory and slower indexing. IVF is more memory-efficient and indexes faster but has lower recall for the same number of candidates. For most production use cases with up to 10 million vectors, HNSW is the default. For larger datasets (100 million+), IVF with a reasonable number of centroids can be more practical. In Elasticsearch specifically, `dense_vector` indexing has been built on HNSW, so confirm that your version offers an IVF-style option before planning around one. Test both with your data and measure recall@k.
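Measuring recall@k needs only the approximate results and an exact (brute-force) ranking for the same query. This is a minimal sketch; the document IDs are made up, and in practice you would average the score over a few hundred representative queries.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth)


# Made-up IDs: the approximate search missed one of the three true neighbors.
score = recall_at_k(["doc1", "doc2", "doc9"], ["doc1", "doc2", "doc3"], k=3)
print(f"recall@3 = {score:.2f}")
```

Running the same query set against each index configuration gives a like-for-like comparison of accuracy to set beside the latency and memory numbers.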

What is the best way to handle index lifecycle for time-series data?

Use ILM with a policy that rolls over based on size (e.g., 50 GB) or age (e.g., 1 day), whichever comes first; in the rollover action these are the `max_primary_shard_size` (or `max_size`) and `max_age` conditions. Phase transitions are controlled by `min_age`: set `min_age` to 7 days on the warm phase, and to 30 days on the cold or delete phase. Avoid rolling over on `max_age` alone because it can create indices that are too large or too small. Use the `_ilm/explain` API to find any stuck indices.
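A policy along these lines can be written out as a request body. The sketch below builds it as a Python dictionary; the repository name is hypothetical, and the 7, 21, and 30 day boundaries are assumptions borrowed from the composite scenario (7 days hot, 14 days warm, then cold until deletion).

```python
import json


def build_ilm_policy(snapshot_repository):
    """ILM policy body: size-or-age rollover in hot, then warm, cold, and delete phases."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over on whichever condition is met first.
                        "rollover": {"max_primary_shard_size": "50gb", "max_age": "1d"}
                    }
                },
                "warm": {
                    "min_age": "7d",
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "cold": {
                    "min_age": "21d",
                    "actions": {
                        "searchable_snapshot": {"snapshot_repository": snapshot_repository}
                    },
                },
                "delete": {"min_age": "30d", "actions": {"delete": {}}},
            }
        }
    }


# "logs-snapshots" is a placeholder repository name.
policy = build_ilm_policy("logs-snapshots")
print(json.dumps(policy, indent=2))
```

Note that `min_age` is counted from rollover for indices managed this way, so an index that takes a full day to roll over reaches the warm phase about eight days after its first document.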

Recommendation Recap Without Hype

No single approach is right for every team. The qualitative benchmarks we have shared point to a few practical guidelines. First, if your query latency requirements are uniform across all data and your budget allows, hot-only is the simplest path. Second, if you have time-series data with a clear access decay, hot-warm-cold with searchable snapshots will save significant cost without compromising performance for recent data. Third, if you need semantic search, evaluate hybrid search carefully — the memory and indexing overhead is real, and the benefits depend heavily on the quality of your embedding model.

Here are three specific next moves. One: run a cost analysis of your current cluster using your cloud provider's pricing calculator. Compare the cost of keeping all data on hot nodes versus moving older data to a warm or cold tier. Two: set up a test cluster with a representative data sample and measure query latency for each tier. Do not rely on vendor benchmarks — your data and query patterns are unique. Three: if you are considering vector search, start with a small index (100,000 documents) and measure indexing throughput and memory usage before scaling. The patterns you observe at small scale will inform your shard and node configuration at production scale.

Mapping the edge of Elasticsearch trends means knowing where your cluster's limits are — and choosing a path that keeps you on the right side of them. The benchmarks here are a starting point, not a destination. Test, measure, and adjust.
