Every Elasticsearch cluster faces a fundamental tension: you want to index data as fast as possible, but you also need queries to return in milliseconds. Optimizing for one often degrades the other. In this guide, we draw on patterns from the Bayview production environment—a high-throughput indexing pipeline serving both real-time analytics and search—to walk through the key trade-offs and how to navigate them.
Why the indexing-query speed trade-off matters now
Elasticsearch is no longer just a search engine; it's the backbone of observability, security analytics, and e-commerce personalization. In these roles, the balance between write speed and read performance directly impacts user experience and operational costs. A common scenario: a team sets up a cluster for log ingestion, indexing thousands of events per second, only to find that dashboard queries time out. Or the reverse—a search application with sub-second query requirements struggles to keep up with batch updates during peak hours.
The root cause is that Elasticsearch's internal architecture—segments, merges, caches, and translog—creates inherent trade-offs. For example, increasing the refresh interval reduces segment count and merge pressure, speeding up indexing, but it also means queries see older data. Similarly, using keyword fields with eager global ordinals speeds up aggregations but adds indexing overhead. Understanding these mechanics is essential for making informed decisions.
In the Bayview environment, we've seen teams adopt one-size-fits-all defaults that work for neither extreme. The goal of this article is to provide a framework for diagnosing your workload's dominant pattern—write-heavy, read-heavy, or balanced—and then tuning accordingly.
Who should read this
This guide is for engineers and SREs who manage Elasticsearch clusters in production. If you've ever wondered why your indexing rate drops when you enable fielddata or why queries slow down after a bulk load, you'll find practical explanations and strategies here.
Core idea: indexing and query speed are inversely related
At its simplest, the trade-off works like this: every optimization that makes indexing faster tends to make queries slower, and vice versa. This is because Elasticsearch stores data in immutable segments that are periodically merged. Indexing creates new segments; merging consolidates them. Queries must scan all segments that contain relevant data. More segments mean faster indexing (less merge work) but slower queries (more files to open). Fewer segments mean faster queries but slower indexing because merges are more aggressive.
But the trade-off goes deeper. Consider refresh intervals. The refresh_interval controls how often new segments become visible to search. A shorter interval (e.g., 1s) makes data available sooner, but increases segment count and merge overhead. A longer interval (e.g., 30s) reduces segment count, improving query speed, but delays data visibility. In Bayview's log pipeline, we use a 30s refresh for bulk ingestion and a 1s refresh for critical alerts—splitting the cluster by data criticality.
Another core lever is the number of shards. More shards distribute indexing load across nodes, increasing write throughput. But each shard is a separate Lucene index, and queries must fan out to all matching shards. Too many shards increase query latency due to coordination overhead. Bayview's rule of thumb: keep shard count per node under 20 per GB of heap, and size shards between 10–50 GB.
Document structure matters
Fields that require fielddata or norms add indexing cost. For example, enabling fielddata on a text field for aggregations loads the entire field into memory during indexing, slowing writes. Similarly, doc_values (used for sorting and aggregations) are compressed on disk and built during indexing—they speed up queries but add indexing overhead. Bayview's approach: disable norms on fields that don't need scoring, and use doc_values only when required for aggregations.
How it works under the hood: segments, merges, and caches
To make informed trade-offs, you need to understand the internals. Elasticsearch uses Apache Lucene, which stores data in segments—immutable, inverted-index files. When you index a document, it's written to a new segment in memory and then flushed to disk. Over time, many small segments accumulate. Lucene merges them in the background into larger segments, which reduces file handles and improves query performance because queries scan fewer segments.
Merge policies control when and how merges happen. The default tiered policy merges segments of similar size, which works well for most workloads. But in a write-heavy environment, merges can become a bottleneck, consuming I/O and CPU. You can tune merge settings like segments_per_tier and max_merged_segment to trade off merge frequency against segment count. In Bayview's production, we increased max_merged_segment to 5 GB to reduce large merges during peak hours, accepting slightly more small segments.
Caches also play a role. The request cache caches query results for the filter context, speeding up repeated queries. But it's invalidated on every index, write, or refresh. In a high-throughput indexing cluster, the request cache is nearly useless because it's constantly cleared. Bayview disables request caching on indices that receive continuous writes, saving memory. The shard query cache, on the other hand, caches segment-level results and is less frequently invalidated; it's useful for read-heavy clusters.
Translog and durability
The translog is Elasticsearch's write-ahead log, ensuring durability. Every indexing request is written to the translog before being acknowledged. The index.translog.durability setting can be request (fsync on every request, slower) or async (fsync periodically, faster but riskier). In Bayview, we use async with a 5-second sync interval for non-critical logs, but request for transactional data.
Worked example: tuning a log ingestion pipeline
Let's walk through a concrete scenario. Suppose you have a cluster ingesting 50,000 log events per second, with queries for recent data (last hour) and aggregations over time. The default settings give you 5 shards per index, 1s refresh, and request translog durability. Indexing rate is 20,000 events/s—too slow. Queries are fast but data is visible immediately.
First, we increase the number of shards to 15 to distribute the indexing load. This improves indexing to 35,000 events/s, but query latency rises by 20% because each query now hits more shards. Next, we increase the refresh interval to 30s. Indexing jumps to 45,000 events/s, and query latency drops back to near-baseline because fewer segments are created. The trade-off: data is now up to 30 seconds stale, which is acceptable for logs.
We then switch translog durability to async with a 5s sync interval. Indexing reaches 55,000 events/s—exceeding the target. Query performance remains stable. Finally, we disable norms on all fields except message (which needs scoring for full-text search) and disable doc_values on fields not used in aggregations. This reduces indexing overhead by another 10%.
After these changes, the cluster handles 55,000 events/s with sub-second query latency for the last-hour range. The cost: data staleness up to 30 seconds, and a slight increase in shard count. This is a classic trade-off that works for logging but would fail for a real-time alerting system.
Key decisions in this scenario
- Shard count: from 5 to 15, improving write throughput at the cost of query fan-out.
- Refresh interval: from 1s to 30s, reducing segment count and merge pressure.
- Translog durability: from
requesttoasync, gaining write speed but risking data loss on node failure. - Field-level optimizations: disabling
normsanddoc_valueswhere unnecessary.
Edge cases and exceptions
The trade-offs are not always linear. Some workloads break the pattern. For example, if your queries are all filter-based on a single field (like timestamp), increasing shards may not hurt query performance much because Elasticsearch can prune shards using the routing field. Similarly, if your indexing is bursty (e.g., daily batch loads), you can temporarily increase shards and refresh intervals during the load, then revert for query-heavy periods.
Another edge case: using index.sort to pre-sort documents on disk. This speeds up range queries and aggregations but adds indexing overhead because documents must be sorted during flush. In Bayview's time-series indices, we sort by @timestamp descending, which makes recent-data queries extremely fast, but indexing throughput drops by about 15%. That's acceptable for the use case.
Hardware also changes the equation. On SSD-backed clusters, merge I/O is less of a bottleneck, so you can afford more aggressive merging. On spinning disks, merges can saturate I/O, forcing you to trade query speed for indexing throughput. Bayview's production uses NVMe SSDs, which allows us to run with default merge settings even under high write load.
One more exception: the search_idle setting. When a shard receives no searches for a while, Elasticsearch may reduce its resources. In a cluster with mixed write and query traffic, idle shards can cause query latency spikes when traffic resumes. Bayview avoids this by setting index.search.idle.after to a high value (like 5 minutes) on indices with steady query traffic.
When the trade-off doesn't apply
If your cluster is massively over-provisioned (e.g., 10 nodes for a workload that needs 3), you may not see the trade-off because resources are abundant. But this is wasteful. The trade-off is most visible when hardware is the bottleneck—which is the norm in cost-conscious environments.
Limits of the approach
No amount of tuning can overcome a fundamentally mismatched architecture. If your indexing rate exceeds the write capacity of your nodes, you need to scale horizontally or reduce the data volume (e.g., by sampling or pre-aggregation). Similarly, if your queries require scanning billions of documents, no shard tuning will make them fast—you need to rethink your data model, use rollups, or add a caching layer.
Another limit: the trade-offs are interdependent. Changing one setting often forces adjustments elsewhere. For example, increasing shard count to improve indexing may require reindexing to change the number of primary shards, which is a heavy operation. Bayview's advice: test changes on a staging cluster before applying to production, and monitor both indexing and query metrics during the rollout.
There's also the human limit: complexity. With dozens of settings—from index.merge.scheduler.max_thread_count to indices.memory.index_buffer_size—it's easy to over-optimize. Bayview's philosophy: start with defaults, identify the bottleneck (CPU, I/O, memory, or network), then change only the settings that address that bottleneck. Over-tuning can lead to fragile systems that break when workload patterns shift.
When to seek external help
If your cluster is consistently at 80%+ resource utilization and you've tried the common levers, consider consulting Elastic's professional services or a specialist. They can perform a deep audit of your mapping, shard strategy, and hardware sizing.
Reader FAQ
Should I always use the default merge policy?
Not necessarily. The default tiered policy works for most workloads, but if your indexing is bursty, consider log_byte_size or log_doc policies that merge more aggressively during idle periods. Bayview uses tiered with custom thresholds for large indices.
How do I choose between more shards and more nodes?
Adding nodes is usually better than adding shards, because nodes provide more CPU and memory for both indexing and queries. Shards add overhead. Aim for 20–40 shards per node, and scale nodes first. Bayview's rule: if you need more than 50 shards per node, add a node instead.
Can I use index templates to apply different trade-offs per index?
Yes. Index templates allow you to set per-index settings like number_of_shards, refresh_interval, and translog.durability. This is ideal for mixed workloads. Bayview uses separate templates for logs (write-optimized) and search indices (read-optimized).
What monitoring metrics should I watch?
Key metrics: indexing rate (docs/s), query latency (p50, p99), merge rate and time, segment count per shard, and disk I/O utilization. Tools like Elastic's monitoring UI or Prometheus with the Elasticsearch exporter give you these. Bayview sets alerts when merge time exceeds 10% of indexing time.
Is there a one-size-fits-all configuration?
No. The optimal configuration depends on your data, query patterns, and hardware. However, a good starting point for write-heavy workloads: 30s refresh, async translog with 5s sync, 20 shards per node, and disable norms and doc_values on unused fields. For read-heavy: 1s refresh, request translog, fewer shards (10 per node), and enable eager global ordinals for keyword fields.
What's the biggest mistake teams make?
Over-sharding. Many teams create indices with 100+ shards thinking it will speed up indexing, but it kills query performance and increases cluster management overhead. Start with fewer shards and scale nodes.
How do I handle time-series indices with rolling indices?
Use ILM (Index Lifecycle Management) to automatically roll over indices and apply different settings per phase. For example, the hot phase can have 1s refresh and high shard count, while the warm phase reduces shards and increases refresh interval. Bayview uses ILM with three phases: hot (write-optimized), warm (balanced), and cold (read-optimized with force merge).
Comments (0)
Please sign in to post a comment.
Don't have an account? Create one
No comments yet. Be the first to comment!