Elasticsearch in Production: Search & Observability
Why Elasticsearch, and What It’s Good For
Elasticsearch is a distributed search and analytics engine built on the Apache Lucene library. For the high-volume fintech and insurance platforms we build in Abidjan, it excels at three use cases that a traditional relational database often handles poorly: full-text search, analytical aggregations, and log and metrics collection (observability) via the Elastic Stack.
Full-text search is its historic strength: typo tolerance, relevance scoring, prefix search, autocomplete, and language handling (analyzers, stemming). Aggregations let you compute sums, averages, percentiles, and histograms over millions of documents in near real time. Finally, paired with Logstash or Beats/Elastic Agent for ingestion and Kibana for visualization, it forms an observability stack for exploring logs, traces, and metrics.
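As a hedged illustration of the first two capabilities, the sketch below builds a search request body that combines a typo-tolerant match query with percentile and date-histogram aggregations. The field names (description, amount, created_at) and the function name are assumptions made for this example; the resulting body is what one would send to an index's _search endpoint.

```python
import json


def build_search_body(text: str) -> dict:
    """Build a Query DSL body: fuzzy full-text match plus two aggregations.

    Field names are hypothetical and must match your own mapping.
    """
    return {
        "size": 10,
        "query": {
            # "fuzziness": "AUTO" tolerates small typos in the search terms.
            "match": {"description": {"query": text, "fuzziness": "AUTO"}}
        },
        "aggs": {
            # Median, p95 and p99 of the (hypothetical) amount field.
            "amount_percentiles": {
                "percentiles": {"field": "amount", "percents": [50, 95, 99]}
            },
            # One bucket per calendar day on the (hypothetical) date field.
            "per_day": {
                "date_histogram": {
                    "field": "created_at",
                    "calendar_interval": "day",
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("paiement mobile"), indent=2))
```

The query and the aggregations travel in a single request, so the hits and the statistics are computed over the same set of matching documents.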
Core Concepts to Master
A few notions underpin any serious deployment:
- Index: a collection of JSON documents, roughly the logical equivalent of a table. Documents have no rigid schema (field types can be inferred dynamically), but an explicit mapping gives predictable behavior.
- Mapping: the definition of fields and their types (text, keyword, date, numeric, geo). Choosing between text (analyzed for search) and keyword (exact, for sorting and aggregations) is a critical design decision.
- Shards: each index is split into primary shards, enabling horizontal distribution across nodes. The number of primary shards is fixed at index creation; changing it later means splitting, shrinking, or reindexing.
- Replicas: copies of primary shards that provide high availability and increase read throughput.
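To make these four notions concrete, here is a minimal sketch of an index-creation body, assuming a hypothetical insurance-policy index. All field names and the default shard and replica counts are illustrative assumptions, not recommendations; the body is what one would send when creating the index.

```python
def build_index_body(primary_shards: int = 3, replicas: int = 1) -> dict:
    """Build settings and mappings for a hypothetical policy index."""
    return {
        "settings": {
            # Static setting: chosen once, at index creation.
            "number_of_shards": primary_shards,
            # Dynamic setting: can be adjusted on a live index.
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                # Analyzed for full-text search, with an exact-value
                # sub-field for sorting and aggregations.
                "customer_name": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword"}},
                },
                # Exact value only: filters, sorting, aggregations.
                "status": {"type": "keyword"},
                # Stored as a scaled integer (two decimal places here).
                "premium_amount": {"type": "scaled_float", "scaling_factor": 100},
                "created_at": {"type": "date"},
                "branch_location": {"type": "geo_point"},
            }
        },
    }
```

Declaring customer_name as text with a keyword sub-field is one common way to serve both needs: search runs against customer_name, while sorting and aggregations use customer_name.raw.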
A common anti-pattern is over-sharding: every shard carries a fixed overhead, so many small shards consume memory and degrade performance. For data that grows over time (logs, transactions), prefer data streams backed by time-based indices, with rollover and retention managed by index lifecycle management (ILM) policies.
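A lifecycle-managed data stream can be sketched as two request bodies: an ILM policy and an index template that references it. The policy name, index pattern, size threshold, and retention periods below are assumptions chosen for illustration and should be tuned to your own volumes.

```python
def build_ilm_policy(max_shard_size: str = "50gb",
                     max_age: str = "30d",
                     delete_after: str = "90d") -> dict:
    """ILM policy body: roll over in the hot phase, delete after retention."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new backing index when either limit is hit,
                        # which avoids both oversized and tiny shards.
                        "rollover": {
                            "max_primary_shard_size": max_shard_size,
                            "max_age": max_age,
                        }
                    }
                },
                "delete": {
                    # Age is counted from rollover, not from creation.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


def build_data_stream_template(policy_name: str = "audit-retention",
                               pattern: str = "audit-events-*") -> dict:
    """Index template body that turns matching names into data streams."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        "template": {
            "settings": {"index.lifecycle.name": policy_name},
        },
    }
```

The policy body is registered under a name through the ILM API, and the template through the index template API; new documents written to a matching name then land in a data stream whose backing indices roll over and expire without manual intervention.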
Proven Architecture Patterns
The guiding principle we apply consistently: Elasticsearch is a read model, not the source of truth. Canonical data stays in a transactional database (PostgreSQL, for example), and Elasticsearch is a read-optimized projection of it.
- Change Data Capture (CDC): synchronize the primary database into Elasticsearch via an event stream (Debezium/Kafka) or indexing jobs. This decouples transactional writes from indexing.
- Dedicated read models: denormalize data into documents shaped for the UI’s queries, avoiding expensive joins at read time.
- Audit trails and logs: Elasticsearch is excellent for storing and exploring append-only events (audit trails, application logs), with retention handled by ILM.
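The first two patterns can be sketched together: change events from the primary database are turned into denormalized documents and serialized as a bulk request payload. The event shape below is a simplified assumption modeled on Debezium's envelope (op, before, after); the table columns, the customers lookup, and the index name policies-read are hypothetical.

```python
import json
from typing import Dict, List


def to_bulk_lines(event: dict, customers: Dict[int, dict],
                  index: str = "policies-read") -> List[str]:
    """Translate one change event into bulk API lines (NDJSON)."""
    if event["op"] == "d":
        # Deletes carry the old row in "before" and need no document body.
        doc_id = str(event["before"]["id"])
        return [json.dumps({"delete": {"_index": index, "_id": doc_id}})]

    # Creates, updates and snapshot reads carry the new row in "after".
    row = event["after"]
    customer = customers.get(row["customer_id"], {})
    # Denormalize: embed customer fields so reads need no join.
    doc = {
        "policy_number": row["policy_number"],
        "status": row["status"],
        "customer_name": customer.get("name"),
        "customer_city": customer.get("city"),
    }
    action = {"index": {"_index": index, "_id": str(row["id"])}}
    return [json.dumps(action), json.dumps(doc)]


def build_bulk_payload(events: List[dict], customers: Dict[int, dict]) -> str:
    """Join all lines; the bulk API expects a trailing newline."""
    lines: List[str] = []
    for event in events:
        lines.extend(to_bulk_lines(event, customers))
    return "\n".join(lines) + "\n"
```

Using the source row's primary key as the document _id makes indexing idempotent: replaying the same event overwrites the same document instead of creating a duplicate, which matters when the event stream is delivered at least once.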
Honest Caveats
No tool is free. Here is what a CTO must internalize before committing:
- It is not a primary database. No multi-document ACID transactions, no referential integrity constraints. Never store critical data only in Elasticsearch.
- Eventual consistency. Indexing is near real time (by default, a periodic refresh of roughly one second); a written document is not instantly searchable. Design for this explicitly.
- Operational and resource cost. The cluster is hungry for RAM (JVM heap, page cache) and disk. Managing upgrades, shards, backups (snapshots), and security requires real expertise.
- Security. A misconfigured, exposed cluster is a major data-breach risk. Authentication, TLS, and access control are non-negotiable.
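Two of these caveats can be handled explicitly at the call site. The sketch below builds (without sending) an indexing request that asks Elasticsearch to respond only once the document is visible to search, via the refresh=wait_for parameter, and refuses any non-TLS endpoint. The host name, index name, and function name are hypothetical, and authentication headers are deliberately left out of this sketch.

```python
import json
import urllib.parse
import urllib.request


def build_index_request(base_url: str, index: str, doc_id: str, doc: dict,
                        wait_for_refresh: bool = True) -> urllib.request.Request:
    """Build a PUT request for the document indexing endpoint."""
    if not base_url.startswith("https://"):
        # Fail fast rather than send data over an unencrypted connection.
        raise ValueError("refusing non-TLS Elasticsearch endpoint")

    url = f"{base_url}/{index}/_doc/{urllib.parse.quote(doc_id, safe='')}"
    if wait_for_refresh:
        # The response is held until the next refresh makes the document
        # searchable; this trades write latency for read-your-write behavior.
        url += "?" + urllib.parse.urlencode({"refresh": "wait_for"})

    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Waiting for a refresh is best reserved for the few writes a user will immediately search for; for bulk or background indexing, the default asynchronous refresh is usually the cheaper choice.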
Conclusion
Properly scoped, Elasticsearch transforms a platform’s search experience and analytical capability. Poorly scoped, it becomes operational debt. The key is to treat it as a read projection, cleanly fed from your source of truth.
ProCode Legion, an elite software-engineering firm based in Abidjan, designs and operates this kind of architecture for fintech and insurance platforms across francophone Africa. Let’s discuss your use case: talk to our engineers.