Elasticsearch — Basics

  • Distributed search & analytics engine. Built on Apache Lucene.
  • JSON over HTTP API. Schemaless-ish (with mappings).
  • Use cases: full-text search, log/metric aggregation (ELK), geo search, vector search, observability.
  • Cluster — set of nodes.
  • Node — single ES instance. Roles: master, data, ingest, coordinating, ml.
  • Index — logical collection of documents (~ DB table).
  • Document — JSON record (~ row).
  • Shard — Lucene index. Primary + replicas.
  • Mapping — schema: field types, analyzers.
  • Inverted index — core data structure for full-text search. Maps term → list of docs containing it.
  • Built per shard. Its terms are the tokens produced by analysis (the field's analyzer).
  • Term dictionary + posting list. Posting list also stores positions (for phrase queries) and offsets (for highlighting).
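
A toy inverted index can make the term → posting list idea concrete. The sketch below is plain Python, not Lucene's actual on-disk format; the function names and sample documents are made up for illustration, and "analysis" is reduced to lowercasing and whitespace splitting.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}, a toy posting list."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Stand-in for analysis: lowercase, then split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

def phrase_match(index, phrase):
    """Return sorted doc ids where the phrase terms sit at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Term i of the phrase must appear at position start + i in the same doc.
            if all(start + i in index[term].get(doc_id, [])
                   for i, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)

# Hypothetical sample documents.
docs = {
    1: "Rental apartment in Dubai",
    2: "Dubai rental market report",
    3: "Apartment for sale",
}
index = build_inverted_index(docs)
print(sorted(index["rental"]))                 # doc ids containing "rental"
print(phrase_match(index, "rental apartment")) # docs containing the exact phrase
```

Stored positions are what let the phrase check tell "rental apartment" apart from a document that merely contains both words.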

Pipeline: char filters → tokenizer → token filters.

  • Char filter: strip HTML, replace patterns.
  • Tokenizer: splits text — standard, whitespace, keyword, pattern, ngram, edge_ngram.
  • Token filter: lowercase, stop, stemmer, synonym, asciifolding, ngram.

Standard analyzer: tokenize on word boundaries, lowercase, no stemming.
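
The three pipeline stages can be imitated in a few lines of Python. This is only a rough sketch: the regex tokenizer approximates word-boundary splitting, the stopword set is an arbitrary example, and none of the function names come from Elasticsearch.

```python
import html
import re

def strip_html(text):
    """Char filter: replace tags with spaces, then decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))

def tokenize(text):
    """Tokenizer: rough stand-in for splitting on word boundaries."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]

def remove_stopwords(tokens, stopwords=frozenset({"the", "a", "in"})):
    """Token filter: drop tokens from an (illustrative) stopword set."""
    return [token for token in tokens if token not in stopwords]

def analyze(text, char_filters, tokenizer, token_filters):
    """Run text through char filters, then the tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

sample = "<p>The Rental in <b>Dubai</b> &amp; Abu-Dhabi</p>"
# Similar in spirit to the standard analyzer: tokenize + lowercase, no stemming.
standard_like = analyze(sample, [strip_html], tokenize, [lowercase])
# Same pipeline with an extra stop filter appended.
with_stop = analyze(sample, [strip_html], tokenize, [lowercase, remove_stopwords])
print(standard_like)
print(with_stop)
```

The order matters: char filters see raw text, the tokenizer sees filtered text, and token filters only ever see tokens.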

  • text — analyzed, tokenized, full-text searchable, NOT sortable/aggregable directly.
  • keyword — exact value, sortable, aggregable, used for filters.
  • integer/long/short/byte/float/double/scaled_float.
  • date — ISO8601 or epoch.
  • boolean, geo_point, geo_shape, ip, binary.
  • nested — array of objects, queryable as separate docs.
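  • object — inner fields flattened to dotted paths by default; in an array of objects, the link between fields of the same element is lost (use nested to keep it).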
  • dense_vector, sparse_vector — for KNN/vector search.

Dual-mapping pattern (multi-fields):

{ "title": { "type":"text", "fields": { "raw": { "type":"keyword" } } } }
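
As a sketch of how the two sub-fields are typically used together, the request bodies below are built as Python dicts: full-text matching goes against `title`, while sorting and aggregating use `title.raw`. The aggregation name `by_title` and the query text are placeholders; sending the bodies to a cluster is left out.

```python
import json

# Index-creation body: "title" is analyzed text, "title.raw" keeps the exact value.
create_index_body = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            }
        }
    }
}

# Search body: match on the analyzed field, sort and aggregate on the keyword sub-field.
search_body = {
    "query": {"match": {"title": "rental dubai"}},
    "sort": [{"title.raw": "asc"}],
    "aggs": {"by_title": {"terms": {"field": "title.raw"}}},
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(search_body, indent=2))
```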
  • Match — analyzed query (full-text).
  • Term — exact value (use on keyword/numeric, not text).
  • Range — gte/lte.
  • Bool — combine with must (AND, scoring), should (OR, scoring), filter (AND, no scoring), must_not.
  • Multi-match — query across multiple fields with weight (title^3).
  • Function score / Rank features — boost based on numeric fields, decay, script.
  • kNN — vector similarity (since 8.0).
{
  "query": {
    "bool": {
      "must": { "match": { "title": "rental dubai" } },
      "filter": [
        { "term": { "status": "active" } },
        { "range": { "price": { "lte": 5000 } } }
      ]
    }
  }
}
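
When bool queries are assembled in application code, a small builder keeps scoring clauses (`must`, `should`) apart from non-scoring ones (`filter`, `must_not`). The helper below is a hypothetical convenience function, not part of any Elasticsearch client, and the `description` field is assumed for the sake of the multi-match example.

```python
import json

def bool_query(must=None, should=None, filter_=None, must_not=None):
    """Build a search body with a bool query, omitting empty clause types."""
    clauses = {
        "must": must,          # AND, contributes to score
        "should": should,      # OR, contributes to score
        "filter": filter_,     # AND, no scoring
        "must_not": must_not,  # exclusion, no scoring
    }
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}

body = bool_query(
    must=[{"multi_match": {"query": "rental dubai",
                           "fields": ["title^3", "description"]}}],
    filter_=[
        {"term": {"status": "active"}},
        {"range": {"price": {"lte": 5000}}},
    ],
)
print(json.dumps(body, indent=2))
```

Putting exact-value conditions in `filter` rather than `must` keeps them out of relevance scoring.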
  • Bucket — group: terms, range, date_histogram, histogram, geohash_grid.
  • Metric — compute: avg, sum, min, max, cardinality, percentiles, stats.
  • Pipeline — agg over agg results (moving average via moving_fn, derivative).
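
To illustrate how a bucket aggregation and a metric aggregation nest, the sketch below pairs a request body (terms bucket on `status` with an avg metric on `price`) with a pure-Python emulation over a few made-up documents. The aggregation names and the emulation function are illustrative only; real responses carry more fields.

```python
from collections import defaultdict

# Request body: "size": 0 skips hits and returns only aggregation results.
aggs_request = {
    "size": 0,
    "aggs": {
        "by_status": {
            "terms": {"field": "status"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}

def terms_with_avg(docs, bucket_field, metric_field):
    """Emulate a terms bucket with an avg sub-aggregation on in-memory docs."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # Largest buckets first, ties broken by key.
    return sorted(buckets, key=lambda bucket: (-bucket["doc_count"], bucket["key"]))

# Hypothetical sample documents.
docs = [
    {"status": "active", "price": 4000},
    {"status": "active", "price": 5000},
    {"status": "sold", "price": 3000},
]
buckets = terms_with_avg(docs, "status", "price")
print(buckets)
```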
  • Primary shard stores data; replicas for HA + read scaling.
  • Default: 1 primary, 1 replica (per index, since 7.x).
  • Allocation: master node assigns shards. Use cluster.routing.allocation.awareness for rack/zone awareness.
  • Refresh: in-memory buffer is written to a new searchable segment every 1s (default). Near-real-time, not real-time.
  • Translog: write-ahead log; keeps operations durable until the next flush (Lucene commit) persists the segments.
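
The shard and refresh defaults above correspond to index settings that can be supplied when an index is created. The sketch below builds such a settings body and counts the shard copies the cluster has to place; the helper names are made up, and the 3-primary/2-replica figures are just an example.

```python
import json

def index_settings(primaries=1, replicas=1, refresh_interval="1s"):
    """Build an index-creation body; the defaults mirror the values noted above."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primaries,       # fixed at index creation
                "number_of_replicas": replicas,      # can be changed later
                "refresh_interval": refresh_interval,
            }
        }
    }

def total_shard_copies(primaries, replicas):
    """Each primary has `replicas` copies, so the cluster holds P * (1 + R) shards."""
    return primaries * (1 + replicas)

body = index_settings(primaries=3, replicas=2, refresh_interval="30s")
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 2))
```

A longer `refresh_interval` trades search freshness for indexing throughput, which is a common choice for heavy bulk loads.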
PUT /products
PUT /products/_mapping { ... }
POST /products/_doc { "name":"x" }
PUT /products/_doc/123 { ... }
GET /products/_doc/123
POST /products/_update/123 { "doc": { "price": 99 } }
DELETE /products/_doc/123
POST /products/_search { "query": {...} }
POST /_bulk
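
The `_bulk` endpoint expects newline-delimited JSON: one action line, then (for index operations) one source line, with a trailing newline at the end of the body. A minimal sketch of building that payload in Python follows; the helper name and sample documents are hypothetical, and sending the request (with content type `application/x-ndjson`) is left out.

```python
import json

def bulk_index_body(index, docs):
    """Build an NDJSON body that indexes (doc_id, source) pairs into `index`."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                                       # document line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

payload = bulk_index_body("products", [
    ("123", {"name": "x", "price": 99}),
    ("124", {"name": "y", "price": 45}),
])
print(payload, end="")
```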
  • Kibana — UI, dashboards, dev tools.
  • Logstash, Beats (Filebeat, Metricbeat) — ingestion.
  • Ingest pipelines (in ES) — lightweight transform on index.
  • ILM (Index Lifecycle Management) — hot/warm/cold/frozen tiers + delete.
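
An ILM policy is a JSON document listing phases and the actions to run in each. The sketch below shows the rough shape of a body for `PUT _ilm/policy/<name>`; the ages and the choice of actions are placeholder values for illustration, not recommendations, and the helper is made up.

```python
import json

# Phases run in this fixed order; a policy may define any subset of them.
PHASE_ORDER = ["hot", "warm", "cold", "frozen", "delete"]

ilm_policy = {
    "policy": {
        "phases": {
            # Hot: roll over to a new index once the current one is 7 days old.
            "hot": {"actions": {"rollover": {"max_age": "7d"}}},
            # Warm: after 30 days, merge each shard down to a single segment.
            "warm": {"min_age": "30d",
                     "actions": {"forcemerge": {"max_num_segments": 1}}},
            # Delete: remove the index after 90 days.
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

def phases_in_order(policy):
    """List the phases a policy defines, in lifecycle order."""
    defined = policy["policy"]["phases"]
    return [phase for phase in PHASE_ORDER if phase in defined]

print(phases_in_order(ilm_policy))
print(json.dumps(ilm_policy, indent=2))
```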