Elasticsearch — Basics
What it is
- Distributed search & analytics engine. Built on Apache Lucene.
- JSON over HTTP API. Schema-flexible: field types are inferred by dynamic mapping unless an explicit mapping is defined.
- Use cases: full-text search, log/metric aggregation (ELK), geo search, vector search, observability.
Core concepts
- Cluster — set of nodes.
- Node — single ES instance. Roles: master, data, ingest, coordinating, ml.
- Index — logical collection of documents (~ DB table).
- Document — JSON record (~ row).
- Shard — one Lucene index holding a slice of the index's documents. Each primary shard can have replica copies.
- Mapping — schema: field types, analyzers.
Inverted index
- Core data structure for full-text search. Maps term → list of docs containing it.
- Built per shard. Tokens come from analysis (analyzer).
- Term dictionary + posting list. Posting list also stores positions (for phrase queries) and offsets (for highlighting).
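
The term → postings idea above can be sketched in a few lines of Python. This is a toy illustration only, not how Lucene stores its data: the helper names `build_inverted_index` and `phrase_match` are made up for this example, and analysis is reduced to lowercasing and whitespace splitting.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}, a toy posting list."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Stand-in for analysis: lowercase and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index

def phrase_match(index, phrase):
    """Return sorted doc ids where the phrase terms sit at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only docs that contain every term can match the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits

# Hypothetical sample documents.
docs = {
    1: "Villa for rent in Dubai",
    2: "Dubai apartment for rent",
    3: "Rent a car in Dubai",
}
index = build_inverted_index(docs)
print(sorted(index["dubai"]))           # doc ids containing "dubai"
print(phrase_match(index, "for rent"))  # docs where "for" is directly followed by "rent"
```

Storing positions is what makes the phrase check possible: a plain term → doc list could only say that both words occur somewhere in the document.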
Analyzer
Pipeline: char filters → tokenizer → token filters.
- Char filter: strip HTML, replace patterns.
- Tokenizer: splits text — `standard`, `whitespace`, `keyword`, `pattern`, `ngram`, `edge_ngram`.
- Token filter: lowercase, stop, stemmer, synonym, asciifolding, ngram.
Standard analyzer: tokenize on word boundaries, lowercase, no stemming.
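
The three-stage pipeline can be mimicked in plain Python to show the order in which the stages run. This is a rough sketch under simplifying assumptions: the function names and the stop-word list are invented for the example, and the regex tokenizer is only a crude stand-in for the `standard` tokenizer.

```python
import re

STOP_WORDS = {"a", "an", "the", "in", "for", "of"}  # hypothetical stop list

def strip_html(text):
    # Char filter: runs on the raw string before tokenization.
    return re.sub(r"<[^>]+>", " ", text)

def word_tokenizer(text):
    # Tokenizer: split into runs of word characters.
    return re.findall(r"\w+", text)

def lowercase(tokens):
    # Token filter: normalize case.
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    # Token filter: drop common words (expects lowercased tokens).
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text, char_filters, tokenizer, token_filters):
    """Apply char filters, then the tokenizer, then token filters, in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze(
    "<p>The Villas in <b>Dubai</b></p>",
    char_filters=[strip_html],
    tokenizer=word_tokenizer,
    token_filters=[lowercase, remove_stop_words],
)
print(tokens)
```

Filter order matters: here the stop filter only works because `lowercase` runs first. On a real cluster, the `_analyze` API returns the tokens an analyzer actually produces for a given text.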
Field types (mapping)
- `text` — analyzed, tokenized, full-text searchable; NOT sortable/aggregable directly.
- `keyword` — exact value; sortable, aggregable, used for filters.
- `integer` / `long` / `short` / `byte` / `float` / `double` / `scaled_float` — numeric types.
- `date` — ISO 8601 string or epoch value.
- `boolean`, `geo_point`, `geo_shape`, `ip`, `binary`.
- `nested` — array of objects, each indexed and queryable as a separate hidden document.
- `object` — flattened by default: in an array of objects, the link between fields of the same object is lost.
- `dense_vector`, `sparse_vector` — for kNN/vector search.
Dual-mapping pattern (multi-fields):
```json
{
  "title": {
    "type": "text",
    "fields": {
      "raw": { "type": "keyword" }
    }
  }
}
```
Query DSL
- Match — analyzed query (full-text).
- Term — exact value (use on keyword/numeric, not text).
- Range — `gte` / `lte`.
- Bool — combine with `must` (AND, scoring), `should` (OR, scoring), `filter` (AND, no scoring), `must_not`.
- Multi-match — query across multiple fields with weight (`title^3`).
- Function score / Rank features — boost based on numeric fields, decay, script.
- kNN — approximate nearest-neighbor vector search (since 8.0).
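
Because the DSL is plain nested JSON, request bodies are easy to assemble in code. The sketch below builds a `bool` query that combines a weighted `multi_match` with non-scoring filters. The function name `build_search_body` and the `description` field are assumptions made for illustration; the other field names follow the notes above.

```python
import json

def build_search_body(text, status, max_price):
    """Build a bool query: scored full-text match plus non-scoring filters."""
    return {
        "query": {
            "bool": {
                # must: contributes to the relevance score.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            # title matches weigh 3x; "description" is a hypothetical field.
                            "fields": ["title^3", "description"],
                        }
                    }
                ],
                # filter: yes/no conditions that do not affect the score.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        }
    }

body = build_search_body("rental dubai", "active", 5000)
print(json.dumps(body, indent=2))
```

Putting exact-value conditions under `filter` rather than `must` keeps them out of scoring, which is the distinction the Bool bullet draws.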
```json
{
  "query": {
    "bool": {
      "must": { "match": { "title": "rental dubai" } },
      "filter": [
        { "term": { "status": "active" } },
        { "range": { "price": { "lte": 5000 } } }
      ]
    }
  }
}
```
Aggregations
- Bucket — group: `terms`, `range`, `date_histogram`, `histogram`, `geohash_grid`.
- Metric — compute: `avg`, `sum`, `min`, `max`, `cardinality`, `percentiles`, `stats`.
- Pipeline — agg over agg results (moving avg, derivative).
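
To make the bucket/metric split concrete, the sketch below pairs a request body (a `terms` bucket with a nested `avg` metric) with a pure-Python function that computes the same grouping over a few in-memory documents. The field names `city` and `price`, the aggregation names, and the helper `terms_with_avg` are all hypothetical, and the local function only approximates the shape of what a cluster would return.

```python
from collections import defaultdict

# Request body: bucket by city, then average the price inside each bucket.
agg_body = {
    "size": 0,  # return aggregation results only, no hits
    "aggs": {
        "by_city": {
            "terms": {"field": "city"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}

def terms_with_avg(docs, bucket_field, metric_field):
    """Group docs by bucket_field and average metric_field within each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # Largest buckets first, ties broken by key.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))

docs = [
    {"city": "dubai", "price": 4000},
    {"city": "dubai", "price": 6000},
    {"city": "abu dhabi", "price": 3000},
]
buckets = terms_with_avg(docs, "city", "price")
print(buckets)
```

The bucket step decides which documents belong together; the metric step reduces each group to a number. Nesting `aggs` inside a bucket aggregation expresses exactly that composition.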
Cluster basics
- Primary shard stores data; replicas for HA + read scaling.
- Default: 1 primary, 1 replica (per index, since 7.x).
- Allocation: master node assigns shards. Use `cluster.routing.allocation.awareness.attributes` for rack/zone awareness.
- Refresh: the in-memory indexing buffer is written to a new searchable segment every 1s (default `index.refresh_interval`). Near-real-time, not real-time.
- Translog: write-ahead log for durability between segment flushes.
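
The interplay of buffer, refresh, and translog can be modeled with a small toy class. This is a deliberately simplified sketch, not Elasticsearch's actual implementation: `ToyShard` and its methods are invented for illustration, and real shards add details (such as merging segments) that are left out here.

```python
class ToyShard:
    """Toy write path: indexing buffer + translog, refresh, flush."""

    def __init__(self):
        self.buffer = []    # indexed, but not yet visible to search
        self.segments = []  # searchable segments
        self.translog = []  # operations not yet covered by a flush

    def index(self, doc):
        # A write lands in the buffer and is logged for durability.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Refresh: turn the buffer into a new searchable segment.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Flush: make segments durable, after which the translog can be cleared.
        self.refresh()
        self.translog.clear()

    def search(self, predicate):
        # Search only sees documents that live in segments.
        return [d for seg in self.segments for d in seg if predicate(d)]


shard = ToyShard()
shard.index({"id": 1, "name": "x"})
before_refresh = shard.search(lambda d: True)  # nothing visible yet
shard.refresh()
after_refresh = shard.search(lambda d: True)   # document is now searchable
print(before_refresh, after_refresh, len(shard.translog))
```

The gap between `index` and `refresh` is the "near-real-time" window. The translog entry exists from the moment of the write, which is why an acknowledged document can survive a crash before the next flush.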
Common operations
```
PUT /products
PUT /products/_mapping { ... }
POST /products/_doc { "name":"x" }
PUT /products/_doc/123 { ... }
GET /products/_doc/123
POST /products/_update/123 { "doc": { "price": 99 } }
DELETE /products/_doc/123
POST /products/_search { "query": {...} }
POST /_bulk
```
Tooling
- Kibana — UI, dashboards, dev tools.
- Logstash, Beats (Filebeat, Metricbeat) — ingestion.
- Ingest pipelines (in ES) — lightweight transform on index.
- ILM (Index Lifecycle Management) — hot/warm/cold/frozen tiers + delete.