Deploy a Multi-node Elasticsearch Cluster with Docker Compose
A link to the companion GitHub repository is available at the bottom of this article.
Quick Intro to Elasticsearch
Why Elasticsearch?
Elasticsearch is one of the most popular and powerful search engines available to the open source community today. Some major consumer software companies use Elasticsearch, along with the rest of the ELK (Elasticsearch, Logstash, Kibana) stack, to store, manage, and query their data and logs. Elasticsearch’s multi-node cluster setup lets you scale your deployment as your data grows or the number of queries increases. Elasticsearch can be scaled either vertically on the same server or horizontally across servers, although cross-server deployments in a production environment typically use Docker Swarm or Kubernetes, which fall outside the scope of this beginner-oriented article. Typical Elasticsearch deployments also include a companion UI service called Kibana, an easy-to-use portal for managing your Elasticsearch instance.
How does Elasticsearch work?
Elasticsearch is written in Java and built on top of Apache Lucene. Lucene is the “meat and potatoes” library of Elasticsearch: it handles the core indexing and search features.
When we “start” Elasticsearch, we are creating a node. A node is just an instance of Elasticsearch that stores data. We can have as many nodes as we want in our Elasticsearch cluster, which is just a collection of nodes. Each node stores part of the cluster’s data.
Elasticsearch stores data as documents. A document is a JSON object (stored in Apache Lucene) that separates its data into fields and is given a unique ID. Documents are stored in indices (singular: index). An index is a logical grouping of documents, conceptually similar to (but not quite the same as) a table in a relational database, where each row would be a document.
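To make the document and index idea concrete, here is a minimal Python sketch. It is not Elasticsearch code: the `products_index` dictionary, the `index_document` helper, and the field names are all hypothetical, used only to illustrate a JSON document with fields and a unique ID living inside an index.

```python
import json
import uuid

# Hypothetical in-memory stand-in for an index: maps document ID -> document body.
products_index = {}


def index_document(index, body, doc_id=None):
    """Store a JSON-serializable document under a unique ID and return that ID."""
    # Generate an ID when the caller does not supply one.
    doc_id = doc_id or uuid.uuid4().hex
    # Round-trip through JSON to confirm the body is a valid JSON object.
    index[doc_id] = json.loads(json.dumps(body))
    return doc_id


# A document is a JSON object whose data is separated into fields.
auto_id = index_document(
    products_index, {"name": "keyboard", "price": 49.99, "in_stock": True}
)
fixed_id = index_document(
    products_index, {"name": "mouse", "price": 19.99, "in_stock": False}, doc_id="42"
)

# Look a document up by its ID, much like fetching a row by primary key.
print(json.dumps({"_id": fixed_id, "_source": products_index[fixed_id]}, indent=2))
```

The table analogy only goes so far: in this sketch, as in a real index, two documents are free to carry different fields, whereas rows in a relational table share one fixed set of columns.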
Indices are split into pieces that are distributed across nodes in a process called sharding, where each piece is called a shard. The main functions of sharding are to:
- store more documents in an index
- enable queries to be distributed and…