In this article I'll cover some basic information about Elasticsearch and what you should know about it.
Nodes, indices, shards, replicas… Let's start.
Elasticsearch is an open-source search engine, a real-time search platform and a NoSQL database. To understand it, you should know the difference between SQL and NoSQL databases. The major differences:
- SQL databases are relational; NoSQL databases are primarily non-relational or distributed.
- SQL databases scale vertically, by increasing the power of the hardware; NoSQL databases scale horizontally, by increasing the number of servers.
- SQL databases are table-based with a predefined schema; NoSQL databases are typically document-based or store key-value pairs.
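To make the schema difference concrete, here is a small sketch in Python. It uses the standard-library sqlite3 module for the relational side and plain dictionaries for the document side. The table name, field names and values are made up for illustration, and the dictionaries only imitate how a document store treats records; no real NoSQL database is involved.

```python
import sqlite3

# Relational side: the schema is declared before any row is stored,
# and every row has exactly these columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, color TEXT, size TEXT)")
conn.execute("INSERT INTO products (id, color, size) VALUES (1, 'red', 'M')")
row = conn.execute("SELECT id, color, size FROM products").fetchone()
conn.close()

# Document side (illustrative only): each record describes itself,
# so two records in the same collection may carry different fields.
documents = [
    {"id": 1, "color": "red", "size": "M"},
    {"id": 2, "color": "blue", "count": 12},  # no "size", extra "count"
]
field_sets = [set(doc) for doc in documents]
```

The relational table would reject a row with an unknown column, while the second document simply carries a different set of fields.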
Cluster – a collection of one or more nodes (basically your VMs/servers) where your data is located. A cluster has its own name, and additional nodes use this name to join the cluster. By default the cluster is named 'elasticsearch'.
Node – a virtual machine or a server that stores your data and takes part in the search process. When you start a node, it is assigned a random Universally Unique Identifier (UUID). This ID is very important when you want to add some policy or logic to how Elasticsearch works.
Index – a collection of documents (basically your data) that are grouped by some logic, for example a product catalog, user information, or logs of your build process. The index name must be lowercase; this is a requirement of Elasticsearch. You will also use this name when sending requests against the data, such as updating, deleting or modifying documents.
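As a rough illustration of how the index name ends up in requests, the sketch below checks the lowercase rule and builds the REST paths for a few document operations. Both helper names are hypothetical. The check is deliberately partial (Elasticsearch enforces more naming rules than lowercase), and the paths assume the `_doc` and `_update` endpoints of Elasticsearch 7.x and later.

```python
def looks_like_valid_index_name(name: str) -> bool:
    """Partial check only: non-empty, lowercase, no spaces.

    Elasticsearch applies further restrictions that are not covered here.
    """
    return bool(name) and name == name.lower() and " " not in name


def document_paths(index: str, doc_id: str) -> dict:
    """Return (HTTP method, path) pairs for common document operations.

    Hypothetical helper; assumes Elasticsearch 7.x+ endpoint layout.
    """
    if not looks_like_valid_index_name(index):
        raise ValueError(f"index name must be lowercase: {index!r}")
    return {
        "index": ("PUT", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
```

For example, `document_paths("users", "42")` yields the paths you would send to a node's HTTP port, while `document_paths("Users", "42")` raises a `ValueError` before any request is made.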
Document – basically your data, stored in JSON format. It is one record in your database, for example a record about one object that has properties like size, color and count.
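A minimal sketch of such a record, using the size, color and count properties from the example above. The values are invented, and the snippet only shows the JSON form of a document; it does not talk to Elasticsearch.

```python
import json

# One document: a single record describing one object.
document = {"size": "M", "color": "red", "count": 3}

# The JSON text is what would travel in the body of an indexing request.
body = json.dumps(document)

# Parsing the JSON gives back the same record.
restored = json.loads(body)
```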
Shards & Replicas – a very interesting thing in Elasticsearch. When you have a lot of data, dozens of GB, the hardware of a single node becomes a limit. For example, one index with a billion documents that takes 1 TB can be very slow to search on a single node. To solve this problem Elasticsearch provides the ability to subdivide an index into multiple pieces called shards. You can easily define the number of shards when you create the index. Shards allow you to:
- Horizontally scale and split your data
- Increase availability and fault tolerance, since shards can be copied to other nodes as replicas.
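The idea of splitting an index into shards can be sketched in a few lines of Python. The first function builds the settings body you would send when creating an index; `number_of_shards` and `number_of_replicas` are the real setting names. The rest is a simplified model of how documents are spread over shards: a hash of the document ID modulo the number of primary shards. Note the assumptions: `zlib.crc32` is only a stand-in for the hash Elasticsearch uses internally, and `pick_shard` and `distribute` are hypothetical helper names.

```python
import zlib


def create_index_body(number_of_shards: int, number_of_replicas: int) -> dict:
    """Settings body for creating an index with explicit shard counts."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def pick_shard(routing_key: str, number_of_shards: int) -> int:
    """Simplified routing model: hash(key) modulo number of primary shards.

    crc32 is an assumption for illustration, not Elasticsearch's own hash.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


def distribute(doc_ids: list, number_of_shards: int) -> dict:
    """Group document IDs by the shard the model assigns them to."""
    shards = {n: [] for n in range(number_of_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_shards)].append(doc_id)
    return shards


body = create_index_body(number_of_shards=3, number_of_replicas=1)
placement = distribute([f"doc-{i}" for i in range(100)], number_of_shards=3)
```

Every document lands on exactly one primary shard, and the same ID always maps to the same shard as long as the shard count stays fixed, which is one reason the number of shards is chosen when the index is created.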