IT Cloud. Eugeny Shtoltc
Comparison of Elasticsearch and Solr (the systems are comparable):
Elastic:
** Commercial product with open source code; outside contributions are accepted subject to approval;
** Supports more complex queries, more analytics, distributed queries out of the box, a more complete RESTful JSON-based API, chaining, machine learning, SQL (paid);
*** Full-text search;
*** Real-time index;
*** Monitoring (paid);
*** Monitoring via Elastic FQ;
*** Machine learning (paid);
*** Simple indexing;
*** More data types and structures;
** Lucene engine;
** Parent-child (JOIN);
** Native scalability;
** Documentation since 2010;
Solr:
** OpenSource;
** High speed with JOIN;
*** Full-text search;
*** Real-time index;
*** Monitoring in the admin panel;
*** Machine learning through modules;
*** Input data: Word, PDF and others;
*** Requires a schema for indexing;
*** Data: nested objects;
** Lucene engine;
** JSON join;
** Scalability: SolrCloud (requires configuration) and ZooKeeper (requires configuration);
** Documentation since 2004.
Microservice architecture is now increasingly common: loose coupling between components and the simplicity of
each one make them easier to develop, test, and debug. The system as a whole, however, becomes harder to analyze
because it is distributed. To assess its overall state, logs are collected in one central place and converted
into a readable form. Other data also needs analysis, for example the NGINX access_log to collect traffic
metrics, or a mail server's log to detect password-guessing attempts. Take ELK as an example of such a solution.
ELK is a bundle of three products: Logstash, Elasticsearch, and Kibana, the first and last of which are built
around the central one and make it convenient to use. More generally, ELK is called the Elastic Stack, since the
log-preparation tool Logstash can be replaced by analogs such as Fluentd or Rsyslog, and the Kibana visualizer
can be replaced by Grafana. For example, although Kibana provides extensive analysis capabilities, Grafana can
send notifications when events occur and can be used together with other products such as cAdvisor, which
analyzes the state of the system and of individual containers.
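The password-guessing check mentioned above can be sketched as follows, assuming the mail server's log has
already been converted into structured records. The field names `event` and `src_ip`, the value `auth_failed`,
and the threshold are hypothetical and chosen only for illustration; a real mail log would need its own mapping.

```python
from collections import Counter


def suspicious_sources(events, threshold=5):
    """Return source addresses with at least `threshold` failed logins.

    `events` is an iterable of dicts. The keys "event" and "src_ip" are
    hypothetical field names, not those of any particular mail server.
    """
    failures = Counter(
        e["src_ip"] for e in events if e.get("event") == "auth_failed"
    )
    # Sort so the result is stable regardless of input order.
    return sorted(ip for ip, count in failures.items() if count >= threshold)
```

With structured records the check is a simple aggregation; with plain text lines the same question would first
require parsing each line.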
ELK products can be installed individually, run as separate containers between which you need to configure
communication, or run as a single container.
For Elasticsearch to work well, data should arrive in JSON format. If the data is submitted as plain text
(each log entry is one line, separated from the previous one by a line break), then only full-text search is
possible, because each entry is interpreted as a single string. There are two ways to deliver logs in JSON.
The first is to configure the product being monitored to write in this format; NGINX, for example, has this
option. Often, however, this is impossible, because a body of logs has already accumulated and logs are
traditionally written as text. Such cases require the second way: post-processing the logs from text into JSON,
which Logstash handles. It is important to note that if data can be transferred in structured form (JSON, XML,
and others) from the start, then it should be: with detailed parsing, any deviation from the expected format
breaks processing, and with superficial parsing, valuable information is lost. In any case, parsing is a
bottleneck in this system, although it can be scaled to a limited extent, down to a single service or log
file. Fortunately, more and more products are starting to support structured logging; recent versions of
NGINX, for example, support logs in JSON format.
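The trade-off between detailed and superficial parsing can be illustrated with a minimal sketch in Python.
It assumes the log lines follow NGINX's "combined" access log layout; the regular expression and the function
name `line_to_json` are illustrative and are not taken from Logstash. A line that deviates from the expected
layout falls back to a single `message` field, which is the superficial case where only full-text search remains.

```python
import json
import re

# Detailed pattern for one line in NGINX's "combined" layout (assumed here).
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def line_to_json(line):
    """Convert one text log line into a JSON document (as a string)."""
    match = COMBINED.match(line)
    if match is None:
        # Superficial fallback: keep the raw text, lose the structure.
        return json.dumps({"message": line.rstrip("\n")})
    doc = match.groupdict()
    # Numeric fields become numbers so they can be aggregated later.
    doc["status"] = int(doc["status"])
    doc["body_bytes_sent"] = int(doc["body_bytes_sent"])
    return json.dumps(doc)
```

Once `status` is a separate numeric field, a query such as "all responses with status 500 and above" no longer
depends on matching text inside the line.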
For systems that do not support this format, logs can be converted with programs such as Logstash, Filebeat,
and Fluentd. The first is part of the vendor's standard Elastic Stack distribution and can be installed
together with the rest of ELK in a single Docker container. It supports reading data from files, the network,
and the standard stream, both for input and for output, and, most importantly, the native Elasticsearch
protocol. Logstash monitors log files by modification date, or receives data over the network (telnet) from
distributed systems such as containers, and, after transformation, sends it to the output, usually
Elasticsearch. It is simple and comes standard with the Elastic Stack, which makes it easy to configure.
However, because of the Java virtual machine inside, it is heavy, and it is not very feature-rich, although it
supports plugins, for example synchronization with MySQL to send new data. Filebeat provides slightly more
options. Fluentd can serve as an enterprise tool for all occasions thanks to its broad functionality (reading
log files, system logs, and so on), its scalability, and the ability to roll it out across Kubernetes clusters
with a Helm chart and monitor the whole data center with a standard package; this is covered in the relevant
section.
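The idea of watching a log file and forwarding only what is new can be sketched as below. This is a simplified
illustration of the general technique, not a description of how Logstash, Filebeat, or Fluentd are implemented;
the function name `read_new_lines` and the `state` dictionary are assumptions made for the example.

```python
import os


def read_new_lines(path, state):
    """Return complete lines appended to `path` since the previous call.

    `state` is a dict that this function updates with the last seen
    modification time ("mtime") and the byte offset already read ("offset").
    """
    stat = os.stat(path)
    offset = state.get("offset", 0)
    if stat.st_mtime == state.get("mtime") and stat.st_size == offset:
        return []  # file has not changed since the last call
    if stat.st_size < offset:
        offset = 0  # file was truncated or rotated: start over
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    state["offset"] = offset + end
    state["mtime"] = stat.st_mtime
    return data[:end].decode("utf-8", errors="replace").splitlines()
```

A collector would call such a function in a loop, transform each returned line, and send the result to the
output; keeping the offset is what prevents already processed entries from being sent twice.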
To manage logs, you can use Curator, which can archive old logs from Elasticsearch or delete them,
making Elasticsearch work more efficiently.
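The selection step behind such cleanup can be sketched in plain Python. This is not Curator's API: the function
`indices_older_than` is hypothetical, and it assumes daily indices named with a date suffix such as
`logstash-2024.01.01`. It only decides which indices are old; archiving or deleting them would be a separate step.

```python
from datetime import date, datetime, timedelta


def indices_older_than(index_names, days, today, prefix="logstash-"):
    """Return names of daily indices whose date is more than `days` before `today`.

    Assumes names of the form "<prefix>YYYY.MM.DD"; anything else is skipped.
    """
    cutoff = today - timedelta(days=days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date: leave this index alone
        if day < cutoff:
            old.append(name)
    return old
```

Skipping names that do not match the pattern is a deliberate choice here: it is safer to leave an unknown index
in place than to delete it by mistake.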
Logs are collected by special collectors: Logstash, Fluentd, Filebeat, or others.