This role installs Elasticsearch from the official elastic.co repository.

Mandatory variables

You must define the Elastic repository major version in the form "N.x", where N is the major version number, for example:

elastic_major_version: "7.x"

The 3 currently supported versions are "7.x", "6.x" and "5.x"; for any other version, you need to create a new jvm.options template file.

This variable is also used for other elastic.co tools like Logstash.

You must also define the Elasticsearch cluster name; please respect the naming convention (mode is either "prod" or "test"):

elasticsearch_clustername: "center-stats-mode-client"
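
Both mandatory variables can live in the same group_vars file. A minimal sketch (the file name and values are examples, not the role's defaults):

```yaml
# group_vars/elasticsearch.yml (hypothetical file name)
elastic_major_version: "7.x"
elasticsearch_clustername: "center-stats-prod-client"
```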

Elasticsearch optional variables

The variable you are most likely to change is the Java heap size; the default value is:

elasticsearch_heap_size: "10g"

Other variables have sane defaults and should not be changed unless you know what you are doing:

elasticsearch_node_name: "${HOSTNAME}"
elasticsearch_node_name_path_data: "/var/lib/elasticsearch"
elasticsearch_node_name_path_logs: "/var/log/elasticsearch"
elasticsearch_node_name_network_host: "_site_"
elasticsearch_cluster_routing_allocation:
  cluster_concurrent_rebalance: 4
  node_concurrent_recoveries: 4
  node_initial_primaries_recoveries: 8

And finally, you can define arbitrary settings using the "elasticsearch_additional_config" object, for example for version 6.x:

elasticsearch_additional_config:
  action.destructive_requires_name: "true"
  script.painless.regex.enabled: "true"
  script.max_compilations_rate: "120/1m" # this is specific for 6.x

or another example for 5.x:

elasticsearch_additional_config:
  action.destructive_requires_name: "true"
  script.painless.regex.enabled: "true"
  script.max_compilations_per_minute: "1000" # this is specific for 5.x

Update

To perform an update, add this to the command line: --extra-vars '{ "elasticsearch_update_now" : true }'.

If you are doing a major update, you still have to double-check the settings that differ between major versions. For minor updates, the process should be painless.
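
For example, assuming a playbook named site.yml (hypothetical; only the --extra-vars part comes from this role), an update could be launched like this:

```shell
# Playbook and inventory names are placeholders.
ansible-playbook -i inventory site.yml --extra-vars '{ "elasticsearch_update_now" : true }'
```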

systemd service optional variables

To modify the systemd service for elasticsearch, the official documentation (at https://www.elastic.co/guide/en/elasticsearch/reference/master/setting-system-settings.html) explains that a systemd override file must be used.

This role uses an override file to change the following default values:

LimitNOFILE: "655360"		# same as ulimit -n
LimitNPROC: "4096"		# same as ulimit -u
LimitMEMLOCK: "infinity"	# same as ulimit -l

You can override any of those 3 settings with this variable (any undefined setting falls back to the defaults above):

elasticsearch_systemd_override:
  LimitNOFILE: "655360"
  LimitNPROC: "4096"
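
Under the hood, systemd overrides like these end up in a drop-in file. A sketch of what the rendered override could look like (the path and file name are illustrative, the role controls the real ones):

```ini
# e.g. /etc/systemd/system/elasticsearch.service.d/override.conf
[Service]
LimitNOFILE=655360
LimitNPROC=4096
LimitMEMLOCK=infinity
```

If you ever change such a file by hand, run systemctl daemon-reload and restart the service for it to take effect.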

Clustering

Your cluster must have an odd number of master-eligible nodes, with a quorum of half the nodes plus one (minimum of 3 nodes, giving a quorum of 2); this is necessary to avoid split-brain and data loss. Look at the official documentation for more details.
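
The quorum is simple integer arithmetic: for N master-eligible nodes, quorum = floor(N/2) + 1. A quick shell sketch:

```shell
# Quorum for N master-eligible nodes; shell integer division gives floor(N/2).
masters=3
quorum=$(( masters / 2 + 1 ))
echo "$quorum"
```

With 3 master-eligible nodes this prints 2, matching the minimum setup described above.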

To define a master-only node, you must specify this:

(!) Replace expected_data_nodes with the number of data nodes that must be up before the cluster starts; without replication, this is all of your data nodes.

elasticsearch_node:
  master: "true"
  data: "false"
  ingest: "false"
elasticsearch_gateway:
  expected_data_nodes: "3"

For a data node, use this instead:

elasticsearch_node:
  master: "false"

You also need to define the cluster topology on every node, with the DNS name of every node and the minimum number of master nodes required to start the cluster (= the quorum). For example:

elasticsearch_additional_config:
  discovery.zen.ping.unicast.hosts: '[ "center-stats-prod-o2k-1.cosium.com", "center-stats-prod-o2k-2.cosium.com", "center-stats-prod-o2k-3.cosium.com", "center-stats-prod-o2k-4.cosium.com", "center-stats-prod-o2k-5.cosium.com", "center-stats-prod-o2k-6.cosium.com" ]'
  discovery.zen.minimum_master_nodes: "2"

If you already defined elasticsearch_additional_config, just add these settings to the existing definition.

Security

By default, there is no security restricting access to the Elasticsearch instance from anywhere; the only protection is the network.

To protect the instance, use iptables rules via the firewall role.
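
If you manage the rules by hand instead, a sketch of iptables rules restricting the HTTP and transport ports (the source subnet 10.0.0.0/24 is a placeholder for your cluster network):

```shell
# Allow the cluster network only, then drop everyone else (placeholder subnet).
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 9200 -j ACCEPT
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 9300 -j ACCEPT
iptables -A INPUT -p tcp --dport 9200 -j DROP
iptables -A INPUT -p tcp --dport 9300 -j DROP
```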

JMX monitoring

By default, JMX monitoring is active and listening on port 8301. You must also protect this port with a firewall rule because it is not protected by a login/pass. I tried to use the classic JMX login/password mechanism, but for an unknown reason it doesn't work.

You can deactivate the JMX monitoring by setting this variable to False:

elasticsearch_jvm_monitoring: False

Useful curl command

General information:

# curl http://localhost:9200/
{
  "name" : "infra-log-elasticsearch-1",
  "cluster_name" : "infra-prod",
  "cluster_uuid" : "kbEf8yXQT1amAZrKhGZbTg",
  "version" : {
    "number" : "7.5.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "e9ccaed468e2fac2275a3761849cbee64b39519f",
    "build_date" : "2019-11-26T01:06:52.518245Z",
    "build_snapshot" : false,
    "lucene_version" : "8.3.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Show indices:

# curl http://localhost:9200/_cat/indices?v
health status index                            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   filebeat-7.5.0-2019.12.09-000001 2rCN7-qPQrS-HKG1tGPwvQ   1   1          0            0       460b           230b
green  open   .kibana_task_manager_1           GFzoyVwfQvOaolx46qlCaw   1   1          2            1     32.5kb         16.2kb
green  open   .apm-agent-configuration         zVcE8tJWT_63J-tX1zcx-A   1   1          0            0       566b           283b
green  open   .kibana_1                        LxaUmUqpR6ibZOXlbrNmhw   1   1       1058           44        1mb          514kb

Show mappings:

curl http://localhost:9200/_mapping

curl http://localhost:9200/filebeat-7.5.0-2019.12.05-000001/_mapping | jq .

Delete one or several indices:

curl -X DELETE "localhost:9200/filebeat-7.5.0?pretty"

curl -XDELETE 'http://localhost:9200/filebeat-*'

Import a template:

filebeat export template > filebeat.template.json
curl -XPUT -H 'Content-Type: application/json' http://localhost:9200/_template/filebeat-7.5.0 -d@filebeat.template.json

See the ILM status if it exists:

curl -s http://localhost:9200/filebeat-7.5.0-2019.12.09-000001/_ilm/explain | jq .

Shards

Show shards status:

curl -s 'http://localhost:9200/_cat/shards'

Explain shards allocation issues:

curl -s "http://localhost:9200/_cluster/allocation/explain" | jq .

Retry failed shards allocation:

curl -X POST -s 'http://localhost:9200/_cluster/reroute?retry_failed=true'

Upgrade of an elasticsearch cluster

A rolling upgrade keeps the cluster fully accessible while updating, but is very time-consuming; the method is described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html

A full cluster restart upgrade is faster, but it means shutting down all nodes in the cluster; the method is described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/restart-upgrade.html

Summary for full cluster restart upgrade:

1/ disable shard allocation:

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}
'

For a cluster without too many indices, you can also configure indices not to be moved until a node has been down for more than 10 minutes. Be careful: this can take a while to apply, because the setting has to be applied to every index:

curl -X PUT -u elastic:xxx "localhost:9200/_all/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}
'

2/ stop all nodes

3/ apt update && apt dist-upgrade && apt autoremove -y

4/ start all nodes

5/ wait for the status to turn yellow by checking curl -s http://localhost:9200/_cluster/health | jq, and for curl -X GET "localhost:9200/_cat/recovery?pretty" to return existing_store on every line

6/ re-enable shard allocation via:

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}
'

If you changed the delayed_timeout value, reset it too:

curl -X PUT -u elastic:xxx "localhost:9200/_all/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": null
  }
}
'

7/ Since Elasticsearch 7.x, the cluster should come back reasonably quickly.
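
The wait in step 5/ can be scripted. A sketch of a helper that reads the cluster-health JSON on stdin (using grep instead of jq, so it only needs coreutils):

```shell
# Succeeds when the cluster status in the JSON on stdin is yellow or green.
is_ready() {
  grep -qE '"status"[[:space:]]*:[[:space:]]*"(yellow|green)"'
}

# Intended usage against a live node:
#   until curl -s http://localhost:9200/_cluster/health | is_ready; do sleep 5; done

# Demonstration with canned responses:
echo '{"status":"red"}'   | is_ready && echo ready || echo waiting
echo '{"status":"green"}' | is_ready && echo ready || echo waiting
```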

Securing Elasticsearch with login/pass

By default, Elasticsearch is not secured via login/pass, only the firewall is protecting it.

Securing Elasticsearch via login/pass also allows configuring access rights in Kibana.

Certificate generation for the cluster

(!) Currently this step is not handled automatically by ansible.

This step is mandatory: you must first add certificate-based security for internode communication.

Generate the CA for the internode communication:

/usr/share/elasticsearch/bin/elasticsearch-certutil ca

This will generate the CA at this location: /usr/share/elasticsearch/elastic-stack-ca.p12.

Then generate the certificate for the internode communication:

/usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca /usr/share/elasticsearch/elastic-stack-ca.p12

The certificate is generated to: /usr/share/elasticsearch/elastic-certificates.p12

Copy the file /usr/share/elasticsearch/elastic-certificates.p12 to /etc/elasticsearch/elastic-certificates.p12 on all nodes.
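
The copy can be done with a small loop run from the node where the certificate was generated; the hostnames below are placeholders:

```shell
# Placeholder hostnames; adapt to your inventory.
for node in es-node-1 es-node-2 es-node-3; do
  scp /usr/share/elasticsearch/elastic-certificates.p12 "root@${node}:/etc/elasticsearch/elastic-certificates.p12"
done
```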

Enable x-pack + certificate

To enable x-pack, add this to the Elasticsearch configuration:

xpack.security.enabled: "true"
xpack.security.transport.ssl.enabled: "true"
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.client_authentication: required
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12

At this point, the cluster is NOT USABLE anymore. You must set up login and pass.

Set up the default accounts

Use this command to generate the default login/password for elasticsearch:

/usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto

The admin user is elastic, use this user/pass for all super-admin actions.

To allow monitoring to work, you need to set these variables:

elasticsearch_xpack_login: "elastic" # this is the default value, you can omit it
elasticsearch_xpack_password: "{{ lookup('hashi_vault', 'secret=cosium-kv/data/group_vars/name_of_group')['elastic'] }}"

Kibana

If you are using kibana to access the cluster, you need to add the following to its configuration so that it can access the cluster using login/pass:

kibana_extra_config:
  elasticsearch.username: "kibana_system"
  elasticsearch.password: "{{ lookup('hashi_vault', 'secret=cosium-kv/xxxxxxxxxxxxxxxxxxx')['kibana_system'] }}"