{"id":2456,"date":"2021-10-03T19:52:16","date_gmt":"2021-10-03T19:52:16","guid":{"rendered":"https:\/\/dft.wiki\/?p=2456"},"modified":"2026-06-02T20:59:52","modified_gmt":"2026-06-03T00:59:52","slug":"monitoring-servers-with-prometheus-and-grafana-on-ubuntu-20-04","status":"publish","type":"post","link":"https:\/\/dft.wiki\/?p=2456","title":{"rendered":"Monitoring Servers with Prometheus and Grafana on Ubuntu"},"content":{"rendered":"<p><strong>Prometheus<\/strong> is a systems and service monitoring system [<a href=\"https:\/\/github.com\/prometheus\/prometheus\">Link<\/a>].\u00a0Out of the box, it monitors the host where it is running, and it scrapes metrics from other endpoints through exporters such as Prometheus Node Exporter [<a href=\"https:\/\/github.com\/prometheus\/node_exporter\">Link<\/a>].<\/p>\n<p>On the other hand, <strong>Grafana<\/strong> is a visualizer that consumes metrics from data sources like <strong>Prometheus<\/strong> to build graphs and dashboards [<a href=\"https:\/\/github.com\/grafana\/grafana\">Link<\/a>].<\/p>\n<hr \/>\n<p><strong>INSTALLING PROMETHEUS<\/strong><\/p>\n<p>Some argue against using the distribution packages because they are not updated frequently, but they are the most reliable source and are upgraded and patched together with the system:<\/p>\n<pre>sudo apt-get install prometheus prometheus-node-exporter prometheus-pushgateway prometheus-alertmanager -y\r\nsudo systemctl stop prometheus<\/pre>\n<p>Enable the lifecycle API (it allows reloading the configuration over HTTP):<\/p>\n<pre>sudo nano \/etc\/systemd\/system\/multi-user.target.wants\/prometheus.service<\/pre>\n<p>Edit the following line:<\/p>\n<pre>ExecStart=\/usr\/bin\/prometheus<strong> --web.enable-lifecycle<\/strong> $ARGS<\/pre>\n<p>Note: the default retention period of the collected data is 15 days.
To customize this period, add the argument <strong>--storage.tsdb.retention.time=30d<\/strong> with the desired period.<\/p>\n<p>Apply the change and start the service:<\/p>\n<pre>sudo systemctl daemon-reload\r\nsudo systemctl start prometheus\r\nsudo systemctl status prometheus<\/pre>\n<p>It could also be deployed as a Docker container (just for reference):<\/p>\n<pre>sudo docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom\/prometheus<\/pre>\n<p><strong>INSTALLING NODE_EXPORTER ON EACH OF THE MONITORED HOSTS<\/strong><\/p>\n<pre>sudo apt install prometheus-node-exporter -y\r\nsudo systemctl start prometheus-node-exporter\r\nsudo systemctl enable prometheus-node-exporter\r\nsudo systemctl status prometheus-node-exporter\r\nsudo ufw allow from <strong>192.168.1.162<\/strong> to any port 9100<\/pre>\n<p>Note: replace the IP <strong>192.168.1.162<\/strong> with the IP of the server where Prometheus is running.<\/p>\n<p><strong>CONFIGURING PROMETHEUS TO REACH THE MONITORED HOSTS<\/strong><\/p>\n<pre>sudo nano \/etc\/prometheus\/prometheus.yml<\/pre>\n<p>Example of configuration:<\/p>\n<pre>global:\r\n  scrape_interval: 1s\r\n  evaluation_interval: 1s\r\nalerting:\r\n  alertmanagers:\r\n  - static_configs:\r\n    - targets: ['192.168.1.162:9093']\r\nscrape_configs:\r\n  - job_name: 'prometheus'\r\n    scrape_interval: 1s\r\n    scrape_timeout: 1s\r\n    static_configs:\r\n      - targets: ['192.168.1.162:9090']\r\n  - job_name: 'nodes'\r\n    scrape_interval: 1s\r\n    scrape_timeout: 1s\r\n    static_configs:\r\n      - targets: ['192.168.1.162:9100', '<strong>192.168.1.163<\/strong>:9100', '<strong>192.168.1.164<\/strong>:9100']<\/pre>\n<p>Note: I recommend using names instead of IPs.
You can configure the translation in the file <strong>\/etc\/hosts<\/strong>.<\/p>\n<p>Reload the configuration using the API:<\/p>\n<pre>curl -X POST http:\/\/localhost:9090\/-\/reload<\/pre>\n<p>Access Prometheus WebUI with a browser <strong>http:\/\/192.168.1.162:9090\/<\/strong> and navigate to <strong>Status &gt; Targets<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2457\" src=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-17-28.png\" alt=\"\" width=\"1495\" height=\"544\" srcset=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-17-28.png 1495w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-17-28-300x109.png 300w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-17-28-1024x373.png 1024w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-17-28-768x279.png 768w\" sizes=\"auto, (max-width: 1495px) 100vw, 1495px\" \/><\/p>\n<hr \/>\n<p><strong>INSTALLING GRAFANA<\/strong><\/p>\n<p>Grafana is a powerful tool to visualize data over time. 
It is the most popular dashboard viewer for Prometheus but not limited to it [<a href=\"https:\/\/github.com\/grafana\/grafana\">Link<\/a>].<\/p>\n<pre>sudo apt-get install -y apt-transport-https\r\nsudo apt-get install -y software-properties-common wget\r\ncurl -fsSL \"https:\/\/keyserver.ubuntu.com\/pks\/lookup?op=get&amp;search=0x963FA27710458545\" | gpg --dearmor | sudo tee \/usr\/share\/keyrings\/repository-keyring.gpg &gt;\/dev\/null\r\necho \"deb [signed-by=\/usr\/share\/keyrings\/repository-keyring.gpg] https:\/\/apt.grafana.com stable main\" | sudo tee -a \/etc\/apt\/sources.list.d\/grafana.list\r\nsudo apt-get update\r\nsudo apt-get install grafana -y\r\nsudo systemctl start grafana-server\r\nsudo systemctl enable grafana-server\r\nsudo systemctl status grafana-server\r\nsudo ufw allow 3000<\/pre>\n<p>Access the Web UI\u00a0 on <strong>http:\/\/192.168.1.162:3000\/<\/strong> and change the <span style=\"color: #ff0000;\"><strong>default password (admin:admin) immediately<\/strong><\/span>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2460\" src=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-10-22.png\" alt=\"\" width=\"1344\" height=\"660\" srcset=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-10-22.png 1344w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-10-22-300x147.png 300w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-10-22-1024x503.png 1024w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-10-22-768x377.png 768w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2461\" src=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-11-12.png\" alt=\"\" width=\"1340\" height=\"479\" 
srcset=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-11-12.png 1340w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-11-12-300x107.png 300w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-11-12-1024x366.png 1024w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-11-12-768x275.png 768w\" sizes=\"auto, (max-width: 1340px) 100vw, 1340px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2462\" src=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-37-27.png\" alt=\"\" width=\"638\" height=\"419\" srcset=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-37-27.png 638w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-37-27-300x197.png 300w\" sizes=\"auto, (max-width: 638px) 100vw, 638px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2463\" src=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-38-20.png\" alt=\"\" width=\"303\" height=\"175\" srcset=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-38-20.png 303w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_15-38-20-300x173.png 300w\" sizes=\"auto, (max-width: 303px) 100vw, 303px\" \/><\/p>\n<p>Back to Home, import a Dashboard View from the online repository [<a href=\"https:\/\/grafana.com\/grafana\/dashboards\">Link<\/a>].<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2464\" src=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13_16_09.png\" alt=\"\" width=\"371\" height=\"400\" 
srcset=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13_16_09.png 371w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13_16_09-278x300.png 278w\" sizes=\"auto, (max-width: 371px) 100vw, 371px\" \/><\/p>\n<p>Import the dashboard 1860 and select the data source as Prometheus.<\/p>\n<p>Repeat the import procedure with the number 405.<\/p>\n<p>Go to the dashboard, navigate and customize your imported view:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2467\" src=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-20-40.png\" alt=\"\" width=\"1344\" height=\"861\" srcset=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-20-40.png 1344w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-20-40-300x192.png 300w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-20-40-1024x656.png 1024w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-20-40-768x492.png 768w\" sizes=\"auto, (max-width: 1344px) 100vw, 1344px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2468\" src=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-29-48.png\" alt=\"\" width=\"1347\" height=\"1168\" srcset=\"https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-29-48.png 1347w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-29-48-300x260.png 300w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-29-48-1024x888.png 1024w, https:\/\/dft.wiki\/wp-content\/uploads\/sites\/15\/2021\/10\/Screenshot_2021-10-03_13-29-48-768x666.png 768w\" sizes=\"auto, (max-width: 1347px) 100vw, 1347px\" \/><\/p>\n<hr 
\/>\n<p><strong>REFLECTION<\/strong><\/p>\n<p>Keep monitoring the storage usage on <strong>\/var\/lib\/prometheus\/metrics2\/<\/strong>\u00a0until you settle on the retention time and scraping frequency that best suit your needs.<\/p>\n<p>For reference, my setup consists of the server where Prometheus runs and two other servers. It scrapes data from 4 sources every second and accumulates about 800 MB of data every 24 hours.<\/p>\n<p>Email alerts look much nicer with the image renderer:<\/p>\n<pre>sudo grafana-cli plugins install grafana-image-renderer<\/pre>\n<p>The SMTP configuration can be found at:<\/p>\n<pre>sudo nano \/etc\/grafana\/grafana.ini<\/pre>\n<p>It is always recommended to restart Grafana after changing its configuration or installing plugins:<\/p>\n<pre>sudo systemctl restart grafana-server<\/pre>\n<p>To use <strong>AWS CloudWatch<\/strong> as a data source for Grafana, follow the steps below, adjusting them as needed:<\/p>\n<pre>IAM &gt; Policies &gt; Add service: CloudWatch &gt; Allow for: ListMetrics, GetMetricData, GetMetricStatistics.\r\nIAM &gt; Roles &gt; AWS Service &gt; EC2 &gt; Select the Policy to the Role.\r\nIAM &gt; Users &gt; Add User &gt; Attach Existent Policy &gt; Select the Policy to the User &gt; Get <strong>AccessKey<\/strong> and <strong>SecretKey<\/strong>.\r\nEC2 &gt; Select the Instance &gt; Actions &gt; Instance Settings &gt; Attach\/Replace IAM Role.<\/pre>\n<p>Then use the created <strong>AccessKey<\/strong> and <strong>SecretKey\u00a0<\/strong>to add the data source using the Grafana web application.<\/p>\n<hr \/>\n<p><strong>BONUS<\/strong><\/p>\n<p>For monitoring <strong><span style=\"text-decoration: underline;\">Databases<\/span><\/strong>, use <strong>Percona Monitoring and Management<\/strong> (PMM) [<a href=\"https:\/\/www.percona.com\/software\/pmm\/quickstart\">Link<\/a>].<\/p>\n<p><strong>Server<\/strong><\/p>\n<pre>curl -fsSL https:\/\/www.percona.com\/get\/pmm |
\/bin\/bash<\/pre>\n<p><strong>Client<\/strong><\/p>\n<pre>wget https:\/\/repo.percona.com\/apt\/percona-release_latest.$(lsb_release -sc)_all.deb\r\nsudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb\r\nsudo apt-get update\r\nsudo apt-get install pmm2-client<\/pre>\n<p>Connect the Client to the Server<\/p>\n<pre>sudo pmm-admin config --server-insecure-tls --server-url=https:\/\/admin:<strong>strong_password<\/strong>@<strong>pmm-server.local<\/strong><\/pre>\n<p>Create an account in the DB for acquiring metrics from the DB engine (example for MySQL 8):<\/p>\n<pre>CREATE USER 'pmm'@'localhost' IDENTIFIED BY '<strong>strong_password<\/strong>' WITH MAX_USER_CONNECTIONS 10;\r\nGRANT SELECT, PROCESS, SUPER, REPLICATION CLIENT, RELOAD, BACKUP_ADMIN ON *.* TO 'pmm'@'localhost';<\/pre>\n<hr \/>\n<p><strong>BONUS OF THE BONUS<\/strong><\/p>\n<p>For monitoring the uptime of different service types (e.g., HTTP(S), DNS, Ping, and TCP), there is an amazing tool called <strong>Uptime Kuma<\/strong> [<a href=\"https:\/\/github.com\/louislam\/uptime-kuma\">Link<\/a>].<\/p>\n<p>Docker Compose Deployment<\/p>\n<pre>apt update &amp;&amp; apt-get upgrade -y\r\napt install docker.io docker-compose -y\r\nmkdir uptime-kuma &amp;&amp; cd uptime-kuma\r\ncurl -o compose.yaml https:\/\/raw.githubusercontent.com\/louislam\/uptime-kuma\/master\/compose.yaml\r\ndocker-compose up -d<\/pre>\n<p>Docker Deployment (the lazy way)<\/p>\n<pre>sudo docker run -d --restart=always --network=host -v uptime-kuma:\/app\/data --name uptime-kuma louislam\/uptime-kuma:base3<\/pre>\n<p>Ubuntu Installation<\/p>\n<pre>git clone https:\/\/github.com\/louislam\/uptime-kuma.git\r\ncd uptime-kuma\r\nsudo apt install npm -y\r\nsudo npm run setup\r\nsudo npm install pm2 -g &amp;&amp; sudo pm2 install pm2-logrotate\r\nsudo pm2 start server\/server.js --name uptime-kuma\r\nsudo pm2 save &amp;&amp; sudo pm2 startup<\/pre>\n<p>To run it behind a reverse proxy with TLS termination and automated Let&#8217;s
Encrypt certificates.<\/p>\n<pre>sudo apt install nginx certbot python3-certbot-nginx -y\r\nsudo nano \/etc\/nginx\/sites-enabled\/default<\/pre>\n<pre>server {\r\n   listen 80 default_server;\r\n   listen [::]:80 default_server;\r\n   root \/var\/www\/html;\r\n   index index.html;\r\n   server_name _;\r\n   location \/ {\r\n      proxy_pass http:\/\/127.0.0.1:3001;\r\n      proxy_set_header Host $host;\r\n   }\r\n}<\/pre>\n<pre>sudo nginx -t &amp;&amp; sudo nginx -s reload\r\nsudo certbot --nginx --non-interactive --agree-tos --redirect --email <strong>devops@domain.com<\/strong> -d <strong>uptime.domain.com<\/strong><\/pre>\n<p>If you need to monitor an external or a public service to identify if they are experiencing an outage that affects your monitoring, check out <strong>Status Shield<\/strong> [<a href=\"https:\/\/statusfield.com\/\">Link<\/a>].<\/p>\n<hr id=\"healthchecks\" \/>\n<p><strong>MOREOVER<\/strong><\/p>\n<p>For monitoring cron jobs or background tasks that could silently fail, try <strong>healthchecks.io<\/strong> [<a href=\"https:\/\/github.com\/healthchecks\/healthchecks\">Link<\/a>].<\/p>\n<p>The following <code>docker-compose.yaml<\/code> takes care of the whole deployment and configuration.<\/p>\n<pre>version: \"3\"\r\nservices:\r\n  healthchecks:\r\n    image: healthchecks\/healthchecks:latest\r\n    container_name: healthchecks\r\n    environment:\r\n      - DB=sqlite\r\n      - DB_NAME=\/data\/hc.sqlite\r\n      - DEBUG=False\r\n      - DEFAULT_FROM_EMAIL=healthchecks@example.com\r\n      - EMAIL_HOST=smtp.example.com\r\n      - EMAIL_HOST_USER=healthchecks@example.com\r\n      - EMAIL_HOST_PASSWORD=strong_password\r\n\r\n###### For STARTTLS - Explicit TLS\r\n      - EMAIL_PORT=587\r\n      - EMAIL_USE_TLS=True\r\n\r\n###### For SMTPS - Implicit TLS (deprecated)\r\n#      - EMAIL_PORT=465\r\n#      - EMAIL_USE_TLS=False\r\n#      - EMAIL_USE_SSL=True\r\n\r\n      - SECRET_KEY=f7993db8-1004-414b-be03-ad4a5d5153a3\r\n      - 
ALLOWED_HOSTS=healthchecks.simnet.cloud\r\n      - SITE_ROOT=https:\/\/healthchecks.simnet.cloud\/\r\n      - REGISTRATION_OPEN=False\r\n    ports:\r\n      - 8000:8000\r\n    volumes:\r\n      - healthchecks-data:\/data\r\n    restart: unless-stopped\r\nvolumes:\r\n    healthchecks-data:<\/pre>\n<p>It is also recommended to keep behind a reverse proxy that terminates TLS.<\/p>\n<hr \/>\n<p>See also <strong>Grafana Alloy<\/strong> and <strong>Grafana Loki<\/strong> for combining logs with metrics and visualizing and drilling down with Grafana Dashboards.<\/p>\n<ul>\n<li><strong>Grafana Alloy<\/strong>\u00a0[<a href=\"https:\/\/github.com\/grafana\/alloy\">Link<\/a>] scrapes log files and sends them to Grafana Loki.<\/li>\n<li><strong>Grafana Loki<\/strong> [<a href=\"https:\/\/github.com\/grafana\/loki\">Link<\/a>] indexes only the relevant data and stores it efficiently.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Prometheus is a systems and service monitoring system [Link].\u00a0Out of the box, it monitors the 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[],"class_list":["post-2456","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/dft.wiki\/index.php?rest_route=\/wp\/v2\/posts\/2456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dft.wiki\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dft.wiki\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dft.wiki\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dft.wiki\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2456"}],"version-history":[{"count":18,"href":"https:\/\/dft.wiki\/index.php?rest_route=\/wp\/v2\/posts\/2456\/revisions"}],"predecessor-version":[{"id":5582,"href":"https:\/\/dft.wiki\/index.php?rest_route=\/wp\/v2\/posts\/2456\/revisions\/5582"}],"wp:attachment":[{"href":"https:\/\/dft.wiki\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dft.wiki\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dft.wiki\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}