Nagios Plugin

Nick Lamb -

The Clustrix Nagios Plugin is used to monitor a Clustrix Cluster with Nagios.

Usage: check_clustrix.py [options]
Options:
 -h, --help            show this help message and exit
 -H HOSTNAME, --hostname=HOSTNAME
                      any hostname for a clustrix system
 -P PORT, --port=PORT  port to connect to
 -u USER, --user=USER  database user to connect as
 -p PASSWORD, --password=PASSWORD
                      account password
 -L, --list-checks     list available checks
 -w WARN, --warn=WARN  warning threshold
 -c CRIT, --crit=CRIT  critical threshold

Available checks:
check_cpu
  Check that the cluster cpu utilization is within bounds.
check_executing_sessions
  Check that the number of executing statements which have been
  running for more than 5 seconds is within a reasonable bound.
check_long_running_queries
  Check for queries which have been executing over 4 hours.
check_offline_slices
  Check for number of offline slices. Any value over 0 indicates that
  some data is unavailable in the current membership. User queries will
  get errors.
check_underprotected_slices
  Check for any slices which have less than 2 replicas.
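
The -w and -c options follow the usual Nagios convention, in which a plugin reports its result through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The sketch below illustrates that convention only; it is not taken from check_clustrix.py. The function name nagios_status is hypothetical, and the sketch assumes that larger values are worse and that a threshold is breached when the value reaches it.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_status(value, warn, crit):
    """Map a measured value to a Nagios exit code.

    Assumption (not confirmed by the plugin source): larger values are
    worse, and a threshold is breached when value >= threshold.
    """
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK
```

For example, with warn=80 and crit=90, a CPU utilization of 85 would map to WARNING under these assumptions. A wrapper script would typically pass the result to sys.exit().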

Additional statistics are managed by our new statistics aggregation and tracking mechanism, statd.py. This script runs on each node (but only one instance is active at a time), and collects statistics into tables in the clustrix_statd database (clustrix_statd.statd_metadata, clustrix_statd.statd_current, clustrix_statd.statd_history).

The Nagios plugin provides a method, fetch_stats, which retrieves the specified stats (as many as desired per invocation) from these statd tables, e.g.:

cmd$ nagios/check_clustrix.py -H alpha013 -u root fetch_stats clustrix.stats.bm_miss_rate clustrix.tps clustrix.qps
OK - clustrix.stats.bm_miss_rate=0  OK - clustrix.qps=22  OK - clustrix.tps=21  | clustrix.stats.bm_miss_rate=0  clustrix.qps=22 clustrix.tps=21

This allows us to expand the number of stats available over time, without having to revisit the nagios plugin.
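
If you want to consume these values outside Nagios, the text after the "|" in the output is the performance data and can be parsed directly. The following is a minimal sketch based only on the sample output shown above; parse_perfdata is a hypothetical helper, and it assumes each value is a bare number without units or threshold fields.

```python
def parse_perfdata(line):
    """Return the performance data after '|' as a dict of name -> float.

    Assumption: each token looks like name=value, with a plain numeric
    value, as in the sample fetch_stats output.
    """
    _, _, perfdata = line.partition("|")
    stats = {}
    for token in perfdata.split():
        name, sep, value = token.partition("=")
        if not sep:
            continue  # skip anything that is not a name=value pair
        stats[name] = float(value)
    return stats


sample = (
    "OK - clustrix.stats.bm_miss_rate=0  OK - clustrix.qps=22  "
    "OK - clustrix.tps=21  | clustrix.stats.bm_miss_rate=0  "
    "clustrix.qps=22 clustrix.tps=21"
)
stats = parse_perfdata(sample)
print(stats["clustrix.qps"])
```

The same helper should work for the fetch_session_stats output, since it uses the same "status | name=value" layout.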

The list below shows where to find, in the new framework, some of the stats that were included in older versions of the plugin:

  • bm_miss_rate - Rate of Buffer Misses vs. Cache Hits
    • fetch_stats clustrix.stats.bm_miss_rate
  • disk_read - Percentage of Maximum Disk Read Capability
  • disk_write - Percentage of Maximum Disk Write Capability
    • You can calculate rate/delta from:
      • fetch_stats clustrix.io.disks.bytes_read
      • fetch_stats clustrix.io.disks.bytes_written
  • qps - Queries per Second
    • fetch_stats clustrix.qps
  • tps - Transactions per Second
    • fetch_stats clustrix.tps
  • conns -  Current Connections / Max Connections 
  • queries - Number of Currently Running Queries 
  • avg_q_age - Average Age of Running Queries 
  • max_q_age - Age of Longest Running Query

The four stats above (conns, queries, avg_q_age, max_q_age) are reported by fetch_session_stats, which does not use statd.

Example:

cmd$ nagios/check_clustrix.py -H alpha013 -u root fetch_session_stats
OK - CONNECTIONS=5 OK - EXECUTING_SESSIONS=2 OK - AVG_QUERY_AGE=2052.000000 OK - MAX_QUERY_AGE=4104 | CONNECTIONS=5 EXECUTING_SESSIONS=2 AVG_QUERY_AGE=2052.000000 MAX_QUERY_AGE=4104

  • quorum - Number of Nodes in Quorum / Number of Nodes in Cluster
    • fetch_stats clustrix.cluster.nodes_in_quorum
    • fetch_stats clustrix.cluster.total_nodes
  • rebalancer_jobs_queued - Number of Rebalancer Jobs Currently Queued
    • fetch_stats clustrix.rebalancer.jobs_queued
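
Some of the older values, such as the disk rates and the quorum ratio, now have to be derived from the raw stats. The sketch below shows one way to do that, assuming that clustrix.io.disks.bytes_read and clustrix.io.disks.bytes_written are cumulative counters sampled at two points in time. The function names are hypothetical and not part of the plugin. Note that this yields bytes per second; converting to a percentage of maximum disk capability would need a maximum throughput figure that this article does not give.

```python
def counter_rate(prev_value, prev_time, cur_value, cur_time):
    """Per-second rate of change between two samples of a counter.

    Assumption: the stat is a cumulative counter. Times are in seconds.
    Returns None if the counter went backwards (for example after a reset).
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = cur_value - prev_value
    if delta < 0:
        return None
    return float(delta) / elapsed


def quorum_fraction(nodes_in_quorum, total_nodes):
    """Fraction of cluster nodes currently in quorum (1.0 means all)."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return float(nodes_in_quorum) / total_nodes
```

For instance, two bytes_read samples of 1000 and 4000 taken 60 seconds apart give a read rate of 50 bytes per second.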

http://files.clustrix.com/support/files/clustrix_nagios.tar.gz
