Monitoring and Alerting

Monitoring and alerting can be one of the most burdensome parts of a DBA’s job. Many metrics are candidates for monitoring, and choosing the right alerting threshold for each one is a vexing problem. Set thresholds too loosely and the application will experience avoidable service interruptions; set them too tightly and DBAs will be plagued by excessive, often unactionable notifications. Alert thresholds are rarely stable for long: they typically require adjustment whenever the application scales and as the workload evolves.
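The threshold trade-off can be made concrete with a minimal Python sketch. The sample values, the threshold values, and the `count_alerts` helper are all hypothetical and chosen only for illustration; they do not come from CrystalDB.

```python
from typing import Sequence


def count_alerts(samples: Sequence[float], threshold: float) -> int:
    """Count how many samples would trigger an alert at the given threshold."""
    return sum(1 for value in samples if value >= threshold)


# Hypothetical CPU utilization samples (percent), invented for illustration.
cpu_percent = [42.0, 55.0, 61.0, 78.0, 83.0, 91.0, 97.0, 64.0]

# A tight threshold fires on most samples; a loose one fires only when the
# system is already close to saturation.
for threshold in (60.0, 80.0, 95.0):
    alerts = count_alerts(cpu_percent, threshold)
    print(f"threshold={threshold:.0f}%: {alerts} alert(s)")
```

With this data, the tightest threshold would notify on six of eight samples, while the loosest would notify only once, when utilization is already at 97%. Neither extreme is obviously right, which is why thresholds tend to need repeated tuning.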

Candidate metrics are numerous. Resource utilization metrics include processor usage, memory usage, disk space, disk I/O operations, and disk latency. Database activity metrics include the number of connections, transaction rates, slow queries, rollbacks, index utilization, table bloat, and vacuum activity. When high availability is a requirement, the list also includes replica lag, replica availability, and replica resource utilization. A deviation in any of these metrics could signal a problem, so DBAs have traditionally set up monitoring and alerting on as many of them as possible.

AutoDBA is responsible for keeping metrics within healthy ranges, eliminating many monitoring and alerting concerns. That said, CrystalDB has integrated monitoring, and we provide detailed metrics in the Control Center, ensuring that you have the information you need to understand how your applications interact with the database.

In certain situations, AutoDBA will escalate actionable alert notifications to your operators. One such situation is when the system is running close to its configured limits. If the application is scaling as expected, you will want to increase your limits; otherwise, you will want to investigate. You will also receive alerts when AutoDBA identifies a contention bottleneck or another threat to application performance.
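To illustrate what "running close to its configured limits" can mean in practice, here is a minimal Python sketch of a limit-proximity check. It is not AutoDBA's actual logic: the `LimitCheck` class, the `near_limit` function, the metric names, and the 90% warning ratio are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LimitCheck:
    """A current value paired with its configured limit (hypothetical model)."""
    name: str
    current: float
    limit: float

    def utilization(self) -> float:
        """Fraction of the configured limit currently in use."""
        return self.current / self.limit


def near_limit(checks: Iterable[LimitCheck], warn_ratio: float = 0.9) -> List[str]:
    """Return the names of checks at or above warn_ratio of their limit.

    Checks with a non-positive limit are skipped to avoid division by zero.
    """
    return [
        check.name
        for check in checks
        if check.limit > 0 and check.utilization() >= warn_ratio
    ]


# Hypothetical values, invented for illustration.
checks = [
    LimitCheck("connections", current=460, limit=500),   # 92% of limit
    LimitCheck("storage_gb", current=300, limit=1000),   # 30% of limit
]
flagged = near_limit(checks)
print(flagged)
```

In this example only `connections` is flagged. Whether the right response is to raise the limit or to investigate depends on whether the growth matches how you expect the application to scale.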