Hello, and welcome to Component Monitoring.
In this lesson, we will cover the following topics: System Metrics,
Component Status, Log Events, JMX Monitoring,
Apache Zookeeper, Apache Cassandra, and OpenLDAP.
Apigee Edge is a combination of Java,
C++, and PHP applications.
As with any computer application,
basic system checks should be put in place to
ensure and monitor the health of the operating system,
application processes, the network and hardware.
Items frequently monitored include
CPU utilization (for example,
user, system, I/O wait,
and total CPU used by the system),
free and used memory,
disk space and IOPS usage,
load averages, and network statistics.
Thresholds for each of these will vary by component,
as well as by your operational procedures.
It is useful to understand the common patterns for each component.
For instance, Cassandra behaves like a regular application most of the time,
but during compaction, disk space usage will
greatly increase along with the amount of heap memory.
This kind of pattern for other applications might be seen as abnormal,
but for Cassandra, this is normal.
Ideal maximum values for thresholds will depend on allocated hardware,
network traffic, TPS, and other factors.
System metrics must be collected for every host and relevant process.
It's expected that your provisioned infrastructure will provide this capability.
You might find it a valuable exercise to view metrics under expected
and extreme load conditions
to understand the overall operational patterns,
which you can use as input when deciding your thresholds.
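As a rough illustration of comparing collected metrics against thresholds, here is a minimal Python sketch. The threshold values and the names `THRESHOLDS`, `collect_metrics`, and `breaches` are hypothetical and not part of Edge; in practice your monitoring infrastructure would normally collect these values for you.

```python
import os
import shutil

# Hypothetical thresholds; tune them per component and per host.
THRESHOLDS = {"load_per_cpu": 1.5, "disk_used_pct": 80.0}


def collect_metrics(path="/"):
    """Return a small snapshot of load average and disk usage for this host."""
    load1, _load5, _load15 = os.getloadavg()  # available on Unix-like systems
    usage = shutil.disk_usage(path)
    cpus = os.cpu_count() or 1
    return {
        "load_per_cpu": load1 / cpus,
        "disk_used_pct": 100.0 * usage.used / usage.total,
    }


def breaches(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of metrics that exceed their threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    )
```

Calling `breaches(collect_metrics())` returns an empty list when every value is within its limit. A component such as Cassandra would likely need a higher disk threshold to allow for compaction.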
Validating CPU and memory consumption,
as well as the state of the different processes at the operating system level,
gives good information about the health of the machine,
but it is not sufficient to understand the health of the platform.
Edge provides component specific APIs for monitoring.
We have seen component status in previous modules.
Edge components such as Edge management server,
Edge router, et cetera,
expose a management API on a specific port.
All components support a core set of calls, such as /v1/servers/self,
which returns statistics about the component,
/v1/servers/self/up, which returns true if the component is up, and /v1/servers/self/uuid.
For example, invoking /v1/servers/self/up on
a message processor forces the component to
execute the API and provide a suitable response.
This indication represents a functional ping on the component.
An API response is considered successful,
as long as the call results in an HTTP 200 response code.
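The functional ping described here can be sketched in Python as follows. This is a minimal sketch, not an official client: the function name `component_is_up` is hypothetical, and the host and management port you pass in are assumptions that depend on your installation and on the component you are checking.

```python
import urllib.error
import urllib.request


def component_is_up(host, port, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET /v1/servers/self/up answers with HTTP 200."""
    url = f"http://{host}:{port}/v1/servers/self/up"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

For example, `component_is_up("mp-host", 8082)` would check a message processor, assuming its management API listens on port 8082 in your environment. The `opener` parameter exists only so the function can be exercised without a live component.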
Component status usage was previously covered as part of the platform operation module.
A full set of calls for each component can be found at docs.apigee.com.
Log events.
All components log to disk.
Log files reside in /opt/apigee/var/log/<component name>.
When logging at the default logging level,
system logs should be quiet with only exceptions and errors logged.
As required, the system will report on health check activity.
For example, the router is designed to initiate message processor health checks
if a message processor is detected as down or slow.
These events can be observed in the router log file as mark down and mark up events.
If the message processor is marked as down or unreachable,
the router will take the message processor out of
rotation until the health check starts to pass again.
Using log events for monitoring is not recommended.
We encourage you to leverage more proactive approaches such as component status.
Apigee Edge allows you to collect component runtime metrics via JMX.
Edge exposes an MBean called platform.
On this JMX interface,
metrics related to inbound and outbound traffic,
thread pools, memory, and others are exposed.
Out of the box, authentication is not enabled on the Edge JMX MBean,
but it can be enabled if required.
Many open source components such as Cassandra,
Qpid, and Zookeeper also provide JMX interfaces.
If required, you can leverage those as well.
Detailed documentation regarding Edge JMX interfaces is found on docs.apigee.com.
Similar to other components,
there are multiple ways to check Zookeeper component health.
The Zookeeper process state can be validated by looking at
the Linux process state by executing apigee-zookeeper status from the command line.
Zookeeper uses two ports to handle application calls and data replication.
As part of the health check validation testing,
connectivity to ports 2181 and 3888 may be relevant.
Zookeeper offers a series of four letter commands.
You can use commands such as ruok,
to execute a functional ping on a specific Zookeeper node.
As previously discussed, Zookeeper uses
a leader election mechanism to handle data consistency on the cluster.
You can use the four letter command stat to display node statistics,
which include the operating mode for the node.
For the purpose of Zookeeper monitoring,
consider using ruok to validate Zookeeper node status.
Other methods described here could be leveraged for troubleshooting,
or to gather additional information when required.
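The four letter commands are plain text sent over a TCP connection to the client port, so the ruok check can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, port 2181 is taken from the discussion above, and some newer Zookeeper versions only answer four letter commands that have been explicitly enabled in the server configuration.

```python
import socket


def four_letter(host, command, port=2181, timeout=5.0):
    """Send a Zookeeper four letter command and return the decoded reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_ok(host, port=2181, timeout=5.0):
    """Functional ping: a healthy node answers ruok with imok."""
    try:
        return four_letter(host, "ruok", port, timeout).strip() == "imok"
    except OSError:
        return False


def parse_mode(stat_output):
    """Extract the operating mode (for example leader or follower) from stat output."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

A monitor would typically alert on `is_ok(host)` returning False, and use `parse_mode(four_letter(host, "stat"))` only when troubleshooting, for instance to confirm that the cluster has exactly one leader.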
Cassandra provides a command line utility called nodetool.
Nodetool allows you to interrogate the cluster or individual components.
Nodetool ring allows you to describe the Cassandra ring.
This command displays the status of each node, up or down.
Applications communicate with Cassandra using the Thrift protocol.
Nodetool allows you to check the status of the Thrift protocol.
The combination of nodetool ring and nodetool
statusthrift can be leveraged as part of monitoring
to check Cassandra node availability.
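One way to combine the two nodetool checks is to run them from a script and parse the output. The Python sketch below assumes that nodetool is on the PATH, that nodetool ring prints one row per token with the address in the first column and the status in the third, and that nodetool statusthrift prints either running or not running. The function names are hypothetical, and the output layout can differ between Cassandra versions.

```python
import subprocess


def parse_ring(ring_output):
    """Map each node address to its status (Up or Down) from nodetool ring output."""
    statuses = {}
    for line in ring_output.splitlines():
        fields = line.split()
        # Node rows start with an address and carry the status in the third column.
        if len(fields) >= 4 and fields[2] in ("Up", "Down"):
            statuses[fields[0]] = fields[2]
    return statuses


def thrift_running(status_output):
    """Return True when nodetool statusthrift reports that Thrift is running."""
    return status_output.strip() == "running"


def run_nodetool(*args, timeout=30):
    """Run nodetool (assumed to be on PATH) and return its standard output."""
    result = subprocess.run(
        ["nodetool", *args],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

A check could then call `parse_ring(run_nodetool("ring"))` and `thrift_running(run_nodetool("statusthrift"))`, and alert if any node is reported as Down or Thrift is not running.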
OpenLDAP provides a command line utility called ldapsearch.
As we described for other components,
ldapsearch can be leveraged to perform functional pings on LDAP.
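A functional ping with ldapsearch can be as simple as reading a single entry and checking the exit code. The Python sketch below builds such a command. The function names are hypothetical, and the host, port, bind DN, password, and base DN are all values you would supply from your own installation.

```python
import subprocess


def build_ldapsearch(host, port, bind_dn, password, base_dn):
    """Build an ldapsearch command that reads only the base entry."""
    return [
        "ldapsearch",
        "-x",                           # simple authentication
        "-H", f"ldap://{host}:{port}",  # server URI
        "-D", bind_dn,                  # bind DN
        "-w", password,                 # bind password
        "-b", base_dn,                  # search base
        "-s", "base",                   # scope: the base entry only
        "-LLL",                         # plain LDIF output without comments
    ]


def ldap_is_up(command, timeout=10):
    """Functional ping: ldapsearch exits with 0 when the search succeeds."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Note that a password passed with -w is visible in the process list. Reading it from a protected file with the -y option may be preferable for a scheduled check.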
This concludes Component Monitoring.
For more information, you can visit docs.apigee.com.
To get involved with the community,
please go to community.apigee.com. Thank you.