Whenever an unexpected
event occurs anywhere on our network, our network
staff are notified by call or pager alert.
We have built custom
health-checking software that monitors
device availability and directs network traffic to
devices that are online. It also monitors the
response time of each device, so that we can
automatically reduce traffic to devices that are
responding slowly and direct more traffic to faster
devices, all in real time.
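
As an illustration only, the traffic-weighting idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual health-checking software: the names `Probe` and `traffic_weights` are hypothetical, and weighting each online device by the inverse of its response time is just one simple policy that sends less traffic to slow devices and none to offline ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Probe:
    """Result of one health check (hypothetical structure for illustration)."""
    device: str
    online: bool
    response_ms: Optional[float] = None  # None when the device did not answer


def traffic_weights(probes: List[Probe]) -> Dict[str, float]:
    """Return each device's share of traffic.

    Assumed policy: offline devices get 0, online devices get a share
    proportional to 1 / response time, so faster devices receive more.
    Shares sum to 1.0 unless no device is usable.
    """
    scores: Dict[str, float] = {}
    for p in probes:
        usable = p.online and p.response_ms is not None and p.response_ms > 0
        scores[p.device] = 1.0 / p.response_ms if usable else 0.0

    total = sum(scores.values())
    if total == 0:
        return scores  # nothing is online; every share stays at 0
    return {device: score / total for device, score in scores.items()}


if __name__ == "__main__":
    latest = [
        Probe("web-1", online=True, response_ms=10.0),
        Probe("web-2", online=True, response_ms=40.0),
        Probe("web-3", online=False),
    ]
    for device, share in traffic_weights(latest).items():
        print(f"{device}: {share:.0%} of traffic")
```

Recomputing the shares after every round of probes is what would make such a scheme react in real time: a device that slows down loses share on the next round, and one that goes offline drops to zero.
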
We also maintain
detailed internal graphs of all
internal systems, covering disk space, CPU and memory
utilization, and disk I/O responsiveness.
We understand that a
failure doesn't necessarily mean a complete device
outage, so we monitor the disk
read/write operations of all data servers on our
network in real time. If a read/write operation takes longer than
normal, our network staff are notified and our
system automatically suspends non-critical
services and operations on that disk, ensuring minimal
disruption to other services.
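
The disk check described above can be sketched in the same spirit. This is a hedged illustration, not the real monitoring system: the class name `DiskLatencyMonitor`, the rolling window, and the rule that "longer than normal" means more than three times the recent average latency are all assumptions made for the example.

```python
from collections import deque
from statistics import mean


class DiskLatencyMonitor:
    """Tracks read/write latency for one disk (hypothetical sketch)."""

    def __init__(self, window: int = 100, factor: float = 3.0, min_samples: int = 10):
        self.samples = deque(maxlen=window)  # recent "normal" latencies in ms
        self.factor = factor                 # assumed multiple of the average that counts as slow
        self.min_samples = min_samples       # do not judge until a baseline exists
        self.suspended = False               # True once non-critical work is paused
        self.alerts = []                     # stand-in for notifying network staff

    def record(self, latency_ms: float) -> bool:
        """Record one operation; return True if it was slower than normal."""
        slow = (
            len(self.samples) >= self.min_samples
            and latency_ms > self.factor * mean(self.samples)
        )
        if slow:
            self.alerts.append(f"slow disk operation: {latency_ms:.1f} ms")
            self.suspended = True  # a real system would pause non-critical jobs here
        else:
            # Only normal samples feed the baseline, so a degrading disk
            # cannot gradually raise its own threshold.
            self.samples.append(latency_ms)
        return slow


if __name__ == "__main__":
    monitor = DiskLatencyMonitor(window=20, factor=3.0, min_samples=5)
    for latency in [5.0] * 10 + [50.0]:
        monitor.record(latency)
    print(monitor.suspended, monitor.alerts)
```

The sketch leaves out how suspended services are resumed; one option would be to clear the flag after a run of normal readings.
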