Observability on Event Hubs

Nov 6, 2020

Overview

Observability is an important part of any system, and when it comes to observability in Event Hubs, we have to monitor several kinds of logs to understand what is going on. In short, we used Azure Monitor for basic infrastructure-level logs, Time Series Insights for validating the contents of Event Hubs, and Application Map for latency.

Architecture

In this architecture, Azure Functions sit between Event Hubs: each function filters the incoming data and pushes the result to the next event hub.
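To make the shape of one stage concrete, here is a minimal sketch written as a plain Python function rather than a real Azure Functions binding. The JSON payload format and the filter rule (keep events whose `value` field exceeds a threshold) are hypothetical; the post does not describe the actual filtering logic.

```python
import json
from typing import Iterable, List

# Hypothetical filter rule for illustration: forward events whose "value" exceeds this.
DEFAULT_THRESHOLD = 10.0


def filter_stage(bodies: Iterable[str], threshold: float = DEFAULT_THRESHOLD) -> List[str]:
    """Filter a batch of JSON event bodies read from the upstream hub.

    Returns the bodies to forward to the downstream hub. Malformed payloads
    and events without a numeric "value" field are dropped.
    """
    forwarded: List[str] = []
    for body in bodies:
        try:
            event = json.loads(body)
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON
        value = event.get("value") if isinstance(event, dict) else None
        if isinstance(value, (int, float)) and value > threshold:
            forwarded.append(body)
    return forwarded
```

For example, `filter_stage(['{"value": 12}', '{"value": 3}', 'not json'])` returns `['{"value": 12}']`. In a deployed function, the input batch would come from an Event Hubs trigger and the returned bodies would be written to the next hub through an output binding.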

As data flows from the first hub to the next, its content changes, but there is no way to validate those changes just by looking at Event Hubs.

We want to validate the contents of data during development.

We used Time Series Insights (TSI) to confirm the data points at each Event Hubs stage. Looking at the graph, we could spot a spike. A similar validation is possible by checking each log with a standard debugger, but as the data volume grows this becomes increasingly complicated and time-consuming. In our scenario, TSI served as a debugging tool on top of Event Hubs that gave us quick feedback during development.
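TSI does this visually. As a rough offline stand-in, assuming you have exported the enqueued timestamps (in epoch seconds) of the events at one stage, the sketch below counts events per one-minute bucket and flags buckets well above the median. The bucket size and the factor of three are arbitrary choices for illustration, not TSI settings.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, List


def bucket_counts(timestamps: Iterable[float], bucket_seconds: int = 60) -> Dict[int, int]:
    """Count events per time bucket; keys are bucket start times in epoch seconds."""
    return dict(Counter(int(ts // bucket_seconds) * bucket_seconds for ts in timestamps))


def find_spikes(counts: Dict[int, int], factor: float = 3.0) -> List[int]:
    """Return the start of every bucket whose count exceeds `factor` times the median count.

    The default factor of 3.0 is an arbitrary illustrative choice.
    """
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(start for start, count in counts.items() if count > factor * typical)
```

Running the same check on the exported timestamps of each stage gives a quick, scriptable way to see at which hub a spike first appears.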

We want to observe the throughput of each component.

We used Application Map from Application Insights to see the latency and call volume of each component. Beyond latency and volume, we can drill into each component's latency details and the associated error logs. We used the latency data to plan improvements. (The screenshot is taken from the official documentation, but the idea is the same.)

These are the steps we took to improve our system's latency:

Step 1: Spot where processing takes longer.

Step 2: Click on the components with longer latency and check their logs.

Step 3: Fix the issues found and check latency again.
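Step 1 is done by eye in Application Map. If you export request durations from Application Insights, the same ranking can be scripted. The sketch below assumes a list of (component, duration in ms) pairs with made-up component names, and ranks components by their nearest-rank 95th-percentile latency, which reflects slow tail requests better than an average would.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list (an illustrative choice of statistic)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def rank_components(samples: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Group (component, duration_ms) samples and sort components by p95 latency, slowest first."""
    by_component: Dict[str, List[float]] = defaultdict(list)
    for component, duration_ms in samples:
        by_component[component].append(duration_ms)
    ranked = [(name, p95(durations)) for name, durations in by_component.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

For example, `rank_components([("filter-fn", 20), ("filter-fn", 30), ("enrich-fn", 250)])` returns `[("enrich-fn", 250), ("filter-fn", 30)]`, so `enrich-fn` would be the first component to inspect in Step 2.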

We want to observe whether the system can take the load.

How much load Event Hubs can take is documented at https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-scalability#throughput-units. For one throughput unit, the limits are "Ingress: Up to 1 MB per second or 1000 events per second (whichever comes first)." and "Egress: Up to 2 MB per second or 4096 events per second."
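These limits turn into a simple capacity estimate: every limit has to be satisfied, so the number of throughput units needed is the largest of the four load-to-limit ratios, rounded up. The sketch below hard-codes the per-unit figures quoted above (they may have changed since 2020) and, as an assumption, treats the egress limits the same way as the ingress ones.

```python
import math

# Per-throughput-unit limits as quoted in this post from the Event Hubs docs (2020).
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000
EGRESS_MB_PER_TU = 2.0
EGRESS_EVENTS_PER_TU = 4096


def required_throughput_units(ingress_mb_s: float, ingress_events_s: float,
                              egress_mb_s: float = 0.0, egress_events_s: float = 0.0) -> int:
    """Smallest number of throughput units (at least 1) that covers every limit."""
    needed = max(
        ingress_mb_s / INGRESS_MB_PER_TU,
        ingress_events_s / INGRESS_EVENTS_PER_TU,
        egress_mb_s / EGRESS_MB_PER_TU,
        egress_events_s / EGRESS_EVENTS_PER_TU,
    )
    return max(1, math.ceil(needed))
```

For example, 0.5 MB/s spread over 2500 small events per second gives `required_throughput_units(0.5, 2500) == 3`: the event count, not the byte volume, is the binding limit.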

Although those numbers are published, we wanted to confirm whether our expected load could actually be handled. We identified two scenarios in which the load cannot be handled, and both can be monitored with Azure Monitor:

1. Azure Functions cannot handle the load

This can be observed by plotting ingress and egress on the same graph. If egress is lower than ingress at a given point, we can narrow the issue down to the output side, where the functions consuming the hub are not keeping up. The first graph illustrates a case where egress is substantially lower than ingress. The second graph illustrates egress and ingress being equal, which indicates that Azure Functions can handle the current load.
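The ingress/egress comparison can also be checked programmatically once the two series are exported from Azure Monitor. This sketch assumes per-interval totals for the same hub, aligned by index, and a single consumer group reading the hub (with several consumer groups, egress would normally be a multiple of ingress). The 5% tolerance is an arbitrary choice.

```python
from typing import List, Sequence


def lagging_intervals(ingress: Sequence[float], egress: Sequence[float],
                      tolerance: float = 0.05) -> List[int]:
    """Return indices of intervals where egress falls more than `tolerance` below ingress.

    Assumes per-interval totals for the same hub, aligned by index, and a single
    consumer group. The 5% default tolerance is an arbitrary illustrative choice.
    """
    if len(ingress) != len(egress):
        raise ValueError("ingress and egress series must have the same length")
    return [i for i, (incoming, outgoing) in enumerate(zip(ingress, egress))
            if outgoing < incoming * (1 - tolerance)]
```

For example, `lagging_intervals([100, 100, 100], [100, 97, 60])` returns `[2]`: only the last interval falls more than 5% short.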

2. Event Hubs cannot handle the load

This can be observed by plotting the throttled request count. If this number is consistently greater than zero, the Event Hubs layer may not be able to handle the load.
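"Consistently greater than zero" can be made concrete by requiring a run of consecutive non-zero samples. This is a minimal sketch over an exported series of throttled request counts; the run length of five intervals is an arbitrary, hypothetical threshold, not an Azure default.

```python
from typing import Sequence


def consistently_throttled(throttled: Sequence[int], min_run: int = 5) -> bool:
    """True if the throttled request count stays above zero for `min_run` consecutive intervals.

    min_run=5 (e.g. five one-minute samples) is a hypothetical threshold, not an Azure default.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    run = 0
    for count in throttled:
        run = run + 1 if count > 0 else 0  # reset the streak on any zero sample
        if run >= min_run:
            return True
    return False
```

`consistently_throttled([0, 3, 0, 2, 0])` returns `False` because the throttling is sporadic, while `consistently_throttled([1, 2, 3, 4, 5])` returns `True`.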

Reference

Application Map in Azure Application Insights - Azure Monitor (docs.microsoft.com)

Overview: What is Azure Time Series Insights Gen2? - Azure Time Series Insights Gen2 (docs.microsoft.com)

Consideration: Scaling Out Azure Functions With Event Hubs Effectively (medium.com)

Azure Monitor overview - Azure Monitor (docs.microsoft.com)

Scalability - Azure Event Hubs (docs.microsoft.com)

Written by Akira Kakkar