Introduction:
Azure Event Hubs is a fully managed big data streaming service that can receive and process millions of events per second with low latency.
It can be easily integrated with data and analytics services inside and outside Azure.
Key Service Characteristics:
- Scalable up to terabytes of data and millions of events per second
- Reliable, with zero data loss, because it is designed to be resilient to failures.
- Supports multiple protocols and SDKs.
What is Azure Event Hub used for?
Azure Event Hubs is used in business orchestrations for event storage and handling and for getting timely insights into an application, and it integrates easily with data and analytics services to create big data pipelines.
Why use Azure Event Hubs?
Azure Event Hubs lets you build a data pipeline: it can ingest data from many parallel sources and connect it to different infrastructures and services.
Scenarios for Azure Event Hubs:
- Application Logging.
- Archiving data.
- User/Device telemetry processing.
- Live dashboarding.
- Internet of Things (IoT).
Event Hubs event-processing architecture:
Azure Event Hubs Key Concepts:
Event Hubs Namespace:
An event hub represents a unique stream of data, whereas an Event Hubs namespace is a collection of event hubs. The namespace provides a dedicated scoping container with shared properties such as throughput and cost, and it is accessible via an FQDN.
Throughput Units:
A throughput unit is simply a performance unit of an event hub. It defines the amount of event data that can ingress to and egress from Event Hubs.
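To make this concrete, here is a back-of-the-envelope sizing sketch. It assumes the Standard-tier quota of roughly 1 MB/s (or 1,000 events/s) ingress and 2 MB/s egress per throughput unit; the workload numbers are purely illustrative.

```python
import math

# Rough Standard-tier capacity of one throughput unit (TU):
#   ingress: up to 1 MB/s or 1,000 events/s (whichever limit is hit first)
#   egress:  up to 2 MB/s
def required_tus(ingress_mb_s: float, events_per_s: float, egress_mb_s: float) -> int:
    """Back-of-the-envelope estimate of TUs needed for a workload."""
    by_ingress = math.ceil(ingress_mb_s / 1.0)
    by_events = math.ceil(events_per_s / 1000.0)
    by_egress = math.ceil(egress_mb_s / 2.0)
    return max(by_ingress, by_events, by_egress, 1)

# A hypothetical workload: 2.5 MB/s in, 1,800 events/s, 5 MB/s out.
print(required_tus(ingress_mb_s=2.5, events_per_s=1800, egress_mb_s=5.0))  # -> 3
```

Note that the maximum of the three constraints drives the estimate: here egress (5 MB/s ÷ 2 MB/s per TU) and ingress volume both demand 3 TUs.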
Event Producers/Publishers:
An entity that sends data to an event hub is an event producer. Event producers send events via AMQP, HTTPS, or the Apache Kafka protocol. Events can be published individually or in batches.
Partitions:
Azure Event Hubs splits the streaming data into partitions. A partition is an ordered sequence of events; as newer events arrive, they are added to the end of the sequence. Below are a few important characteristics of partitions:
- Load-balanced distribution: Events are distributed across the partitions, so there is no guarantee that the partitions will be utilized equally.
- Each partition is ordered: All events in a partition are ordered from oldest to newest, just like a queue, but order is not maintained across partitions.
- Partition key: A partition key is a sender-supplied value passed on to an event hub; it maps the incoming event to a specific partition so that related events stay together.
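The partition-key guarantee can be sketched with a small simulation. The service's actual hash function is internal; `hashlib.sha256` below is only an illustrative stand-in, and the device names are hypothetical, but the property shown is the one Event Hubs provides: equal keys always map to the same partition.

```python
import hashlib

PARTITION_COUNT = 4  # fixed when the event hub is created

def partition_for(key: str, partition_count: int = PARTITION_COUNT) -> int:
    # Stand-in for the service's internal hash: deterministic, so the
    # same key always lands on the same partition.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count

# Every event keyed "device-42" lands on one partition, in send order,
# even though other keys may go elsewhere.
events = [("device-42", "temp=21.0"), ("device-7", "temp=19.5"), ("device-42", "temp=21.3")]
placements = [partition_for(key) for key, _ in events]
assert placements[0] == placements[2]  # same key -> same partition
```

This is why per-device (or per-entity) ordering survives even though there is no ordering across partitions.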
Message Retention:
Message retention specifies how long the Event Hubs service keeps published events available for processing. Changing the retention period applies to all messages, including messages that are already in the event hub.
SAS tokens:
A SAS token provides delegated access to Event Hubs resources based on authorization rules. It can be configured at either the namespace or the event hub level. The rights granted by the authorization rules can be a combination of:
- Send - Gives the right to send messages to the entity.
- Listen - Gives the right to listen to or receive messages from the entity.
- Manage - Gives the right to manage the topology of the namespace, including creation and deletion of entities. It includes both the Send and Listen rights.
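A SAS token itself is just a signed string: an HMAC-SHA256 signature over the URL-encoded resource URI and an expiry timestamp, using one of the keys behind an authorization rule. The sketch below follows that documented format; the namespace, hub, and key are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse

def generate_sas_token(resource_uri: str, key_name: str, key: str, ttl_seconds: int = 3600) -> str:
    """Build an Event Hubs SAS token: HMAC-SHA256 over '<encoded-uri>\\n<expiry>'."""
    expiry = int(time.time()) + ttl_seconds
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    signature = base64.b64encode(
        hmac.new(key.encode("utf-8"), to_sign, hashlib.sha256).digest()
    ).decode("utf-8")
    return (
        f"SharedAccessSignature sr={encoded_uri}"
        f"&sig={urllib.parse.quote_plus(signature)}"
        f"&se={expiry}&skn={key_name}"
    )

# Hypothetical namespace, hub, and key, purely for illustration.
token = generate_sas_token(
    "https://mynamespace.servicebus.windows.net/myhub",
    key_name="RootManageSharedAccessKey",
    key="not-a-real-key",
)
print(token)
```

The `skn` field names the authorization rule, so the rights the token carries are whatever that rule grants (Send, Listen, or Manage).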
Consumer Groups:
A consumer group is a unique view of the event hub data. The data in an event hub can be accessed only via consumer groups; partitions cannot be accessed directly. A default consumer group (named $Default) is created when the event hub is created.
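The key point is that each consumer group keeps its own read position over the same underlying data. A minimal in-memory sketch (the group name "analytics" and the event bodies are illustrative):

```python
# Hypothetical events stored in a single partition.
partition = ["e0", "e1", "e2", "e3", "e4"]

# Each consumer group tracks its own position over the same data.
positions = {"$Default": 0, "analytics": 0}

def read(group: str, count: int) -> list:
    """Return the next `count` events for this group and advance its position."""
    start = positions[group]
    batch = partition[start:start + count]
    positions[group] = start + len(batch)
    return batch

dashboard = read("$Default", 3)  # -> ["e0", "e1", "e2"]
archive = read("analytics", 1)   # -> ["e0"]; unaffected by $Default's progress
assert positions["$Default"] == 3 and positions["analytics"] == 1
```

This is why, for example, a live dashboard and an archiving job can each process the full stream at their own pace without interfering with one another.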
Event Consumers:
An entity that reads data from an event hub is an event consumer. Event Hubs consumers connect through AMQP channels, which makes data availability easier for clients; this is significant for scalability and avoids unnecessary load on the application. Having as many consumers as partitions allows exceptionally good scaling, though it is still possible to have more consumers than partitions.
Offset:
An offset is the position of an event within a partition. Each event's offset is unique within its partition, which lets a consumer track where it currently is in the data.
Checkpointing:
Checkpointing is the process of saving the offset on the client side. This mechanism improves reliability and scalability: a partition can hold millions of events, and if a consumer dies, a replacement does not need to read all of those events again.
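The crash-and-resume behavior can be sketched in a few lines. The dictionary checkpoint store, the consumer group name, and the forced failure are all illustrative; in production the checkpoint would live in durable storage such as a blob container.

```python
# Hypothetical events in one partition, each carrying its offset.
partition = [{"offset": i, "body": f"event-{i}"} for i in range(1000)]
checkpoint_store = {}  # in production: durable storage, e.g. Azure Blob Storage

def process(consumer_group, partition_id, events, store, fail_after=None):
    """Process events from the last checkpoint onward; checkpoint each one."""
    start = store.get((consumer_group, partition_id), -1) + 1
    handled = 0
    for event in events[start:]:
        # ... handle the event here ...
        store[(consumer_group, partition_id)] = event["offset"]  # save checkpoint
        handled += 1
        if fail_after is not None and handled == fail_after:
            raise RuntimeError("consumer died")
    return handled

# First consumer handles 600 events, then crashes.
try:
    process("$Default", "0", partition, checkpoint_store, fail_after=600)
except RuntimeError:
    pass

# The replacement consumer resumes at offset 600 instead of offset 0.
remaining = process("$Default", "0", partition, checkpoint_store)
print(remaining)  # -> 400
```

Real consumers typically checkpoint every N events or every few seconds rather than per event, trading a little reprocessing after a crash for far fewer writes to the checkpoint store.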
Azure Event Hubs Additional Features:
Event Capture:
Event Capture enables you to automatically capture Event Hubs streaming data and store it in an Azure Blob Storage account or Azure Data Lake Storage account, for cases where you cannot process your events within the retention period or want long-term retention of your events. Event Hubs Capture reduces the complexity of loading the data and lets you focus on data processing. You can find this feature under the ‘Features’ category of the ‘Event Hubs Instance’ page in the Azure portal.
Auto-Inflate:
Auto-Inflate enables you to scale up your throughput units automatically to meet your usage needs. You can find this feature under the ‘Settings’ category of the ‘Event Hubs Namespace’ page in the Azure portal.
Geo-Disaster recovery:
Geo-Disaster recovery allows you to fail over from your primary namespace to a secondary namespace when a datacenter or region experiences downtime. It helps you build highly redundant and highly available applications. You can find this feature under the ‘Settings’ category of the ‘Event Hubs Namespace’ page in the Azure portal.