Prevent Azure VM Outages with Load Balancer Health Probes

Published: 16 February 2026 - 6 min. read


You deployed three VMs behind a single public IP and called it “high availability.” Then one VM’s disk filled up, your app started throwing 502 errors, and you spent the next four hours manually removing the unhealthy instance from DNS while your boss watched the uptime dashboard like it owed him money. Sound familiar?

Azure Load Balancer exists so you never have that night again. It sits in front of your virtual machines, distributes incoming traffic across healthy backend instances, and automatically stops sending requests to anything that fails a health check. It operates at Layer 4 of the OSI model (the transport layer), which means it routes based on TCP and UDP headers rather than inspecting application payloads. Fast, low-latency, and completely unaware of what your app is actually doing—exactly the way a network load balancer should work.

In this walkthrough, you’ll create a public Standard Load Balancer using the Azure CLI, configure a backend pool with multiple VMs, set up health probes, and create load balancing rules. By the end, you’ll have traffic distribution that actually works without you babysitting DNS records at midnight.

Prerequisites

Before you start, make sure you have the following:

  • An active Azure subscription. If you don’t have one, create a free account.

  • Azure CLI version 2.49.0 or later installed on your local machine.

  • Permissions to create resources in your subscription (Contributor role or equivalent).

Verify your CLI version and that you’re logged in:

az version --query '"azure-cli"' -o tsv
az account show --query name -o tsv

If either command fails, run az login and try again.

Create the Resource Group and Virtual Network

Every Azure deployment starts with a resource group and a network. You’ll create both, plus a subnet specifically for your backend VMs.

az group create \
  --name lb-demo-rg \
  --location eastus

Now create the virtual network and subnet:

az network vnet create \
  --resource-group lb-demo-rg \
  --name lb-vnet \
  --address-prefix 10.0.0.0/16 \
  --subnet-name backend-subnet \
  --subnet-prefix 10.0.1.0/24

Confirm the network exists:

az network vnet show \
  --resource-group lb-demo-rg \
  --name lb-vnet \
  --query '{name:name, addressSpace:addressSpace.addressPrefixes[0]}' \
  -o table

You should see the 10.0.0.0/16 address space. If you don't, check the output of the vnet create command for errors; permission or quota problems in your subscription usually surface there.

Create the Load Balancer

Here’s where it gets interesting. You’re creating a Standard SKU load balancer—not Basic. The Basic SKU has been retired and no longer accepts new deployments. Standard is the only SKU you should be using, and it gives you availability zone support, a 99.99% SLA, and a security model that’s closed by default.

Create the load balancer with a public frontend IP:

az network public-ip create \
  --resource-group lb-demo-rg \
  --name lb-public-ip \
  --sku Standard \
  --allocation-method Static \
  --zone 1 2 3

az network lb create \
  --resource-group lb-demo-rg \
  --name my-load-balancer \
  --sku Standard \
  --public-ip-address lb-public-ip \
  --frontend-ip-name lb-frontend \
  --backend-pool-name lb-backend-pool

The --zone 1 2 3 flag on the public IP makes it zone-redundant, meaning your frontend IP survives an entire availability zone going down. One less thing to worry about at 3 AM.
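
One quick way to confirm the zone configuration is to read the zones property back from the public IP resource:

az network public-ip show \
  --resource-group lb-demo-rg \
  --name lb-public-ip \
  --query '{name:name, sku:sku.name, zones:zones}' \
  -o json

You should see all three zones listed.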

Verify the load balancer was created:

az network lb show \
  --resource-group lb-demo-rg \
  --name my-load-balancer \
  --query '{name:name, sku:sku.name, frontendIP:frontendIPConfigurations[0].name}' \
  -o table

Configure Health Probes

Health probes are how the load balancer decides which backend instances deserve traffic. Without probes, the load balancer blindly sends requests to every VM in the pool—including the one that crashed ten minutes ago.

You have three probe types to choose from:

  • TCP: Completes a three-way handshake. If the connection succeeds, the instance is healthy. Simple, but it only tells you the network stack is running.

  • HTTP: Sends a GET request to a path you specify (like /health). The instance is healthy only if it returns HTTP 200. This actually verifies your application is responding.

  • HTTPS: Same as HTTP but over TLS. Use this when your health endpoint requires encryption.

For most web workloads, HTTP probes are the right choice. Create one:

az network lb probe create \
  --resource-group lb-demo-rg \
  --lb-name my-load-balancer \
  --name http-health-probe \
  --protocol Http \
  --port 80 \
  --path "/health" \
  --interval 5 \
  --probe-threshold 2

The --interval 5 means the probe fires every five seconds. The --probe-threshold 2 means two consecutive failures mark an instance as unhealthy. That’s a ten-second detection window—fast enough for most production scenarios.
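
To sanity-check those settings, you can read the probe back. The probeThreshold field shows up on recent network API versions; an older CLI may only report numberOfProbes:

az network lb probe show \
  --resource-group lb-demo-rg \
  --lb-name my-load-balancer \
  --name http-health-probe \
  --query '{protocol:protocol, path:requestPath, intervalSeconds:intervalInSeconds, probeThreshold:probeThreshold}' \
  -o table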


Warning: Your Network Security Groups (NSGs) and any local firewalls on backend VMs must allow inbound traffic from 168.63.129.16. This is the Azure platform IP that originates all health probes. Block it and every single backend instance gets marked unhealthy. You’ll have a load balancer with zero healthy targets, which is a fancy way of saying “total outage.”


Create Load Balancing Rules

A load balancing rule ties your frontend IP, backend pool, and health probe together. It tells the load balancer: “When traffic arrives on this port, distribute it to healthy instances on that port.”

az network lb rule create \
  --resource-group lb-demo-rg \
  --lb-name my-load-balancer \
  --name http-lb-rule \
  --protocol Tcp \
  --frontend-port 80 \
  --backend-port 80 \
  --frontend-ip-name lb-frontend \
  --backend-pool-name lb-backend-pool \
  --probe-name http-health-probe \
  --idle-timeout 15 \
  --enable-tcp-reset true

The --enable-tcp-reset true flag sends a TCP RST packet when a connection hits the idle timeout. This keeps your clients from hanging on dead connections—they’ll get a clean reset and can reconnect immediately.

By default, Azure Load Balancer uses a five-tuple hash to distribute traffic: source IP, source port, destination IP, destination port, and protocol. Every packet in the same TCP session lands on the same backend VM. But if a client opens a new connection (different source port), it might hit a different VM.

If your application needs sticky sessions (for example, a shopping cart that stores session state locally), you can enable session persistence by switching the distribution mode:

az network lb rule update \
  --resource-group lb-demo-rg \
  --lb-name my-load-balancer \
  --name http-lb-rule \
  --load-distribution SourceIP

That switches to a two-tuple hash (source IP + destination IP), so all connections from the same client IP always hit the same backend. Fair warning: this reduces your distribution efficiency. If most of your traffic comes from a few large NAT gateways, those backend VMs are going to get hammered while others sit idle.
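
You can confirm which mode a rule is currently using by reading it back:

az network lb rule show \
  --resource-group lb-demo-rg \
  --lb-name my-load-balancer \
  --name http-lb-rule \
  --query loadDistribution \
  -o tsv

Default means the five-tuple hash; SourceIP means the two-tuple hash described above.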

Deploy Backend Virtual Machines

Now you need actual VMs to receive traffic. You’ll create a Network Security Group first to allow HTTP traffic, then spin up two VMs.

Create the NSG:

az network nsg create \
  --resource-group lb-demo-rg \
  --name lb-nsg

az network nsg rule create \
  --resource-group lb-demo-rg \
  --nsg-name lb-nsg \
  --name allow-http \
  --protocol Tcp \
  --priority 100 \
  --destination-port-range 80 \
  --access Allow \
  --direction Inbound

az network nsg rule create \
  --resource-group lb-demo-rg \
  --nsg-name lb-nsg \
  --name allow-health-probe \
  --protocol Tcp \
  --priority 110 \
  --source-address-prefix 168.63.129.16 \
  --destination-port-range 80 \
  --access Allow \
  --direction Inbound

Notice the second rule explicitly allows the health probe IP. In this particular setup, the broad allow-http rule (plus the NSG's built-in AllowAzureLoadBalancerInBound rule) would already let probes through, but the explicit allow keeps health checks working if you later lock port 80 down to specific sources or add deny rules. Skip that precaution in a tightened configuration and you're back to the “total outage” scenario mentioned earlier.
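
A quick way to confirm both rules landed where you expect:

az network nsg rule list \
  --resource-group lb-demo-rg \
  --nsg-name lb-nsg \
  --query '[].{name:name, priority:priority, source:sourceAddressPrefix, port:destinationPortRange, access:access}' \
  -o table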

Create two VMs and attach them to the backend pool:

for i in 1 2; do
  az network nic create \
    --resource-group lb-demo-rg \
    --name vm${i}-nic \
    --vnet-name lb-vnet \
    --subnet backend-subnet \
    --network-security-group lb-nsg \
    --lb-name my-load-balancer \
    --lb-address-pools lb-backend-pool

  az vm create \
    --resource-group lb-demo-rg \
    --name backend-vm${i} \
    --nics vm${i}-nic \
    --image Ubuntu2204 \
    --admin-username azureuser \
    --generate-ssh-keys \
    --zone ${i} \
    --size Standard_B1s \
    --custom-data cloud-init.txt \
    --no-wait
done

Each VM lands in a different availability zone (--zone ${i}), so a single zone failure won’t take out your entire backend. The --no-wait flag lets both VMs deploy in parallel instead of making you stare at a progress bar twice.
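
Because of --no-wait, the command returns before provisioning finishes. One way to check on progress:

az vm list \
  --resource-group lb-demo-rg \
  --query '[].{name:name, state:provisioningState, zone:zones[0]}' \
  -o table

Both VMs should eventually report Succeeded, one in each zone.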


Pro Tip: The --custom-data cloud-init.txt reference assumes you have a cloud-init file that installs and starts a web server. For a quick test, create a file called cloud-init.txt with an nginx installation:

#cloud-config
package_upgrade: true
packages:
  - nginx

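If you also want the /health probe path to return 200 and the curl test later to show which VM answered, here's one possible variant of that file. It's a sketch that assumes Ubuntu's nginx package serves /var/www/html as its default web root:

#cloud-config
package_upgrade: true
packages:
  - nginx
runcmd:
  # Stamp each VM's hostname into its landing page so responses are distinguishable
  - echo "<html><head><title>$(hostname)</title></head><body>Served by $(hostname)</body></html>" > /var/www/html/index.html
  # Answer the HTTP probe's /health path with a 200
  - echo "OK" > /var/www/html/health

Cloud-init installs packages before runcmd executes, so the web root already exists when these files are written.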

Verify both VMs are in the backend pool:

az network lb address-pool show \
  --resource-group lb-demo-rg \
  --lb-name my-load-balancer \
  --name lb-backend-pool \
  --query 'backendIPConfigurations[].id' \
  -o tsv

You should see two NIC references. If you see zero, check that the NICs were created with the --lb-address-pools flag.

Test Your Load Balancer

Grab your load balancer’s public IP:

az network public-ip show \
  --resource-group lb-demo-rg \
  --name lb-public-ip \
  --query ipAddress \
  -o tsv

Hit it with curl a few times:

LB_IP=$(az network public-ip show \
  --resource-group lb-demo-rg \
  --name lb-public-ip \
  --query ipAddress -o tsv)

for i in $(seq 1 10); do
  curl -s http://$LB_IP | grep -o '<title>.*</title>'
done

You should see responses coming from both backend VMs. If every response looks identical, remember that a stock nginx install serves the same default page on both VMs, so customize each index page (the cloud-init variant shown earlier is one way) to tell them apart. If the responses clearly come from only one VM, check that the other one is passing its health probe, and keep in mind that the five-tuple hash is per-connection rather than round-robin, so a small sample of requests can skew toward one instance. Trying from different terminals or machines also helps.
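
If you suspect a probe problem rather than a hashing quirk, the Standard Load Balancer exposes a per-endpoint health probe status metric (DipAvailability) through Azure Monitor. A quick sketch:

LB_ID=$(az network lb show \
  --resource-group lb-demo-rg \
  --name my-load-balancer \
  --query id -o tsv)

az monitor metrics list \
  --resource $LB_ID \
  --metric DipAvailability \
  --aggregation Average \
  -o table

A sustained average of 100 means every backend instance is answering its probes; anything lower points at an unhealthy instance.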

Clean Up Resources

When you’re done testing, tear everything down so you’re not paying for idle VMs:

az group delete \
  --name lb-demo-rg \
  --yes \
  --no-wait

Quick Win: The --no-wait flag on the delete command returns immediately and lets the cleanup happen in the background. Your terminal is free, and Azure handles the rest. Just don’t forget to verify the resource group is actually gone later—az group show --name lb-demo-rg should return a “not found” error.


What You Built

You now have a working Azure Load Balancer deployment that handles traffic distribution without manual intervention. The health probes automatically remove unhealthy VMs from rotation. The zone-redundant frontend IP survives availability zone failures. The five-tuple hash distributes connections across your backend pool. And the NSG rules ensure health probes can actually reach your VMs—which is the part most people forget on their first deployment.

The next time a backend VM fills its disk or crashes, the load balancer quietly stops sending it traffic while you sleep through the night. That’s the whole point.
