Control-Plane Network Isolation Failures
After this chapter you will be able to locate the network boundaries a cloud provider relies on to keep tenants apart — host↔guest channels, the peering and routing fabric, and “trusted” IP ranges — and explain, with real cases, how each one leaks.
Imagine an apartment building. Every flat has its own lock, and the tenants trust those locks completely. But the building also has things the tenants share without thinking about it — the lift, the post room, the intercom, the corridor. None of those were ever meant to be a security boundary. They are conveniences. They exist so the building can be run.
A cloud data centre is the same. Two customers' servers sit a few metres apart on the same hardware, joined by the same wires, fronted by the same load balancers, watched over by the same management software. The provider promises that your traffic and your neighbour's traffic never mix. That promise rests almost entirely on shared plumbing — and shared plumbing was built for convenience, not for keeping determined strangers apart. This chapter walks the corridors of that building and shows where the doors do not quite close.
The problem
A public cloud is a multi-tenant network. Thousands of unrelated customers run workloads on the same physical fabric, and the provider's job is to make each of them feel alone. It does this with a stack of separators: virtual networks that keep one customer's packets from reaching another's; firewall rules that say which traffic is allowed; and internal control channels — invisible to customers — that the platform uses to provision, configure, and repair their machines.
Those separators leak in recurring ways, and they leak for the same underlying reason every time. The component that enforces a boundary is usually a component the tenant can also talk to. A host-agent endpoint is reachable from inside your guest operating system and trusted by the platform that manages it. A peering attachment joins two networks and relies on an acceptance check that may not be enforced. A firewall rule that trusts a source IP is the boundary and trusts an address an attacker can borrow. Wherever one component does both jobs, every parsing quirk, every missing ownership check, every “internal only” assumption becomes an isolation bug.
The data plane is where your workloads run and your traffic flows — VMs, pods, load-balanced requests. The control plane is the provider machinery that creates, configures, and manages those workloads. This chapter is about the seams where data-plane traffic can reach into the control plane, or where one tenant's data-plane traffic reaches another's.
Why it matters — and how it differs from a traditional pentest
In a traditional pentest you attack one customer's network. The boundaries you probe — their firewalls, their segmentation, their VPN — belong to the target, and breaking one moves you sideways inside their estate. The blast radius stops at the edge of that customer.
Attacking the provider's network isolation is different in two ways. First, the boundaries you are testing are operated by the provider and shared across every tenant, so a single failure can move you from your tenant into someone else's — or into the provider's own fabric. Second, many of these “boundaries” were never designed to be security controls at all. They are routing conveniences that defenders, and sometimes the provider's own documentation, quietly promoted to the rank of firewall. A pentester who only knows customer-side network testing will look straight past them, because in a single-tenant world they genuinely were harmless.
For each technique below, ask the Chapter 1 questions: which plane is abused, which isolation boundary failed, what identity the crossing lets you assume, what shared component is involved, what provider convenience became the surface, and what a defender would have to watch to notice. In this chapter the “shared component” answer is almost always the same as “what enforced the boundary.”
The methods at a glance
Three techniques, ordered by increasing subtlety.
| Technique | The shared component | What crossing it gives you | Featured case |
|---|---|---|---|
| A · Host↔guest channel | The provider's host-agent endpoint, reachable from inside the guest | Bootstrap secrets and provisioning data — a far stronger identity than a scoped token | WireServing (#255) |
| B · Routing & peering fabric | The provider's BGP backbone and peering attachments | Traffic interception, or an unconsented network path into a victim's network | Direct Connect (#235), Transit Gateway (#234) |
| C · Soft boundaries | Service tags and IP allowlists — shared address ranges | A “trusted” firewall rule satisfied by a borrowed source IP | Azure service tags (#259) |
Technique A · The host↔guest channel
What it is. Every cloud VM you have ever launched is a managed machine, and management means a wire. The provider needs to push configuration, install agents, rotate certificates, and run the occasional repair script. That requires a channel from the provider's control plane straight into your guest operating system — and any channel reachable from inside the guest is reachable by whatever code is running there, including an attacker's.
Every cloud VM runs a provider-supplied guest agent — the Azure VM Agent (waagent), AWS's Nitro/SSM components, the GCE guest agent — that talks back to a provider-side controller. In Azure that controller is WireServer, reached at the address 168.63.129.16, which serves the VM's “goal state”: desired configuration, extension packages, and certificates. It is a control-plane endpoint sitting one curl away from untrusted guest code, and that dual nature is the whole problem.
Chapter 4 taught you 169.254.169.254, the link-local IMDS endpoint that vends scoped workload credentials. The host-agent channel is different: WireServer lives at 168.63.129.16 and a sibling component, the HostGAPlugin, listens on 168.63.129.16:32526. IMDS hands you a role token; the host-agent channel hands you provisioning data — including bootstrap secrets — so its blast radius is considerably larger.
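Because IMDS and the host-agent channel are separate endpoints, you have to test reachability for each one separately. The following is a minimal Python sketch. It assumes that a plain TCP connect is a good-enough first signal, although a connect says nothing about what the endpoint will serve. It also assumes IMDS listens on port 80. The ENDPOINTS table and the function names are illustrative, not from any tool.

```python
import socket

# Endpoints named in this chapter. Port 80 for IMDS and WireServer is an assumption
# (both speak plain HTTP); 32526 is the HostGAPlugin port given in the text.
ENDPOINTS = {
    "IMDS": ("169.254.169.254", 80),
    "WireServer": ("168.63.129.16", 80),
    "HostGAPlugin": ("168.63.129.16", 32526),
}


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unroutable, or timed out (socket.timeout is an OSError)
        return False


def probe_all(endpoints: dict = ENDPOINTS, timeout: float = 2.0) -> dict:
    """Probe each endpoint independently: reaching IMDS says nothing about WireServer."""
    return {name: tcp_reachable(host, port, timeout) for name, (host, port) in endpoints.items()}
```

Run probe_all() from the workload under test; it returns one boolean per endpoint. A True for WireServer or HostGAPlugin from inside a pod is exactly the precondition the WireServing chain depends on.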
How it works. What flows over that channel is mostly VM extensions — and extensions are where the convenience turns dangerous.
protectedSettings
A VM extension is a package the platform downloads to your VM and runs as SYSTEM or root: CustomScript, RunCommand, VMAccess, the AKS node-bootstrap extension. Each extension's configuration carries a protectedSettings blob described as “encrypted at rest.” The catch: for the agent to use the settings, the decryption certificate must also sit on that VM, so “encrypted” really means “encrypted to a key any on-box attacker can also reach.”
NetSPI documented the WireServer protocol precisely.[2]#184 WireServer is plain HTTP on 168.63.129.16 with no authentication — it identifies the caller purely by source IP, so any code that can route a packet to that address from the VM is, as far as WireServer is concerned, the VM's own agent. The only "credential" is the x-ms-version header naming a protocol version the endpoint accepts. That is the entire access-control story, and it is why an unprivileged process on the box can speak the protocol at all.
To see why retrieving the secrets takes several steps, start from the obstacle. The protectedSettings blob is encrypted to a per-VM key called the TenantEncryptionCert. Its private key does not sit in a file an unprivileged process can simply read; WireServer holds it and will only hand it back encrypted. The protocol's intended design works like this. The guest agent generates its own throwaway “transport” key pair and gives WireServer the public half. WireServer then returns the certificate bundle encrypted to that public key as a PKCS #7 / CMS envelope, so only the agent's transport private key can open it. The flaw is that WireServer never checks who supplied that transport public key. An attacker supplies their own, and WireServer obligingly encrypts the tenant's secrets to the attacker's key. The exchange then unrolls in four steps:
- Fetch the goal state. GET /machine/?comp=goalstate returns an XML document (the VM's desired configuration) containing URLs for everything else: the certificate bundle, the extensions config, and per-extension settings. Nothing here is secret yet; the goal state is just the map.
- Generate a transport certificate. The attacker creates a fresh self-signed RSA key pair locally (openssl req -x509 -nodes ...). This is their key; its only job is to be the key WireServer encrypts the bundle to.
- Request the certificate bundle. GET the Certificates URL from the goal state, sending the transport public certificate in a request header (x-ms-guest-agent-public-x509-cert). WireServer encrypts the bundle, which contains the TenantEncryptionCert private key, to that public key and returns it as a CMS envelope.
- Decrypt and unwrap. openssl cms -decrypt with the attacker's transport private key opens the envelope, yielding the TenantEncryptionCert private key. That key then decrypts every extension's protectedSettings blob fetched from the extensions-config URL, turning “encrypted at rest” into plaintext admin passwords, bootstrap tokens, and scripts.
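The whole flaw in the third step is a missing binding between the transport key and the genuine agent. The toy model below contains no real cryptography. Envelope stands in for the CMS envelope. issue_bundle_pinned is a hypothetical contrast for illustration, not a description of any fix Microsoft shipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Toy stand-in for a CMS envelope: only the holder of recipient_key can open it."""
    recipient_key: str
    payload: str


def issue_bundle_flawed(supplied_key: str, tenant_secret: str) -> Envelope:
    # The behaviour described in the text: encrypt to whatever transport key the caller sends.
    return Envelope(recipient_key=supplied_key, payload=tenant_secret)


def issue_bundle_pinned(supplied_key: str, tenant_secret: str, agent_key: str) -> Envelope:
    # Hypothetical contrast: refuse unless the key was bound to the real agent out of band.
    if supplied_key != agent_key:
        raise PermissionError("transport key is not bound to this VM's agent")
    return Envelope(recipient_key=supplied_key, payload=tenant_secret)


def open_envelope(env: Envelope, my_key: str) -> str:
    """Succeeds only for the key the envelope was encrypted to."""
    if env.recipient_key != my_key:
        raise PermissionError("envelope was not encrypted to this key")
    return env.payload
```

In the flawed version, open_envelope(issue_bundle_flawed("attacker-key", secret), "attacker-key") hands back the secret. Encryption protected the bundle in transit but not from the party that asked for it. Pinning only helps if the binding is established out of band. While WireServer identifies callers by source IP alone, every process on the box looks like the agent.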
The first request looks like this — note that the only thing resembling a credential is the version header:
```http
GET /machine/?comp=goalstate HTTP/1.1
Host: 168.63.129.16
x-ms-version: 2015-04-05
```
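The same request is easy to build programmatically, and the response is only a map of further URLs. The minimal sketch below uses only the standard library and builds the request without sending it. find_urls assumes the goal-state XML exposes the URLs as Certificates and ExtensionsConfig elements with no XML namespace; check that against a real response.

```python
import urllib.request
import xml.etree.ElementTree as ET

WIRESERVER = "http://168.63.129.16"
PROTOCOL_VERSION = "2015-04-05"  # the version header is the only "credential"


def goal_state_request() -> urllib.request.Request:
    """Build, but do not send, the goal-state request: no auth header, just x-ms-version."""
    return urllib.request.Request(
        f"{WIRESERVER}/machine/?comp=goalstate",
        headers={"x-ms-version": PROTOCOL_VERSION},
    )


def find_urls(goal_state_xml: str, tags=("Certificates", "ExtensionsConfig")) -> dict:
    """Return the text of the first element with each tag name, wherever it sits.

    Assumes un-namespaced element names; namespaced XML would need {ns}Tag lookups.
    """
    root = ET.fromstring(goal_state_xml)
    urls = {}
    for tag in tags:
        elem = root.find(f".//{tag}")
        if elem is not None and elem.text:
            urls[tag] = elem.text.strip()
    return urls
```

On an Azure VM, urllib.request.urlopen(goal_state_request(), timeout=5).read().decode() would fetch the document to pass to find_urls. Off Azure, the call will not reach a WireServer at all.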
An earlier, file-based form of the same primitive lives on disk.[3]#186 On Windows, extension settings sit in RuntimeSettings\<#>.settings as JSON containing a certificate thumbprint and a base64 protectedSettings blob; NetSPI's tooling reads the file, finds the matching certificate in the local store, and decrypts it.
The VMAccess extension resets the local administrator password, so its protectedSettings literally contains a fresh admin credential. As a mitigation, VMAccess began truncating its .settings file after use — but the un-redacted JSON is also copied into the guest-log collection path, kept current with the live encryption certificate, where it still decrypts to a plaintext username and password.[3]#186 A point-patch that redacts one file leaves the reachable on-box channel intact.
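Hunting for surviving copies is a file-system walk plus a JSON lookup. That makes it just as useful to a defender checking what a redaction patch missed. The sketch below stops at locating blobs and thumbprints and decrypts nothing. The key names runtimeSettings, handlerSettings, protectedSettingsCertThumbprint and protectedSettings are assumed from the commonly seen handler-settings layout. The root directory is whatever path you choose to scan.

```python
import json
from pathlib import Path


def protected_entries(settings_text: str) -> list:
    """Return (thumbprint, protectedSettings) pairs found in one .settings document.

    Assumed layout (the commonly seen extension handler format):
    {"runtimeSettings": [{"handlerSettings": {
        "protectedSettingsCertThumbprint": "...", "protectedSettings": "<base64>"}}]}
    """
    doc = json.loads(settings_text)
    entries = []
    for runtime in doc.get("runtimeSettings", []):
        handler = runtime.get("handlerSettings", {})
        blob = handler.get("protectedSettings")
        if blob:  # a redacted copy has no blob left
            entries.append((handler.get("protectedSettingsCertThumbprint"), blob))
    return entries


def hunt(root: str) -> dict:
    """Walk root and map each file path that still holds a protectedSettings blob to its entries."""
    hits = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            entries = protected_entries(path.read_text(encoding="utf-8"))
        except (OSError, ValueError, AttributeError, TypeError):
            # Unreadable, undecodable or non-JSON (both are ValueErrors), or not the expected shape.
            continue
        if entries:
            hits[str(path)] = entries
    return hits
```

A truncated .settings file parses as nothing, so hunt skips it. Any copy elsewhere under the scanned root that still carries a blob shows up in the results. That is the redaction gap described above.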
Real-world illustration · WireServing (#255)
WireServing is the cleanest demonstration that a host-agent channel reachable from an unprivileged tenant workload hands out keys to the control plane. Mandiant found that an attacker with ordinary code execution in a pod on an AKS cluster — no root, no host networking, no privileged container — could escalate to the equivalent of cluster administrator purely by talking to 168.63.129.16.[6]#255 The crucial precondition: the cluster used Azure CNI with the “Azure” network policy, the very component meant to enforce which destinations a pod may reach — and it did not block the host-agent IP.
- Foothold. Ordinary code execution in a pod on an AKS cluster using Azure CNI + the “Azure” network policy. No root, no host networking, no privileged container.
- Reach the host agent. The pod issues GET http://168.63.129.16/machine/?comp=goalstate (WireServer) and GET http://168.63.129.16:32526/vmSettings (HostGAPlugin). The network policy never blocked the host-agent IP.
- The certificate dance. The attacker generates a throwaway certificate (openssl req -x509 -nodes -subj "/CN=LinuxTransport" -newkey rsa:2048 ...), pulls the Certificates URL from the goal state, downloads the bundle, and decrypts it with openssl cms -decrypt to recover the TenantEncryptionCert private key.
- Decrypt the bootstrap script. Pull protectedSettings from /vmSettings, base64-decode it, and decrypt it with the recovered key; out comes the node bootstrap script, cse_cmd.sh.
- Harvest node secrets. cse_cmd.sh contains TLS_BOOTSTRAP_TOKEN, KUBELET_CLIENT_CERT_CONTENT, and KUBELET_CA_CRT. The bootstrap token can create and read CertificateSigningRequests.
- Forge node identity. The attacker submits a CSR with CN=system:node:<node>, O=system:nodes, and signerName: kubernetes.io/kube-apiserver-client-kubelet. AKS auto-signs CSRs presented with a bootstrap token.
- Impact. The signed certificate authenticates as system:node:<node>; Kubernetes' Node Authorizer then grants read access to every Secret mounted on that node. Chained across nodes, that is effective cluster administrator.
Frame the win as network (the pod reached the host agent) and credential (it harvested node bootstrap secrets, a far more powerful identity than a scoped workload token). The Kubernetes mechanics — kubelet, the Node Authorizer, CSR auto-signing — are dissected in Chapter 6. And the same WireServer primitive, run against a Microsoft-hosted service VM rather than a customer cluster, is what Wiz weaponised in ChaosDB to pull secrets off Microsoft's own infrastructure; that chain is Chapter 8's to tell in full.
Technique B · The routing & peering fabric
What it is. A cloud provider does not run one network — it runs a planet-scale backbone, and it lets customers attach to it: physically, with dedicated fibre, and logically, with peering between virtual networks. Each attachment is a join between two parties, and each join has a moment where consent and ownership are supposed to be checked. Those moments are the attack surface.
You know BGP as a routing protocol. What may be new is that providers run a BGP-speaking backbone customers can peer into — AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect. On AWS a public virtual interface (VIF) lets a customer advertise IP prefixes into AWS's own network. If the provider does not validate that the advertiser owns the prefix, that advertisement is a route-injection primitive against the backbone.
A Transit Gateway peering, or a VPC peering connection, is a two-party network join: account A requests it, account B must explicitly accept it. That accept step is the entire consent boundary — B agreeing “yes, I want a wire to A.” If accept can be bypassed or self-served by the requester, network reachability into B's environment exists without B's knowledge.
How it works. The routing case is about ownership — does the backbone check that you may announce the prefix you announce? The peering case is about consent — is the accept step enforced where it counts? Two real cases, one of each.
Real-world illustration · Direct Connect route injection (#235)
On 16 May 2024 a route advertised over an AWS Direct Connect public VIF was accepted by AWS without prefix-ownership validation. The observed incident was benign in intent — a customer typo in a public-VIF prefix advertisement — but its effect was real: AWS-originated traffic destined for an external IP was hijacked and blackholed, breaking Let's Encrypt certificate renewals for affected hosts.[4]#235
The teaching point is the generalisation. If AWS will accept a BGP prefix on a public VIF without checking that the advertiser owns it, the same mechanism used deliberately is a route-injection primitive into AWS's network — interception or denial of AWS-originated traffic toward any prefix you announce. The isolation boundary that failed was the routing boundary; the provider convenience was “you can peer into our backbone”; the detection surface is path anomaly, a traceroute or mtr showing AWS-sourced traffic taking a route it should not. AWS's remediation was process, not patch — improved prefix-ownership validation for public-VIF advertisements.
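The check AWS tightened is, conceptually, a containment test: is the advertised prefix inside address space the advertiser is known to hold? Below is a minimal sketch using Python's ipaddress module. The ownership table is a hard-coded, hypothetical stand-in for authoritative data such as routing-registry or RPKI records. The prefixes are RFC 5737 documentation ranges.

```python
import ipaddress


def prefix_is_owned(advertised: str, owned: list) -> bool:
    """True if the advertised prefix falls entirely inside one of the advertiser's owned prefixes."""
    adv = ipaddress.ip_network(advertised, strict=True)
    for cidr in owned:
        net = ipaddress.ip_network(cidr, strict=True)
        # subnet_of raises TypeError across IP versions, so compare versions first.
        if adv.version == net.version and adv.subnet_of(net):
            return True
    return False


def unvalidated(advertisements: dict, ownership: dict) -> list:
    """Return (advertiser, prefix) pairs a validating backbone should have rejected."""
    return [
        (who, prefix)
        for who, prefixes in advertisements.items()
        for prefix in prefixes
        if not prefix_is_owned(prefix, ownership.get(who, []))
    ]
```

A typo in an advertisement, the benign version of this incident, shows up here as a prefix outside every block the advertiser owns. A deliberate hijack looks exactly the same.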
Real-world illustration · Transit Gateway peering acceptance bypass (#234)
The cleanest “consent boundary bypassed” case in the corpus. Cross-region Transit Gateway peering requires the destination account to call ec2:AcceptTransitGatewayPeeringAttachment. DoiT found that the AWS Console correctly refused to let an account accept a peering it had requested itself — but the API and CLI did not enforce the same rule.[5]#234
- Request. From the attacker's own account, request a peering attachment to a victim's Transit Gateway (cross-region).
- Self-accept via the API. Skip the Console and call the API directly from the same attacker account:
```shell
aws ec2 accept-transit-gateway-peering-attachment \
    --transit-gateway-attachment-id tgw-attach-0abc123def456
```
- Path established. For cross-region attachments the call succeeded, and the attachment transitioned to Available.
- Impact. The attacker now had a network path into the victim's TGW-attached VPCs, without the victim ever accepting anything.
Disclosed by James Sheard of DoiT on 25 July 2024; AWS patched on 7 August 2024. This is a precise callback to Chapter 2's lesson about alternate endpoints: the Console and the API are two implementations of the same operation, and they enforced different rules. Whenever a UI shows a check, ask whether that check lives in the UI or in the service — a control that lives only in the front-end is a suggestion, not a boundary.
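The victim never saw an accept prompt. One compensating control is therefore to inventory peerings from the service's own view rather than trust what the Console displays. Below is a sketch over the parsed JSON of aws ec2 describe-transit-gateway-peering-attachments. The field names are assumed from that API's response shape, so verify them against your own CLI output. Any account IDs you test with are placeholders.

```python
# Attachment states treated as no longer carrying traffic (assumed subset of the API's states).
CLOSED_STATES = {"deleted", "deleting", "rejected", "failed"}


def suspicious_peerings(describe_output: dict, trusted_accounts: set) -> list:
    """Flag live peering attachments where either side belongs to an account outside trusted_accounts.

    describe_output is the parsed JSON of
    `aws ec2 describe-transit-gateway-peering-attachments --output json`.
    """
    flagged = []
    for att in describe_output.get("TransitGatewayPeeringAttachments", []):
        if att.get("State") in CLOSED_STATES:
            continue
        owners = {
            att.get("RequesterTgwInfo", {}).get("OwnerId"),
            att.get("AccepterTgwInfo", {}).get("OwnerId"),
        }
        outsiders = sorted(o for o in owners if o and o not in trusted_accounts)
        if outsiders:
            flagged.append((att.get("TransitGatewayAttachmentId"), outsiders, att.get("State")))
    return flagged
```

Run it from the account that owns the Transit Gateway you care about. Any live attachment involving an account outside your trusted set deserves an explanation, whoever clicked, or did not click, “accept”.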
Technique C · Soft boundaries — service tags & IP allowlists
What it is. The last technique is not a bug. It is a category error — a thing defenders treat as a security control that the provider only ever built as a routing convenience. This is the chapter's unifying lesson.
A service tag is a named, provider-maintained group of IP ranges — AzureCloud, Storage, ApiManagement, and dozens more — that you reference in a firewall rule instead of hard-coding CIDR blocks. The fine print: those ranges are shared, multi-tenant infrastructure, so “allow from service tag X” means “allow from any tenant's traffic that egresses via service X,” not “allow from my instance of X.”
IP allowlists, service tags, “internal-only” address ranges, X-Forwarded-For trust — these are routing conveniences, and defenders persistently mistake them for authentication. A source IP tells you where a packet entered the network; it tells you nothing about who sent it. The fix is never “tighten the IP list” — it is “require a real identity on the resource.”
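The difference between position and identity fits in a few lines. In this sketch, ALLOWED_RANGE is a hypothetical documentation range standing in for a service tag. An HMAC over the request body stands in for whatever real identity mechanism the resource would use, such as managed identity, mTLS, or signed tokens.

```python
import hashlib
import hmac
import ipaddress

ALLOWED_RANGE = ipaddress.ip_network("198.51.100.0/24")  # hypothetical "service tag" range


def allowed_by_ip(source_ip: str) -> bool:
    """Network-position check: passes for ANY sender whose traffic egresses from the range."""
    return ipaddress.ip_address(source_ip) in ALLOWED_RANGE


def sign(key: bytes, body: bytes) -> str:
    """HMAC-SHA256 of the request body, as a hex string."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def allowed_by_identity(key: bytes, body: bytes, signature: str) -> bool:
    """Identity check: passes only for a caller holding the resource owner's key."""
    return hmac.compare_digest(sign(key, body), signature)
```

Every sender that egresses from the range passes allowed_by_ip, including another tenant driving the same service. Only a caller holding the key passes allowed_by_identity, wherever its packets come from.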
How it works. The attack is a network-layer confused deputy: drive a trusted service to make a request on your behalf, and the firewall trusts the service rather than you.
Real-world illustration · Abusing service tags (#259)
Liv Matan of Tenable demonstrated, definitively, that service tags are not a security boundary.[1]#259 More than ten Azure services let a user control the destination of an outbound request the service makes on their behalf — Application Insights availability tests, Azure DevOps, Azure Machine Learning, Logic Apps, Container Registry, Load Testing, API Management, Data Factory, Action Group, AI Video Indexer, and Chaos Studio among them.
- Find a controllable service. Pick one of the ten-plus Azure services that lets a user set the destination of an outbound request it makes on their behalf.
- Aim it at the victim. Configure that service to send a request at the victim's internal or private resource.
- Borrow the trusted IP. The request egresses from the service's IP range — which is the service tag's range.
- Match the rule. The victim's firewall rule “allows” that service tag, so the request is admitted and reaches the resource the rule was meant to protect.
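Step 3 reduces to a membership test against the provider's published ranges. The same test also lets a victim confirm that a request was a borrowed-IP request. The sketch below assumes the layout of Azure's downloadable service-tag JSON file (values, name, properties.addressPrefixes) and takes the file contents as a string. The layout is an assumption to check against the file you download.

```python
import ipaddress
import json


def tags_covering(source_ip: str, service_tags_json: str) -> list:
    """Return the names of every service tag whose address prefixes contain source_ip.

    Assumed layout: {"values": [{"name": ..., "properties": {"addressPrefixes": [...]}}]}
    """
    ip = ipaddress.ip_address(source_ip)
    names = []
    for tag in json.loads(service_tags_json).get("values", []):
        for prefix in tag.get("properties", {}).get("addressPrefixes", []):
            # `in` is simply False when the IP and the prefix are different IP versions.
            if ip in ipaddress.ip_network(prefix, strict=False):
                names.append(tag["name"])
                break
    return names
```

A match tells you which service sent the packet, never which tenant told it to. That is why a rule allowing the tag cannot tell tenants apart.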
Notably, Microsoft did not patch. Their position: service tags were never intended as a security boundary, and MSRC published updated guidance saying so explicitly, stressing that authentication is still required on the resource.[7] “The provider fixed the docs, not the product” is itself a teaching point — sometimes the provider's honest answer is “this was never a boundary, and treating it as one was your bug.” A service-tag abuse is the packet-layer twin of the IAM confused deputy from Chapter 3: authorize the principal, not the conduit.
Attacker's checklist
- Host-agent reachability. From any foothold, probe the host-agent channel (168.63.129.16 and :32526 on Azure) separately from IMDS. If a workload or pod can reach it, you may be one certificate decrypt away from bootstrap secrets.
- Extension secrets. On a VM foothold, hunt for protectedSettings on disk and over WireServer. The decryption certificate is on the box, and secondary copies (guest-log paths) survive redaction point-patches.
- Peering consent. For any two-party network join, test whether the accept step is enforced server-side or only in the Console: try the API/CLI path directly, and test cross-region variants.
- Routing trust. If you control a peering or VIF advertisement, ask whether the provider validates prefix ownership; watch traceroute/mtr for paths that confirm injection.
- Soft boundaries. Inventory every firewall rule that trusts a service tag, an IP allowlist, or an “internal” range; each is a candidate confused-deputy or borrowed-IP path.
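For Azure NSGs, a first pass at that inventory can be automated. The sketch below reads the JSON printed by az network nsg rule list. The field names (access, direction, sourceAddressPrefix, sourceAddressPrefixes, name) are assumed from that output. The three-way classification is this chapter's own, not Azure's.

```python
import ipaddress
import json


def classify_source(prefix: str) -> str:
    """'ip-allowlist' for a CIDR or bare IP, 'any' for '*', otherwise a named (service) tag."""
    if prefix == "*":
        return "any"
    try:
        ipaddress.ip_network(prefix, strict=False)
        return "ip-allowlist"
    except ValueError:
        return "service-tag"


def soft_boundary_rules(rules_json: str) -> list:
    """List (rule name, source, kind) for every inbound Allow rule's source."""
    findings = []
    for rule in json.loads(rules_json):
        if rule.get("access") != "Allow" or rule.get("direction") != "Inbound":
            continue
        sources = [rule.get("sourceAddressPrefix")] + list(rule.get("sourceAddressPrefixes") or [])
        for src in filter(None, sources):  # drop empty strings and missing values
            findings.append((rule.get("name"), src, classify_source(src)))
    return findings
```

Default tags such as VirtualNetwork land in the service-tag bucket, which is where the chapter's “internal range” rules belong anyway. Every row is a network position being trusted, so every row is a question about what real identity the resource also checks.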
Defender's mirror
- Block the host-agent IP from workloads. Write network policies that explicitly deny pods and untrusted workloads access to 168.63.129.16 and the :32526 HostGAPlugin port. The “Azure” network policy did not do this by default; verify, do not assume.
- Server-side peering controls. Apply an SCP Deny on ec2:AcceptTransitGatewayPeeringAttachment for principals outside your org-ID. A control that lives only in the Console is not a control.
- Prefix-ownership validation. For Direct Connect public VIFs, confirm the provider validates prefix ownership; on your side, monitor advertised and received prefixes for anomalies.
- Stop treating soft boundaries as authentication. Service tags and IP allowlists are routing conveniences. Require a real identity on the resource (managed identity, mTLS, signed tokens), not network position alone.
- Detection beats. Alert on anomalous access to 168.63.129.16 from workloads, on CloudTrail AcceptTransitGatewayPeeringAttachment events from external accounts, and on traceroute/mtr path anomalies for routing attacks.
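Kubernetes NetworkPolicy has no explicit deny rule, so “deny the host-agent IP” is written as an egress allow to everything except that address. The sketch below emits such a policy as JSON for kubectl apply -f, under several assumptions. It assumes IPv4-only pods. It does not cover hostNetwork pods, which NetworkPolicy generally does not govern. How ipBlock rules treat in-cluster pod IPs varies by implementation. Whether your CNI's policy engine honours the except entry is something to verify from inside a pod, not assume. The policy name is hypothetical.

```python
import json

HOST_AGENT_CIDR = "168.63.129.16/32"  # WireServer and HostGAPlugin (:32526) share this address


def deny_host_agent_policy(namespace: str, name: str = "deny-host-agent") -> dict:
    """Build a NetworkPolicy letting every pod in `namespace` egress anywhere except the host agent.

    NetworkPolicy is allow-only, so the exclusion is an allow-all ipBlock with an `except` entry.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: every pod in the namespace
            "policyTypes": ["Egress"],
            "egress": [{"to": [{"ipBlock": {"cidr": "0.0.0.0/0", "except": [HOST_AGENT_CIDR]}}]}],
        },
    }


def render(namespace: str) -> str:
    """JSON text suitable for `kubectl apply -f -`."""
    return json.dumps(deny_host_agent_policy(namespace), indent=2)
```

Because both host-agent ports sit on the same address, a single /32 exclusion covers WireServer and HostGAPlugin together.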
Notice how often the provider's response in this chapter was not a patch — MSRC closed the VMAccess redaction issue as “requires elevated privileges” and declared service tags out of scope as a boundary. Do not outsource your threat model to a vendor's severity rubric: “requires a foothold” is no reassurance when your entire model of multi-tenancy assumes a foothold in one tenant stays in that tenant.
- The network half of multi-tenant isolation fails at three kinds of seam: host↔guest channels, the peering/routing fabric, and soft boundaries.
- Across all three, the enforcement plane and the attack surface are usually the same component — the thing meant to stop you is the thing you talk to.
- The host-agent channel (168.63.129.16 WireServer / :32526 HostGAPlugin) is distinct from IMDS and far higher-stakes: it serves bootstrap secrets, not scoped tokens.
- “Encrypted” protectedSettings is decryptable on the box, because the key has to be there. Encryption without a key boundary is obfuscation.
- A peering accept step is a consent boundary; if the Console enforces it and the API does not, the boundary is gone. Enforce server-side.
- A network position is not an identity. Service tags, IP allowlists, and “internal” address ranges are routing conveniences, never authentication.
References
- [1] Liv Matan, Tenable, “These Services Shall Not Pass: Abusing Service Tags to Bypass Azure Firewall Rules.” Archived: local copy · Original: tenable.com. Corpus #259.
- [2] NetSPI, “Decrypting VM Extension Settings with Azure WireServer.” Archived: local copy · Original: netspi.com. Corpus #184.
- [3] Jake Karnes, NetSPI, “Decrypting Azure VM Extension Settings with Get-AzureVMExtensionSettings.” Archived: local copy · Original: netspi.com. Corpus #186.
- [4] “AWS Direct Connect route injection issue.” Archived: local copy · Original: cloudvulndb.org (primary writeup: chair6.net). Corpus #235.
- [5] James Sheard, DoiT, “AWS Transit Gateway peering exploit.” Archived: local copy · Original: doit.com. Corpus #234.
- [6] Mandiant / Google Cloud, “Escalating Privileges in Azure Kubernetes Services.” Archived: local copy · Original: cloud.google.com. Corpus #255.
- [7] Microsoft Security Response Center, “Improved guidance for Azure network service tags.” Original: msrc.microsoft.com. Public research (no local archive).