Part I · Chapter 01

How the Public Cloud Works

After this chapter you will be able to describe how a public cloud is actually built, separate any service into its data plane and control plane, name the boundaries that hold tenants apart, and apply the six-part lens used in every chapter that follows.

~3,500 words · 3 figures

You open a browser tab, sign in to a cloud console, and click a button that says Create cluster. You pick a name and a region, leave the rest on its defaults, and go make coffee. Four minutes later the page turns green. You copy one line — a web address ending in :443 — paste it into a config file, and run a command. Your application is now running on a Kubernetes cluster you did not build, did not wire up, and could not see if you tried.

It feels like magic, and that is the point — the provider worked hard to make it feel that way. But "magic" is just a word for machinery you have not looked at yet. Somewhere, a great deal of real software just ran on real computers to make that green checkmark appear. This chapter is about that machinery: what a public cloud actually is underneath the button, and why understanding its shape is the prerequisite for everything else in this book.

The problem: what "the cloud" actually is

For most engineers, "the cloud" is a verb. You deploy to the cloud; you run things in the cloud. The cloud is the place where your software lives once it leaves your laptop. That mental model is fine for shipping a product. It is useless — actively misleading — for attacking one.

Strip away the marketing and a public cloud is a single, concrete thing: a landlord of computers. A provider — AWS, Azure, Google Cloud, and the rest — owns physical datacenters full of ordinary hardware, and it rents out slices of that hardware to millions of customers at the same time. The provider's real product is not "servers." Servers are a commodity you could buy yourself. The product is isolation: the promise that your slice and a stranger's slice, running on the very same physical CPU, memory, disk, and network card, cannot see or touch each other.

◈ Concept · Multi-tenancy

Multi-tenancy means many unrelated customers share the same physical and logical infrastructure simultaneously. Your virtual machine and a stranger's may be two processes on one host; your storage objects and theirs may sit in one giant distributed filesystem, told apart only by a key. The mental model to carry through the whole book: shared by default, separated by software.

That last phrase — separated by software — is the whole problem. There is no physical wall around your account. The wall is code: routing rules, policy checks, namespace lookups, signed tokens. Code is written by people, and people make mistakes. So the interesting question for an attacker is never "can I break into the cloud?" The cloud is not a building with a door. The question is: where is the software that keeps tenants apart, and is it as solid as the provider needs it to be? To ask that question well, you have to know how the cloud is built. So let us build one.

How a public cloud is built: the managed-Kubernetes example

Abstractions are hard to attack because they are hard to see. The cure is to pick one real service and follow it all the way down to the machinery. We will use managed Kubernetes — the service behind that Create cluster button — because it is the clearest possible illustration of how a public cloud actually works. Every major provider sells one: Amazon EKS, Azure AKS, Google GKE.^[1]

◈ Concept · Kubernetes, in one breath

Kubernetes is a system for running containerized applications across a fleet of machines. It has a control plane — an API server, a key-value store called etcd, a scheduler, and controllers — that decides what should run where, and a set of worker nodes that actually run the containers (grouped into pods). You tell the API server your desired state; the control plane makes reality match it.^[2]
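
The "desired state" idea is easiest to see as a loop. The sketch below is a toy reconciler in Python, not Kubernetes source code: the reconcile function, the workload names, and the model of state as a dictionary of pod counts are all assumptions made for illustration. Real controllers watch the API server and act through it; here both states are plain dictionaries, and the loop simply computes what must change to make reality match the spec.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Compare desired pod counts with actual ones and return the actions a
    controller would take to make reality match the spec.

    Toy model: keys are hypothetical workload names, values are pod counts.
    The function also updates `actual` as if every action succeeded.
    """
    actions: List[str] = []
    for name in sorted(set(desired) | set(actual)):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} pod(s) for {name}")
        elif have > want:
            actions.append(f"stop {have - want} pod(s) for {name}")
        actual[name] = want  # pretend the action succeeded
    # Forget workloads that are no longer desired at all.
    for name in [n for n, count in actual.items() if count == 0]:
        del actual[name]
    return actions


desired_state = {"web": 3, "worker": 1}
actual_state = {"web": 1, "old-job": 2}
print(reconcile(desired_state, actual_state))
print(actual_state)
```

Calling reconcile a second time with the same arguments returns an empty list: once reality matches the desired state, the control loop has nothing to do.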

Here is the question that reveals the architecture. When you clicked Create cluster and got a cluster four minutes later — where did the control plane go? Kubernetes needs an API server, an etcd database, a scheduler, controllers. Those are real programs that need real CPUs to run on. The provider did not rack a new physical server for you in four minutes. So what happened?

The answer is the single most clarifying fact in this chapter, and it is sometimes called the "Kubernetes-on-Kubernetes" architecture. The provider runs its own enormous Kubernetes cluster — the host cluster, or management cluster. And your cluster's control plane — your API server, your etcd, your scheduler, your controllers — runs as a handful of ordinary pods inside that host cluster. Your "cluster" is not a separate machine. It is a workload, scheduled and scaled and restarted by the provider's Kubernetes exactly like any other pod. The provider operates many thousands of these tenant control planes side by side, on shared nodes, as routine workloads.^[3]

Figure 1.1. The Kubernetes-on-Kubernetes architecture. Each tenant's control plane is a set of pods inside the provider's host cluster. The tenant sees only one API endpoint; the provider operates thousands of these as ordinary workloads on shared infrastructure.

Pause and appreciate how neat this is. The provider already built, for itself, a system designed for running thousands of small workloads on shared hardware, with scheduling, health-checking, and self-healing built in: Kubernetes. So when it needs to sell you a Kubernetes cluster, it does not invent anything new. It runs your cluster's brain as a workload on its own cluster. It is turtles, deliberately, all the way down.

What do you see of all this? Almost nothing. You see one thing: a web address, your API server endpoint, the line ending in :443 that you pasted into your config. You point your tools at that endpoint and they behave exactly as if you owned a private cluster. You cannot tell that your API server is a pod. You cannot see the host cluster, the neighbouring tenants whose control-plane pods run on the same host nodes as yours, or the provider's automation managing all of it. The endpoint is a curtain. Your whole job, for the rest of this book, is learning what is behind curtains like that one.
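
The same arrangement can be written down as a small data model. This is a hedged sketch, not any provider's real code: HostCluster, TenantControlPlane, the tenant names, and the endpoint format are all hypothetical; only the component names (kube-apiserver, etcd, kube-scheduler, kube-controller-manager) are real Kubernetes components. The point it encodes is the asymmetry: the provider's view holds every tenant's control-plane pods, while the tenant's view is a single endpoint string.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The control-plane components the chapter lists for each tenant.
CONTROL_PLANE_COMPONENTS = (
    "kube-apiserver", "etcd", "kube-scheduler", "kube-controller-manager",
)


@dataclass
class TenantControlPlane:
    tenant: str
    endpoint: str  # the one thing the tenant is given
    pods: List[str] = field(default_factory=list)


@dataclass
class HostCluster:
    """Provider-owned cluster that runs tenant control planes as ordinary pods."""
    tenants: Dict[str, TenantControlPlane] = field(default_factory=dict)

    def create_cluster(self, tenant: str) -> str:
        # Hypothetical endpoint format, for illustration only.
        endpoint = f"https://{tenant}.k8s.example.invalid:443"
        pods = [f"{tenant}-{component}" for component in CONTROL_PLANE_COMPONENTS]
        self.tenants[tenant] = TenantControlPlane(tenant, endpoint, pods)
        return endpoint  # the tenant gets the endpoint and nothing else

    def provider_view(self) -> List[str]:
        """Every tenant's control-plane pods, side by side on shared nodes."""
        return [pod for cp in self.tenants.values() for pod in cp.pods]

    def tenant_view(self, tenant: str) -> str:
        """What the tenant can see: one API endpoint."""
        return self.tenants[tenant].endpoint


host = HostCluster()
host.create_cluster("acme")
host.create_cluster("globex")
print(host.tenant_view("acme"))
print(len(host.provider_view()))
```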

And your worker nodes — the machines that run your application's containers? Those typically are yours: virtual machines in your tenant, which you can usually log in to. They are told to talk to one place and one place only: your API server endpoint. They never address the host cluster directly. The split between "the brain, run by the provider as pods" and "the worker nodes, which are yours" is not an implementation detail. It is the most important structural line in the cloud, and it has a name.

Data plane and control plane

Every managed cloud service — not just Kubernetes — splits into two layers. Learning to make this split instantly, for any service you are handed, is the single most important skill in this chapter. Each layer is a different kind of target with a different blast radius when it breaks.

◈ Concept · Data plane

The data plane is the part of a service that carries tenant traffic and tenant data: your packets, your database queries, your storage reads and writes, the containers running your application. It is high-volume and per-request — every interaction with your actual data goes through it. If the data plane were a city, it would be the roads: traffic flows along them.

◈ Concept · Control plane

The control plane is the API and orchestration layer that provisions and mutates infrastructure: CreateCluster, PutBucketPolicy, CreateRole, every console click and CLI command and Terraform apply. It does not carry your data; it carries the decisions about your data. If the data plane is the roads, the control plane is the city planning office — it decides which roads exist, where they go, and who may drive on them.

Figure 1.2. The two planes. A bug in the data plane usually leaks one tenant's data; a bug in the control plane can rewrite policy and reach the provider's own infrastructure.

Managed Kubernetes makes the split tangible, because it draws the line for you in coloured pixels. Look back at Figure 1.1. The blue boxes — the API server, etcd, the scheduler, the controllers — are the control plane, run by the provider as pods. The green boxes — your worker nodes and the application pods on them — are the data plane, where your code and your traffic actually live. When you run kubectl apply, you are calling the control plane: you are stating a desired state. When your application serves a customer request, that is the data plane: traffic on the road.

⚠ Pitfall · "Planes are crisp partitions"

A plane is a lens, not a wall. Many real operations touch both: aws s3 cp writes an object (data plane), but before a byte lands the request is authenticated and checked against IAM policy that was created and managed through the control plane. Do not force every call into one box. The useful question is not "which plane is this call?" but "which plane is the bug in?", because that tells you the blast radius.
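
One way to keep the lens honest is to record, per operation, every plane it touches instead of forcing a single label, and to attach the blast radius to where the bug sits. The table below is a hypothetical sketch: the operation names echo the ones in this chapter, but the plane assignments are an illustrative simplification, not any provider's official classification.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Plane(Enum):
    DATA = "data plane"
    CONTROL = "control plane"


# Illustrative mapping: one operation may touch more than one plane.
OPERATIONS: Dict[str, FrozenSet[Plane]] = {
    "CreateCluster": frozenset({Plane.CONTROL}),
    "PutBucketPolicy": frozenset({Plane.CONTROL}),
    # Writing an object moves tenant data, but only after a check against
    # policy that was defined through the control plane.
    "aws s3 cp (upload)": frozenset({Plane.DATA, Plane.CONTROL}),
    "serve customer HTTP request": frozenset({Plane.DATA}),
}

# Qualitative blast radius, keyed by the plane the *bug* is in.
BLAST_RADIUS: Dict[Plane, str] = {
    Plane.DATA: "usually one tenant's data",
    Plane.CONTROL: "policy and infrastructure, possibly beyond one tenant",
}


def blast_radius(bug_plane: Plane) -> str:
    """Answer the useful question: which plane is the bug in?"""
    return BLAST_RADIUS[bug_plane]


for op, planes in OPERATIONS.items():
    labels = sorted(p.value for p in planes)
    print(f"{op}: {', '.join(labels)}")
print(blast_radius(Plane.CONTROL))
```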

One more thing Figure 1.1 shows, and it is the reason this split is load-bearing for the whole book. Notice the amber box inside the provider's host cluster: the provider's own automation, meaning the controllers that upgrade clusters, monitor their health, back up etcd, and manage node pools. That automation runs with high privilege and acts across every tenant's control plane at once. It is part of the control plane in the broadest sense (the provider's orchestration layer), and it is the deepest target in this book. A bug in your data plane leaks your data. A bug in your control plane can rewrite your policy. A bug in the provider's cross-tenant automation can be game over for the whole service. Providers treat the control plane as the more complex, more concentrated, and more failure-prone layer; serious cloud architecture deliberately keeps the data plane able to run even when the control plane is down.^[4]
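
Keeping the data plane running while the control plane is down (the "static stability" idea behind reference 4) can be sketched in a few lines. This toy is an assumption-laden illustration, not AWS code: Router, fetch_config, and ControlPlaneDown are hypothetical names. A data-plane router caches the last routing table it received from the control plane; when a refresh fails, it keeps serving with the cached table, so a control-plane outage stops changes, not traffic.

```python
from typing import Callable, Dict, Optional


class ControlPlaneDown(Exception):
    """Raised by the hypothetical control-plane client when it is unreachable."""


class Router:
    """Data-plane component that keeps working on its last-known-good config."""

    def __init__(self, fetch_config: Callable[[], Dict[str, str]]) -> None:
        self._fetch_config = fetch_config
        self._routes: Optional[Dict[str, str]] = None

    def refresh(self) -> bool:
        """Try to pull fresh routes; keep the cached copy if that fails."""
        try:
            self._routes = dict(self._fetch_config())
            return True
        except ControlPlaneDown:
            return False  # statically stable: current routes stay as they were

    def route(self, host: str) -> Optional[str]:
        if self._routes is None:
            raise RuntimeError("no configuration received yet")
        return self._routes.get(host)


control_plane_up = True


def fetch_config() -> Dict[str, str]:
    """Stand-in for a control-plane API call."""
    if not control_plane_up:
        raise ControlPlaneDown()
    return {"shop.example": "10.0.0.7"}


router = Router(fetch_config)
router.refresh()
control_plane_up = False             # the control plane goes down...
print(router.refresh())              # refresh fails
print(router.route("shop.example"))  # ...but traffic still routes
```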

Multi-tenancy and the isolation boundaries

If the provider's product is isolation, we need a precise inventory of how isolation is built, because every cross-tenant attack later in this book is one of these mechanisms failing. This book recognises six isolation boundaries. Learn them as a set; you will be sorting attacks into these six buckets for the rest of the book.

◈ Concept · Isolation boundary

An isolation boundary is any mechanism that keeps one tenant from reaching another's data or compute. For every boundary, ask one diagnostic question: does the tenant get to talk to, or modify, the component that enforces it? If yes, the boundary is advisory, not real.
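
The diagnostic question is mechanical enough to write down. This sketch is hypothetical: the Boundary record, the verdict function, and both inventory entries are invented illustrations, and a real assessment would need evidence behind each boolean rather than a hard-coded value.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Boundary:
    name: str
    enforcer: str                     # component that enforces the separation
    tenant_can_talk_to_enforcer: bool
    tenant_can_modify_enforcer: bool


def verdict(b: Boundary) -> str:
    """Apply the diagnostic question: can the tenant talk to, or modify,
    the component that enforces this boundary? If yes, it is advisory."""
    if b.tenant_can_talk_to_enforcer or b.tenant_can_modify_enforcer:
        return "advisory"
    return "real"


# Hypothetical inventory; each boolean is an assumption to be verified.
inventory: List[Boundary] = [
    Boundary("IAM / identity (hypothetical A)",
             "provider-run policy engine", False, False),
    Boundary("account (hypothetical B)",
             "shared back end reading tenant-writable config", True, True),
]

for b in inventory:
    print(f"{b.name} (enforced by {b.enforcer}): {verdict(b)}")
```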

The six, kept brief — each gets a full chapter later:

Hypervisor / VM isolation. A hypervisor is the virtualization layer that lets one physical host run many virtual machines, each convinced it owns the hardware. This is the strongest common boundary; crossing it needs a rare hypervisor or firmware escape.
Kernel / namespace isolation. Containers on one host all share a single Linux kernel; what separates them is kernel namespaces and cgroups. There is no hypervisor between containers and the kernel, which makes container isolation distinctly weaker than VM isolation — a container escape is a kernel-surface attack (Chapter 6).
Network isolation. Virtual networks, routing, and firewalls — and, importantly, the question of which internal addresses a tenant can reach (Chapter 5).
IAM / identity isolation. Policy decides who is allowed to call what; cryptographically signed tokens and credentials prove who the caller is (Chapter 3).
Account / subscription / project boundary. The top-level tenant container — AWS calls it an account, Azure a subscription, GCP a project. A frequent failure: a shared back-end service that sits behind all of those containers and quietly trusts them equally.
Naming. The surprising one. Many resources live in a global namespace: an Amazon S3 bucket name, for example, must be unique across all AWS accounts in the same partition, not just within yours. A global namespace is itself a boundary, and it fails when names are predictable or squattable (Chapter 7).
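
Why predictability matters for a global namespace is easiest to see in a toy registry. Everything here is hypothetical: GlobalNamespace stands in for any first-come, first-served name registry, and the service-region-account naming scheme is an invented example of names derived from guessable inputs. If an attacker can compute the name before the victim's resource is created, they can claim it first.

```python
from typing import Dict


class GlobalNamespace:
    """First-come, first-served registry: each name has exactly one owner."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def claim(self, name: str, owner: str) -> bool:
        if name in self._owners:
            return False  # already taken, by anyone, in any account
        self._owners[name] = owner
        return True

    def owner(self, name: str) -> str:
        return self._owners[name]


def predictable_name(service: str, region: str, account_id: str) -> str:
    # Hypothetical scheme: every input is guessable or public.
    return f"{service}-{region}-{account_id}"


names = GlobalNamespace()
victim_bucket = predictable_name("logs", "eu-west-1", "111122223333")

# The attacker computes the same name first and claims it.
names.claim(victim_bucket, owner="attacker")
print(names.claim(victim_bucket, owner="victim"))  # False
print(names.owner(victim_bucket))                  # attacker
```

In this toy, the real harm would come next: if the victim's tooling then writes to that name without checking who owns it, its data lands in the attacker's resource.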

Figure 1.3. The six isolation primitives: a quick-reference card. For each, ask whether the tenant can talk to the component that enforces it.

Managed Kubernetes uses several of these at once, which is why it is such a good teaching example. Your control-plane pods are kept apart from a neighbour's by kernel/namespace isolation or, on more careful designs, by running each tenant's control plane in its own VM, which adds hypervisor isolation. The API server enforces IAM/identity: it only answers calls bearing a token it trusts. Network rules keep one tenant's worker nodes from reaching another's endpoint. And the whole arrangement sits under the account/subscription/project boundary. Five boundaries, one service, and an attacker's job is to find the one that is thinner than it looks.

Attacking the provider is not pentesting a tenant

Most security training that mentions "the cloud" is really training you to attack a customer. You learn to find an exposed storage bucket, an over-permissioned role, a forgotten public snapshot. That is a useful trade, and it is not what this book teaches. A tenant pentest stops at the edge of one account: a successful finding compromises that tenant, and every component the provider owns worked as designed.

This book studies the other side. We attack the provider itself — the control plane, the orchestration layer, the multi-tenant isolation, the machinery the provider sells as one trustworthy product. The difference is not stylistic; it is a difference in blast radius. When you attack a tenant, a win is one company's data. When you attack the provider, a win can compromise thousands of tenants at once, because the thing you broke was the thing keeping them apart.

The Kubernetes-on-Kubernetes picture makes this vivid. If you compromise one application pod on your own worker node, you have a foothold in your own data plane — bad for you, irrelevant to anyone else. If you escape that pod to the node, then reach your own cluster's API server, you have your own control plane — still just your tenant. But if you could reach the host cluster, where every tenant's control-plane pods live side by side, or subvert the provider's cross-tenant upgrade automation — that is a different category of event entirely. Same provider, same service, three radically different blast radii. Knowing which one you are looking at is the difference between a bug report and an industry-defining disclosure. You will see exactly this escalation, in detail, in Chapter 6 (Azurescape, a cross-account container takeover) and Chapter 8 (ChaosDB, a managed-database control-plane breach).
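
The escalation in that paragraph can be captured as an ordered ladder of footholds. This is a hedged teaching sketch: the rung names paraphrase the paragraph, and the scope strings are qualitative labels, not a scoring system any provider or this book defines.

```python
from enum import IntEnum


class Foothold(IntEnum):
    """Positions from the escalation above, ordered by blast radius."""
    APP_POD = 1              # your own data plane
    WORKER_NODE = 2          # still your own data plane
    TENANT_API_SERVER = 3    # your own control plane
    HOST_CLUSTER = 4         # every tenant's control-plane pods
    PROVIDER_AUTOMATION = 5  # cross-tenant upgrade and health controllers


def scope(foothold: Foothold) -> str:
    """Qualitative blast radius of holding this position."""
    if foothold <= Foothold.WORKER_NODE:
        return "one tenant, data plane"
    if foothold == Foothold.TENANT_API_SERVER:
        return "one tenant, control plane"
    return "many tenants: provider-side"


def crosses_tenants(start: Foothold, end: Foothold) -> bool:
    """True if moving from start to end escapes the single-tenant scope."""
    return start < Foothold.HOST_CLUSTER <= end


for f in Foothold:
    print(f.name, "->", scope(f))
print(crosses_tenants(Foothold.APP_POD, Foothold.TENANT_API_SERVER))  # False
print(crosses_tenants(Foothold.WORKER_NODE, Foothold.HOST_CLUSTER))   # True
```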

⚠ Pitfall · "It runs in the cloud, so the cloud is to blame"

Most famous "cloud breaches" are actually tenant breaches — a customer's misconfigured role or exposed bucket, where every provider-owned component worked correctly. Provider-side research is the rarer, harder thing: a flaw in the provider's own code, below the line the customer can see. Throughout this book, keep asking: whose code failed — the tenant's, or the provider's?

The six-part lens

Now we assemble the chapter's centrepiece — the analytical tool you will apply in every remaining chapter. When you are handed an unfamiliar managed service and asked "where would you attack it?", you do not start guessing. You run it through six questions. We call this the six-part lens.

Plane. Is the surface in the data plane or the control plane — and what does sitting there grant an attacker? (In managed Kubernetes: an application pod is data plane; the API server is control plane.)
Isolation boundary. Which of the six boundaries separates tenants here — and the diagnostic question: does the tenant get to talk to the component that enforces it?
Identity propagation. Whose credentials does each request run as? Where is an identity attached to a workload, and where is it trusted without being re-checked? (A worker node carries a token; the API server trusts it. Where else does a token travel?)
Shared component. What does this service share across all tenants — a host cluster, a front-end, a namespace, a back-end store, a service principal? Every shared thing is a blast-radius multiplier.
Provider automation. What does the provider do for the tenant — the "magic" — and with what privilege does that automation run? (The upgrade and health controllers that act across every tenant's control plane.)
Detection surface. What does the provider's logging record about an action — and, more importantly, what is invisible? An action with no audit trail is an action a defender cannot see.
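
Running the lens can be as simple as filling in six fields and refusing to call the analysis done while any of them is blank. The sketch below is a hypothetical worksheet, not a tool this book ships; the managed-Kubernetes answers are paraphrased from this chapter, and the detection-surface answer is deliberately left open.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class LensAssessment:
    service: str
    plane: Optional[str] = None
    isolation_boundary: Optional[str] = None
    identity_propagation: Optional[str] = None
    shared_component: Optional[str] = None
    provider_automation: Optional[str] = None
    detection_surface: Optional[str] = None

    def open_questions(self) -> List[str]:
        """Names of the lens pillars that still have no answer."""
        return [f.name for f in fields(self)
                if f.name != "service" and getattr(self, f.name) is None]


k8s = LensAssessment(
    service="managed Kubernetes",
    plane="app pod = data plane; API server = control plane",
    isolation_boundary="kernel/namespace (or VM) between control-plane pods",
    identity_propagation="worker node token trusted by the API server",
    shared_component="provider host cluster",
    provider_automation="upgrade and health controllers across tenants",
)
print(k8s.open_questions())  # ['detection_surface']
```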

ℹ Note · Two planes, not three

The first pillar of the lens, Plane, asks one question: data plane or control plane? There are only those two. The provider's own orchestration (the cross-tenant automation in Figure 1.1) is not a separate plane; it is the deepest, most privileged end of the control plane.

The lens is service-agnostic by design. A NoSQL database, a container runtime, and an AI training platform have almost nothing in common, yet the same six questions produce a clean, comparable diagnosis for each. That is why it is the spine of the book. Every later chapter turns one or two of the six pillars up to full volume: Chapter 3 lives on identity propagation, Chapter 5 on the network boundary, Chapter 12 on the detection surface. But no chapter abandons the lens. Get comfortable running all six in your head now; it will become automatic.

How to read this book

You arrived already fluent in web security — SSRF, XXE, deserialization, OAuth, request smuggling. This book does not re-teach those. It teaches the cloud-specific machinery they get aimed at, and it teaches you to aim them at the provider rather than a single customer.

The structure is deliberate. Part I (this chapter and Chapter 2) gives you the foundations: how the cloud is built, and how to enumerate one. Part II (Chapters 3–6) covers the core attack primitives — IAM and privilege escalation, instance metadata and SSRF, network isolation, container and Kubernetes escape — the moves you will reuse everywhere. Part III (Chapters 7–10) walks specific service families — storage, databases, serverless and CI/CD, AI/ML — applying the primitives to each. Part IV (Chapters 11–13) is synthesis: cross-tenant and provider-side vulnerabilities head-on, detection evasion, and full attack-chain methodology.

Every chapter follows the same shape: a plain-language scenario, then the problem stated clearly, then a breakdown of each technique with a concrete real-world illustration, then key takeaways and references. And every chapter applies the six-part lens explicitly. Hold on to one image as you go: the Kubernetes-on-Kubernetes diagram. Thousands of tenants' control planes, running as pods, on shared infrastructure, separated only by software — that picture, in one form or another, is what a public cloud is. The rest of this book is the study of where that software gives way.

◆ Key takeaways

A public cloud is a landlord of computers: it rents slices of shared hardware to millions of tenants at once. Its real product is isolation, and isolation is enforced entirely by software.
Managed Kubernetes is built Kubernetes-on-Kubernetes: the provider runs a large host cluster, and each tenant's control plane — API server, etcd, scheduler, controllers — runs as ordinary pods inside it. The tenant sees only one API endpoint.
Every service splits into two planes: the data plane (your traffic and data) and the control plane (the API and orchestration that provisions and mutates infrastructure). The provider's own cross-tenant orchestration is the deepest, most privileged end of the control plane.
Tenants are held apart by six isolation boundaries: hypervisor, kernel/namespace, network, IAM/identity, account/subscription/project, and naming. Every cross-tenant attack is one of them failing.
Attacking the provider is not pentesting a tenant. A tenant bug leaks one customer; a provider bug can compromise thousands, because the thing broken is the thing isolating them.
The six-part lens — plane · isolation boundary · identity propagation · shared component · provider automation · detection surface — is the analytical tool used in every chapter. Run it on every service.

References

Amazon Web Services, "Amazon EKS — Managed Kubernetes Service." Original: aws.amazon.com/eks.
The Kubernetes Authors, "Kubernetes Components" (control plane, API server, etcd, nodes, pods). Original: kubernetes.io.
Google Cloud, "GKE cluster architecture" (provider-managed control plane running tenant clusters on shared infrastructure). Original: cloud.google.com.
AWS Well-Architected Framework, "Use static stability" (control plane vs data plane independence). Original: docs.aws.amazon.com.