Platform Engineering: Building an IDP 2026
Software engineering teams at scale hit a wall. Infrastructure becomes a bottleneck. Every team reinvents the same deployment pipeline. Developers wait days for environments. Onboarding takes weeks because nobody documented how to get a service running. Security and compliance requirements create friction that slows everyone down.
Platform engineering emerged as the discipline that solves this problem. The Internal Developer Platform (IDP) is its primary artifact: a layer of self-service tooling, paved paths, and abstractions that lets product engineering teams move faster without reinventing infrastructure or compromising on standards.
This guide covers what platform engineering is, why it emerged, what an IDP looks like in practice, and how to build one.
TL;DR
Platform engineering is the practice of building and operating internal platforms that improve developer productivity and self-sufficiency. The Internal Developer Platform is the product of that work — self-service infrastructure, golden paths, developer portals, and the tooling that lets engineers deploy, monitor, and operate services without becoming infrastructure experts. For teams over ~50 engineers, the investment typically pays back within the first year.
Key Takeaways
- Platform engineering teams treat developers as their customers — the IDP is a product
- Golden paths are opinionated, well-maintained paths to production, not mandates
- Backstage (CNCF project, originally from Spotify) is the dominant open-source developer portal framework
- The "you build it, you run it" model requires platform engineering to be sustainable at scale
- Build vs. buy decisions should favor buying or adopting open-source for commodity capabilities
- Platform team structure follows Team Topologies principles: a platform team serves stream-aligned (product) teams
Why Platform Engineering Emerged
Platform engineering did not emerge from a whitepaper — it emerged from pain. Specifically, the pain of organizations trying to scale DevOps without specialization.
The DevOps movement of the 2010s produced a powerful cultural shift: development teams take ownership of their services in production. "You build it, you run it." This eliminated handoff delays and created better software because the people writing the code felt the consequences of their decisions.
But "you build it, you run it" assumes that developers can run it — that they have the knowledge and tooling to deploy, operate, and monitor services effectively. At 10 engineers, this assumption holds. At 100 engineers, individual teams are spending enormous time on undifferentiated infrastructure work: Kubernetes YAML, CI/CD pipeline configuration, Terraform modules, logging configuration, secret management. At 1,000 engineers, the duplication is crushing.
Platform engineering is the response: a specialized team that treats the developer experience of infrastructure as a product problem, and builds the tools and processes that let every other team operate efficiently.
The term was formalized around 2020–2021, but the practice is older. Teams at Google, Netflix, Spotify, and Airbnb were doing platform engineering (under various names) for years before the terminology emerged.
What an Internal Developer Platform Is
An IDP is not a single product. It is a layer of tooling, documentation, and process that abstracts away infrastructure complexity for application developers.
The components of a mature IDP typically include:
Self-service application infrastructure. Developers can provision a new service, a database, a message queue, or a caching layer through a self-service interface — without filing a ticket with an infrastructure team or writing Terraform from scratch.
Standardized deployment pipelines. The platform provides opinionated, maintained CI/CD pipelines that application teams adopt. Instead of every team writing their own GitHub Actions or Jenkins pipeline, the platform team provides a template or a service that handles build, test, security scanning, and deployment.
Developer portal. A single pane of glass where developers can discover services, view documentation, find runbooks, see who owns what, and access self-service workflows. The leading open-source tool here is Backstage.
Environments on demand. Developers need consistent, reproducible environments for development, testing, and staging. The platform provides tooling to create, manage, and tear down environments without manual infrastructure work.
Observability defaults. Every service created through the platform automatically has logging, metrics, and tracing configured. Developers do not need to think about instrumentation defaults; they are built in.
Secret management. Standardized, secure patterns for managing secrets, credentials, and configuration — integrated into deployment pipelines so developers are not handling plaintext credentials.
Golden paths. Documented, maintained, recommended paths to production for common service types. Not mandates — developers can go off-path — but paths that are easy to follow, well-maintained, and eliminate most of the decision-making overhead.
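As a rough illustration of the self-service and built-in-defaults components above, the sketch below models a developer request as data (the "what") and lets the platform expand it into a provisioning plan (the "how"). Everything here is hypothetical: the `ServiceRequest` fields, the limits, the add-on names, and the shape of the plan are assumptions made for the example, not the interface of any particular tool.

```python
from dataclasses import dataclass, field

# Hypothetical platform limits; a real platform would load these from policy.
MAX_CPUS = 8
MAX_MEMORY_GB = 32
SUPPORTED_ADDONS = {"postgresql", "redis", "queue"}


@dataclass
class ServiceRequest:
    """What a developer asks for: the 'what', not the 'how'."""
    name: str
    cpus: int = 1
    memory_gb: int = 2
    addons: list[str] = field(default_factory=list)


def validate(request: ServiceRequest) -> list[str]:
    """Return human-readable problems; an empty list means the request is acceptable."""
    problems = []
    if not request.name or not request.name.replace("-", "").isalnum():
        problems.append("name must be alphanumeric with optional dashes")
    if not 1 <= request.cpus <= MAX_CPUS:
        problems.append(f"cpus must be between 1 and {MAX_CPUS}")
    if not 1 <= request.memory_gb <= MAX_MEMORY_GB:
        problems.append(f"memory_gb must be between 1 and {MAX_MEMORY_GB}")
    for addon in request.addons:
        if addon not in SUPPORTED_ADDONS:
            problems.append(f"unsupported addon: {addon}")
    return problems


def plan(request: ServiceRequest) -> dict:
    """Expand a valid request into the resources the platform would provision."""
    problems = validate(request)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "service": request.name,
        "compute": {"cpus": request.cpus, "memory_gb": request.memory_gb},
        "addons": sorted(request.addons),
        # Defaults every service receives without the developer asking for them.
        "defaults": {
            "logging": True,
            "metrics": True,
            "tracing": True,
            "secrets": "injected-at-deploy",
        },
    }


if __name__ == "__main__":
    request = ServiceRequest(name="orders-api", cpus=2, memory_gb=4, addons=["postgresql"])
    print(plan(request))
```

The design point is that validation happens automatically at request time, so a rejected request gets an immediate, specific message instead of a ticket round trip.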
Golden Paths: The Core Concept
The golden path is the most important concept in platform engineering. It was popularized by Spotify, which describes it as the opinionated and supported path for building something, such as a backend service or a data pipeline.
A golden path is:
- Opinionated. It makes choices. It selects a framework, a deployment mechanism, a logging library. This is the point — fewer decisions for application engineers.
- Well-maintained. The platform team owns the path. When the underlying components change, the golden path is updated. Application teams on the path benefit automatically.
- Easy to follow. A new engineer should be able to follow the golden path from zero to a running, deployed service in a few hours.
- Not mandatory. Teams can go off-path for legitimate reasons. The goal is making the path so good that going off it is rarely worth it.
What a Golden Path Includes
A golden path typically includes:
- Service scaffold generator. A CLI command or UI button that generates a new service repository with all the boilerplate: Dockerfile, CI/CD pipeline, logging configuration, health check endpoint, README, ADR directory.
- Deployment configuration. Pre-configured Kubernetes manifests, Helm charts, or Terraform modules for the service type. Developers specify what they need (2 CPUs, 4GB RAM, a PostgreSQL database); the platform handles how to provision it.
- Observability setup. Auto-configured dashboards in Grafana, log ingestion, and distributed tracing, all wired up as part of the deployment configuration.
- Security defaults. Network policies, RBAC configuration, and secrets management patterns, applied by default.
- Runbook template. A starting point for the operational runbook that is part of the service's documentation.
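To make the scaffold generator idea concrete, here is a minimal sketch using only the Python standard library. The template contents, file layout, and the `scaffold` function are all hypothetical placeholders; a real golden path would keep its templates in a versioned repository and would also create the remote repository and CI/CD pipeline, which this sketch does not attempt.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical templates, keyed by path relative to the new service directory.
TEMPLATES = {
    "README.md": "# $name\n\nOwner: $owner\n",
    "Dockerfile": (
        "FROM python:3.12-slim\n"
        "COPY . /app\n"
        "WORKDIR /app\n"
        'CMD ["python", "main.py"]\n'
    ),
    "main.py": "print('hello from $name')\n",
    "docs/runbook.md": "# Runbook for $name\n\n## Alerts\n\n## Rollback\n",
    "docs/adr/0001-record-architecture-decisions.md": "# 1. Record architecture decisions\n",
}


def scaffold(target: Path, name: str, owner: str) -> list[Path]:
    """Render every template into target/name and return the files created."""
    created = []
    for relative, body in TEMPLATES.items():
        path = target / name / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(Template(body).substitute(name=name, owner=owner))
        created.append(path)
    return created


if __name__ == "__main__":
    # Write into a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp), "orders-api", "team-payments"):
            print(path.relative_to(tmp))
```

Keeping the templates as data rather than code means the platform team can update the golden path by changing templates, without touching the generator itself.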
Backstage: The Developer Portal
Backstage is an open-source developer portal framework created by Spotify and donated to the CNCF (Cloud Native Computing Foundation) in 2020. It is now the dominant choice for teams building developer portals.
What Backstage Provides
Software catalog. A centralized, searchable catalog of all software components (services, libraries, websites, data pipelines) with ownership, documentation, and health information. The catalog is populated via YAML manifests checked into each service's repository.
Software templates. The scaffolding engine — when a developer wants to create a new service, Backstage templates handle the code generation, repository creation, CI/CD pipeline setup, and catalog registration. This is the technical implementation of the golden path.
TechDocs. A documentation system that renders documentation from Markdown files in service repositories, making it searchable and discoverable through the Backstage UI.
Plugin ecosystem. Backstage has a rich plugin ecosystem (300+ plugins) that integrates data from CI/CD systems, monitoring tools, incident management platforms, and cloud providers into the developer portal. You can surface Datadog dashboards, PagerDuty on-call status, GitHub Actions workflows, and Kubernetes deployment status all in one place.
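The software catalog described above is fed by a small descriptor file in each repository (conventionally `catalog-info.yaml`). The sketch below builds such a descriptor for a Component entity and checks that the fields the catalog relies on for ownership are present. The field names follow the Backstage descriptor format as commonly documented, but treat them as something to verify against the documentation for your Backstage version; the helper functions and the sample names are hypothetical. The output is JSON, which YAML parsers generally accept.

```python
import json

# Fields a Component entity is expected to carry under "spec".
REQUIRED_SPEC_FIELDS = ("type", "lifecycle", "owner")


def component_entity(
    name: str,
    owner: str,
    service_type: str = "service",
    lifecycle: str = "production",
    description: str = "",
) -> dict:
    """Build a catalog descriptor for a single software component."""
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": name, "description": description},
        "spec": {"type": service_type, "lifecycle": lifecycle, "owner": owner},
    }


def missing_fields(entity: dict) -> list[str]:
    """List required fields that are absent or empty, as dotted paths."""
    missing = []
    if not entity.get("metadata", {}).get("name"):
        missing.append("metadata.name")
    spec = entity.get("spec", {})
    for key in REQUIRED_SPEC_FIELDS:
        if not spec.get(key):
            missing.append(f"spec.{key}")
    return missing


if __name__ == "__main__":
    entity = component_entity("orders-api", "team-payments", description="Order intake service")
    assert missing_fields(entity) == []
    print(json.dumps(entity, indent=2))
```

A check like `missing_fields` could run in CI so that services without an owner never reach the catalog, which supports the Phase 1 goal of cataloging every service with an owner.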
Backstage Trade-offs
Backstage is powerful and the community is large, but it requires significant engineering investment to run well.
Installation and configuration are non-trivial. Backstage is a TypeScript/React application that you run and operate yourself. The initial setup and integration work typically requires a dedicated platform engineer for several weeks.
Maintenance overhead. Backstage's plugin ecosystem and core framework evolve quickly. Staying current with updates requires ongoing maintenance investment.
Customization is necessary. A default Backstage installation is not useful — the value comes from integrating it with your specific tools, services, and golden paths. This integration work is significant.
Alternatives exist. Port, Cortex, and OpsLevel are managed developer portal products that offer faster time-to-value with less operational overhead. They are not free, but for teams that cannot or do not want to operate Backstage themselves, they are worth evaluating.
Platform Team Structure
In Team Topologies terms, the team that owns the IDP is a "platform team": it provides capabilities and services consumed by stream-aligned (product) teams. It is not a gate; it is a service provider.
Team Topology
The platform team:
- Owns the IDP, golden paths, and developer portal
- Provides self-service infrastructure (developers do not need to file tickets)
- Maintains and evolves the platform based on developer feedback
- Treats developers as customers and measures developer experience
The platform team does not:
- Own individual product services
- Approve or gate deployments (beyond automated checks)
- Serve as a bottleneck that teams must go through for routine infrastructure tasks
Platform Team Composition
A mature platform team typically includes:
- Platform engineers (infrastructure background, strong in Kubernetes, Terraform, CI/CD)
- Developer experience engineers (software engineering background, focus on tooling, CLI design, Backstage development)
- Product manager (treats the IDP as a product with a roadmap and customer feedback loops)
- Technical writer (golden path documentation, portal content)
Team size scales with the organization. A starting platform team for a 50–100 engineer organization might be 3–4 engineers. A large-scale platform team (1,000+ engineers) might be 20–30 people.
When to Form a Platform Team
There is no universal threshold, but common triggers:
- Multiple teams are duplicating infrastructure work and the duplication is visibly slowing delivery
- A new engineer needs more than a day of manual steps to get a service set up and running
- Teams are blocked on infrastructure for routine tasks
- Security and compliance standards are being inconsistently applied
Many organizations start with a "DevOps" team that owns shared infrastructure and gradually transitions to a platform team with a self-service model as the organization grows.
Build vs. Buy Decisions
Platform engineering involves constant build vs. buy decisions. The general principle: buy commodity capabilities, build differentiating capabilities.
Buy (or adopt open-source):
- CI/CD platforms (GitHub Actions, GitLab CI, CircleCI, Buildkite)
- Kubernetes (managed services: EKS, GKE, AKS)
- Monitoring and observability (Datadog, Grafana Cloud, New Relic)
- Secrets management (HashiCorp Vault, AWS Secrets Manager, Doppler)
- Developer portal (Backstage, Port, Cortex)
- Incident management (covered in our incident management tools compared guide)
Build (or heavily customize):
- Service scaffolding templates specific to your stack
- Deployment abstractions that encode your organization's specific infrastructure patterns
- Internal tooling that connects your specific systems
- Golden path workflows for your service types
The risk in building too much: the platform team becomes overwhelmed maintaining internal tooling instead of improving developer experience. The risk in buying too much without customization: purchased tools that don't fit your workflow create their own friction.
Measuring Platform Engineering Success
Platform engineering success is measured by developer productivity and experience improvements, not by platform engineering team velocity.
Metrics that matter:
- Time from "I want to create a new service" to first deployment in production
- Onboarding time (days until a new engineer makes their first production deployment)
- Self-service rate (percentage of infrastructure tasks completed self-service vs. via ticket)
- Developer Net Promoter Score (does your development team actually like the platform?)
- Lead time for changes (DORA metric — do golden path teams have shorter lead times than off-path teams?)
Metrics to track, but interpret with care:
- Number of golden path adopters vs. total services
- Documentation coverage for catalog entries
- Platform reliability (what is the uptime of the CI/CD platform, the developer portal, self-service infrastructure?)
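Several of the metrics above reduce to simple arithmetic once the raw counts are collected. The following sketch computes self-service rate, golden path adoption, and a lead-time comparison between on-path and off-path services. The `ServiceStats` record and the sample figures are invented for illustration; real inputs would come from your ticketing system, service catalog, and deployment history.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class ServiceStats:
    name: str
    on_golden_path: bool
    lead_times_hours: list[float]  # commit-to-production time for each change


def self_service_rate(completed_self_service: int, completed_via_ticket: int) -> float:
    """Share of infrastructure tasks completed without a ticket."""
    total = completed_self_service + completed_via_ticket
    return completed_self_service / total if total else 0.0


def golden_path_adoption(services: list[ServiceStats]) -> float:
    """Share of services that are on a golden path."""
    if not services:
        return 0.0
    return sum(s.on_golden_path for s in services) / len(services)


def median_lead_time(services: list[ServiceStats], on_golden_path: bool) -> Optional[float]:
    """Median lead time across all changes for on-path or off-path services."""
    samples = [
        hours
        for s in services
        if s.on_golden_path == on_golden_path
        for hours in s.lead_times_hours
    ]
    return median(samples) if samples else None


if __name__ == "__main__":
    # Invented sample data.
    services = [
        ServiceStats("orders-api", True, [4.0, 6.0]),
        ServiceStats("billing-api", True, [8.0]),
        ServiceStats("legacy-batch", False, [20.0, 30.0]),
    ]
    print("self-service rate:", self_service_rate(30, 10))
    print("golden path adoption:", round(golden_path_adoption(services), 2))
    print("median lead time on path (h):", median_lead_time(services, True))
    print("median lead time off path (h):", median_lead_time(services, False))
```

The median is used rather than the mean so that one unusually slow change does not dominate the comparison; that is a choice made for this sketch, not a requirement of the DORA definition.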
Developer satisfaction surveys are particularly important for platform teams. If developers are not using the golden paths or are complaining about platform tooling, those are signals the platform team should act on, the same way product teams act on customer feedback.
This connects to broader thinking about engineering productivity measurement — see developer productivity metrics that matter for a framework that applies well to platform engineering teams.
Common Platform Engineering Failure Modes
The ivory tower platform. A platform team that builds what it thinks developers need without talking to developers. The result is a technically impressive platform that nobody uses because it doesn't match how teams actually work. Fix: treat developers as customers. Run user research. Measure adoption.
The bottleneck platform. A platform team that has re-created the old infrastructure team model — everything goes through them. Self-service is the goal; tickets are the failure mode.
The golden cage. Golden paths that are opinionated to the point of being unworkable for non-standard use cases. Teams that need to go off-path find no support and no escape hatch, leading to bespoke infrastructure that diverges from standards. Fix: make going off-path explicit but possible, with clear documentation about what support the platform provides for off-path configurations.
Under-investment. Platform engineering is often underfunded because its outputs (developer experience, self-service rate) are less visible than feature releases. Engineering leaders need to advocate for platform investment with business-impact framing: "every hour a product engineer spends on undifferentiated infrastructure work is an hour not spent on customer value."
Over-investment in tooling, under-investment in documentation. A beautiful internal platform that nobody knows how to use because the golden path documentation is incomplete or outdated is not a platform — it is a liability.
Platform engineering teams that want to track their roadmap and self-service request backlogs alongside engineering work benefit from PM tooling designed for engineering teams — see best PM tools for remote teams for options that fit platform team workflows.
Getting Started: A Practical Roadmap
For organizations that are starting their platform engineering journey:
Phase 1 (Months 1–3): Catalog and visibility
- Stand up Backstage or a lightweight alternative
- Catalog all existing services with owner, documentation link, and tech stack
- Identify the top 3–5 pain points from developer interviews
Phase 2 (Months 3–6): First golden path
- Build a service scaffold template for your most common service type
- Integrate with your CI/CD pipeline
- Measure adoption and iterate based on feedback
Phase 3 (Months 6–12): Self-service infrastructure
- Build self-service workflows for the most commonly requested infrastructure (databases, queues, new environments)
- Measure self-service rate and developer satisfaction
Phase 4 (Months 12+): Expand and mature
- Expand golden paths to cover more service types
- Add observability defaults, security scanning, cost management
- Establish SLOs for platform reliability
This phased approach prevents the platform team from over-building before they understand what developers actually need.
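For the Phase 4 item on platform reliability SLOs, it can help to translate an availability target into an error budget, meaning the amount of downtime the target permits over a window. The sketch below shows that arithmetic. The 30-day window, the 99.9% target, and the function names are illustrative assumptions rather than recommendations.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO permits over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent; negative means the SLO was missed."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After 10.8 minutes of CI/CD downtime, about three quarters of the budget is left.
    print(round(budget_remaining(0.999, 10.8), 2))
```

Tracking the remaining budget for the CI/CD platform, the developer portal, and self-service infrastructure gives the platform team a concrete signal for when to pause feature work in favor of reliability.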
Methodology
This guide draws on the Team Topologies framework (Matthew Skelton and Manuel Pais), the CNCF Platform Engineering Working Group's published research, Gartner's analysis of platform engineering adoption (2023–2025), the Backstage project documentation and community case studies (Spotify, Expedia, American Airlines), and engineering blog posts from platform engineering teams at major technology companies. Platform maturity model references are based on the CNCF Platform Engineering Maturity Model.