AOSP Build Times Are Killing Us and Everyone Acts Like Bazel Migration Is the Only Answer

In the high-stakes world of Android Open Source Project (AOSP) development, we face a relentless enemy: time. Specifically, the agonizingly slow build times that grind developer productivity to a halt. The narrative currently dominating engineering circles suggests that the only viable path forward is a complete migration to Bazel. We are told this is the silver bullet, the modern solution to legacy Make-based build systems. However, as we navigate the treacherous waters of migrating tens of thousands of build rules, overhauling CI/CD pipelines, and retraining seasoned developers, we must ask a critical question: Is this the only way?

The reality is far more nuanced. While Bazel offers undeniable benefits in reproducibility and scalability, the cost of migration is staggering. We are looking at potentially years of diverted engineering resources, delayed product roadmaps, and a steep learning curve that alienates experienced contributors. The promise of faster builds is seductive, but the path to achieving it via Bazel is paved with complexity. We need to explore whether there are alternatives—optimizations to the current system, hybrid approaches, or incremental modernizations—that can deliver performance gains without the massive overhead of a full system overhaul.

The AOSP Build Bottleneck: A Deep Dive into the Pain

We cannot solve a problem we do not fully understand. The pain of AOSP build times is not merely an inconvenience; it is a systemic bottleneck that affects every aspect of the development lifecycle.

The Scale of AOSP Complexity

AOSP is not just a large codebase; it is a massive ecosystem of interdependent modules, drivers, frameworks, and applications. Building the full stack involves millions of lines of code, hundreds of thousands of build targets, and complex dependency graphs. When we talk about a “clean build,” we are often discussing a process that touches every single one of these components. Historically, the AOSP build system (the legacy Make-based build and its successor, Soong) has struggled to cache and parallelize these tasks at a granular level. This results in redundant work, where even a minor change in a core library can trigger a cascade of rebuilds across the entire tree.

The Impact on Developer Velocity

Developer velocity is the heartbeat of any engineering organization. When a clean build takes 3 hours or more, we are effectively removing a significant portion of the developer’s day from productive work. The context-switching penalty is immense. A developer makes a change, kicks off a build, waits, and inevitably loses focus. By the time the build finishes, the mental model of the code has faded. Incremental builds offer some relief, but they are notoriously unreliable in large AOSP trees. We often see “partial rebuilds” that still take minutes—sometimes tens of minutes—because the dependency tracking is either too coarse or the build system is forced to re-evaluate vast swathes of the graph.

The Illusion of “Just Switch to Bazel”

The industry buzz around Bazel is loud. Google open-sourced it, and it powers their monolithic repositories. It promises hermetic builds, massive parallelization, and remote caching. The proposal to migrate AOSP to Bazel feels like a natural evolution. However, we must distinguish between the tool’s capabilities and the practical reality of migration. Moving a project of AOSP’s magnitude is not a weekend refactor. It involves rewriting build logic in Starlark, redefining dependency rules, and restructuring the repository layout. For many teams, this migration becomes a “big bang” event that stalls feature development for months or years. The question we must address is whether the marginal gains in build speed justify this massive investment.

Deconstructing the Bazel Migration Proposition

Before we commit to Bazel, we need a clear-eyed assessment of what a migration entails and whether it truly solves the root causes of our slow builds.

The Theoretical Benefits of Bazel

We acknowledge that Bazel excels in specific areas. Its hermetic nature ensures that builds are reproducible, eliminating the classic “it works on my machine” problem. Its distributed execution model allows for massive parallelization, which can significantly cut down build times when paired with a robust remote cache. For organizations with hundreds of engineers working on a single monorepo, Bazel is often a necessity. However, AOSP has already made strides in this direction with the introduction of Soong (the Go-based replacement for Make) and the Blueprint framework it is built on. While not identical to Bazel, Soong provides a structured, declarative build definition format (Android.bp) that attempts to modernize the AOSP build process.

The Hidden Costs of Migration

A timeline that slips from three months to six, and potentially longer, is a classic symptom of build system migration. The costs extend beyond time:

Codebase Restructuring: Bazel enforces a strict package structure. AOSP’s current layout, evolved over a decade, may not fit neatly into Bazel’s constraints without significant refactoring.
Toolchain Adaptation: The AOSP toolchain (Clang/LLVM, Java, Kotlin, etc.) is highly specialized. Integrating these toolchains into Bazel rules requires deep expertise.
The Learning Curve: Developers comfortable with mm or m commands must learn Bazel’s query and build syntax. This friction slows down the entire team during the transition.
CI/CD Retooling: Our continuous integration systems rely on specific build artifacts and caching mechanisms. Ripping this out and plugging in Bazel’s remote execution and caching backends is a massive infrastructure project.

Is Bazel a Panacea or a Distraction?

We must consider if we are solving the right problem. If our build times are slow due to excessive I/O, poor incremental dependency tracking, or bloated build scripts, Bazel helps—but it is not the only solution. Sometimes, the bottleneck is not the build system logic but the physical constraints of the build machines (disk I/O, CPU cores, memory). Throwing a migration at hardware limitations is a misapplication of resources. We need to determine if we are chasing the “shiny new tool” rather than optimizing what we have.

Alternatives to a Full Bazel Migration

The binary choice between “suffer with Make” and “migrate to Bazel” is a false dichotomy. There are several intermediate strategies we can employ to accelerate AOSP builds without disrupting the entire engineering organization.

Optimizing the Existing AOSP Build System (Soong/Make)

Before we abandon the current system, we should exhaust every optimization avenue.

Refining Ninja Files: AOSP builds are ultimately executed by Ninja, which is incredibly fast at running tasks. The slowness often stems from generating the Ninja files in the first place. We can trim our Android.bp (Blueprint) and Android.mk files, especially the conditional logic and complex variable expansions in Android.mk, which slow down the parse phase.
Parallelism Tuning: We can aggressively tune the -j (jobs) flags and link optimization parameters. While make -j is standard, we might be under-utilizing our build servers. Analyzing the CPU utilization during the link phase can reveal opportunities for better resource allocation.
Modularization: AOSP allows for modular builds. By strictly defining module boundaries and reducing circular dependencies, we can enable the build system to skip unnecessary targets more effectively. We can utilize soong_namespace to isolate parts of the tree, preventing the build graph from exploding unnecessarily.

Leveraging Distributed Compilation and Compiler Caching (distcc/ccache)

We do not need to rewrite our build rules to achieve massive parallelization. We can decouple the compilation from the local machine.

Distcc: The distributed C/C++ compiler (distcc) can distribute compilation tasks across a network of machines. By setting up a cluster of build servers, we can compile hundreds of source files simultaneously. This requires no changes to the AOSP build logic—only configuration changes to the toolchain wrapper.
Ccache: Caching compiled object files is low-hanging fruit. While AOSP has some built-in caching mechanisms, a robust ccache setup can significantly reduce recompilation times for incremental builds. By storing object files under a hash of the source code and compiler flags, we can avoid recompiling unchanged code.
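
The idea behind hash-keyed object caching can be sketched in a few lines of Python. This is a simplified illustration, not how ccache itself is implemented: the function name cache_key and the compiler string are hypothetical, and a real compiler cache also has to account for included headers and the exact compiler binary.

```python
import hashlib
from typing import Sequence


def cache_key(source: bytes, compiler: str, flags: Sequence[str]) -> str:
    """Derive a cache key from the compiler identity, its flags, and the source bytes.

    Any change to one of the three inputs yields a different key, so a stale
    object file is never returned for modified code or modified flags.
    """
    digest = hashlib.sha256()
    for part in (compiler.encode(), "\0".join(flags).encode(), source):
        # A length prefix keeps adjacent fields from being confused with each other.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


if __name__ == "__main__":
    src = b"int main() { return 0; }"
    same = cache_key(src, "clang", ["-O2"]) == cache_key(src, "clang", ["-O2"])
    changed = cache_key(src, "clang", ["-O2"]) != cache_key(src, "clang", ["-O0"])
    print("identical inputs share a key:", same)
    print("different flags get a new key:", changed)
```

Because the key depends only on the inputs, an unchanged file compiled with unchanged flags can be served from the cache regardless of its timestamp.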

Remote Caching and Execution (Without Bazel)

Bazel is famous for remote caching, but we can implement similar strategies for the existing AOSP build system. We can set up a shared cache (for example, sccache backed by remote storage, or a custom solution) that stores build artifacts. When a developer initiates a build, the system checks the remote cache before compiling locally. If a matching artifact exists, it is downloaded rather than rebuilt. With a warm cache, this can turn much of a 3-hour clean build into a download of pre-compiled artifacts. This approach delivers some of Bazel’s best features without the rewrite.
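
A minimal sketch of that lookup order (local cache, then remote cache, then compile and publish) is shown below. The TieredCache class is hypothetical and both stores are modeled as in-memory dictionaries; a real deployment would put a network service behind the remote side and would need eviction and authentication, which are omitted here.

```python
from typing import Callable, Dict


class TieredCache:
    """Local artifact cache backed by a shared remote store (both plain dicts here)."""

    def __init__(self, remote: Dict[str, bytes]) -> None:
        self.local: Dict[str, bytes] = {}
        self.remote = remote
        self.stats = {"local_hit": 0, "remote_hit": 0, "miss": 0}

    def get_or_build(self, key: str, build: Callable[[], bytes]) -> bytes:
        # 1. Cheapest path: the artifact is already on this machine.
        if key in self.local:
            self.stats["local_hit"] += 1
            return self.local[key]
        # 2. Next: someone else already built it and published it.
        artifact = self.remote.get(key)
        if artifact is not None:
            self.stats["remote_hit"] += 1
        else:
            # 3. Last resort: build locally, then publish for everyone else.
            self.stats["miss"] += 1
            artifact = build()
            self.remote[key] = artifact
        self.local[key] = artifact
        return artifact


if __name__ == "__main__":
    shared: Dict[str, bytes] = {}
    dev_a, dev_b = TieredCache(shared), TieredCache(shared)
    dev_a.get_or_build("libfoo.o", lambda: b"object-code")  # compiled by A
    dev_b.get_or_build("libfoo.o", lambda: b"object-code")  # fetched by B
    print(dev_a.stats, dev_b.stats)
```

The second developer never runs the build function, which is the whole benefit: the cost of a compile is paid once per unique input across the team.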

Incremental Build Reliability

The biggest frustration with the current AOSP build is often that incremental builds are not truly incremental. This is usually due to timestamp issues or over-broad dependency definitions.

Touching Source Files: We can make our scripts and code generators “touch” source files only when their contents actually change, so that a timestamp update triggers only the minimal rebuild.
Dependency Pruning: We can use custom analysis scripts to identify and remove unnecessary dependencies from Android.mk and Android.bp files. Removing a single unnecessary #include in a widely used header file can save thousands of recompilations downstream.
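
To make the fan-out effect of a single include concrete, here is a small sketch of the kind of custom analysis script mentioned above. It assumes the include graph has already been extracted into a dictionary (the file names are made up) and simply walks the graph in reverse to find every file that would be recompiled when a given header changes.

```python
from collections import defaultdict, deque
from typing import Dict, Set


def transitive_dependents(includes: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return every file that directly or indirectly includes `target`."""
    # Invert the graph: for each file, who includes it?
    included_by = defaultdict(set)
    for src, deps in includes.items():
        for dep in deps:
            included_by[dep].add(src)

    # Breadth-first walk up the inverted graph.
    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in included_by[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    graph = {
        "a.cpp": {"util.h"},
        "b.cpp": {"util.h"},
        "c.cpp": {"light.h"},
        "util.h": {"heavy.h"},  # the include we suspect is unnecessary
    }
    print(sorted(transitive_dependents(graph, "heavy.h")))
    graph["util.h"] = set()     # drop the include
    print(sorted(transitive_dependents(graph, "heavy.h")))
```

In this toy graph, editing heavy.h forces util.h, a.cpp, and b.cpp to rebuild; once util.h stops including it, nothing depends on it any more. On a real tree the same walk shows which includes are worth removing first.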

Strategic Implementation: A Phased Approach

If we must move toward Bazel, or if we choose to optimize the current system, we need a strategy that minimizes disruption. A “big bang” migration is rarely successful for projects the size of AOSP.

The Polyglot Build Strategy

Rather than migrating the entire tree at once, we can adopt a polyglot build strategy. We can introduce Bazel for new components or isolated libraries within the AOSP tree. This allows teams to adopt Bazel organically for their specific modules. We can use build rules that allow Bazel targets to depend on Make targets and vice versa. This “bridge” approach allows us to test Bazel’s benefits on a smaller scale, gather metrics, and build expertise without halting the main development line.

The “Lunch” Configuration Overhaul

We can significantly improve build times by optimizing the lunch configurations (the target product and variant). Often, developers build more than they need (e.g., building all apps when they only need the framework). We can create stripped-down lunch targets that exclude non-essential modules (like samples, tests, or optional apps) for daily development. This reduces the build graph size and I/O overhead immediately.

Toolchain Upgrades

Sometimes, the bottleneck is the compiler itself. Ensuring we are using the latest version of Clang and linking with lld (the LLVM linker) rather than gold or bfd can shave off significant time in the linking phase. Furthermore, enabling Link Time Optimization (LTO) where appropriate can improve runtime performance, but we must be careful not to slow down the build itself. We should profile the toolchain to find the sweet spot between optimization flags and compilation speed.

The Human Factor: Retraining and Culture

We cannot ignore the impact of build system changes on our engineering teams. A build system is a tool, and tools are only as good as the people wielding them.

Training for the New Paradigm

If we choose to migrate to Bazel, we must invest heavily in training. This goes beyond documentation. We need workshops, sandbox environments, and internal champions who can help debug Bazel build issues. The mental model of Bazel (declarative, strict dependencies) is different from the procedural nature of Make. We must prepare the team for this shift.

Maintaining Morale During Slow Builds

While we work on a long-term solution, we must address the immediate morale issue. Slow builds are demoralizing. We can provide immediate relief by investing in faster hardware (NVMe SSDs, high-core count servers) and better CI infrastructure. Sometimes, a hardware upgrade is cheaper and faster than a software migration. We should also encourage a culture where developers run lightweight build targets (like m framework) rather than full builds for daily work.

Documentation and Standardization

Regardless of the path we take, standardization is key. If we optimize the existing Makefiles, we must document best practices to prevent performance regressions. If we migrate to Bazel, we need clear style guides for BUILD files. A fragmented build system (where every team writes rules differently) is a recipe for disaster. We must enforce strict code review guidelines for build files, just as we do for source code.

Measuring Success: Metrics That Matter

To validate any strategy we choose, we must define clear metrics. We cannot rely on “feels faster.”

Build Duration Metrics

Clean Build Time: The time taken to build a full AOSP image from scratch. This is our baseline.
Incremental Build Time (P50, P90, P95): We need to measure the build time for a small change (e.g., changing one line in a library). We should look at the median (P50) and the tail (P90, P95) to identify outliers.
Parse Time: The time spent generating the build graph before any compilation starts. If this is high, no amount of compilation parallelization will help.
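
As an illustration of why the tail matters, the following sketch computes nearest-rank percentiles over a list of incremental build durations. The sample numbers are invented for the example, and nearest-rank is only one of several percentile definitions; whichever definition we pick should be used consistently across reports.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical incremental build times in seconds; one build hit a slow path.
    durations = [42, 38, 45, 51, 40, 39, 300, 44, 41, 47]
    for p in (50, 90, 95):
        print(f"P{p}: {percentile(durations, p)}s")
```

With these made-up samples the median is 42 seconds while P95 is 300 seconds, so an average or a median alone would hide the one build in ten that wrecks a developer's flow.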

Developer Productivity Metrics

Time to Test: How long does it take from writing code to running tests on a device/emulator?
Build Success Rate: Are builds failing more often due to system complexity?
Local vs. CI Build Ratio: Are developers relying too much on CI because local builds are too slow? We want developers to verify changes locally quickly.

Resource Utilization

CPU/Disk I/O Saturation: During a build, are we fully utilizing our hardware? If CPU is low but I/O is maxed out, we need faster storage.
Cache Hit Rates: If we implement remote caching, what percentage of builds are fully or partially cached? A high cache hit rate (e.g., >80%) indicates a successful acceleration strategy.
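
Both checks are simple enough to automate. The sketch below is one possible starting point: the function names are hypothetical, and the utilization thresholds (90% counts as saturated, 50% as underused) are assumptions chosen for illustration that should be tuned against real measurements.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def likely_bottleneck(cpu_util: float, io_util: float,
                      saturated: float = 0.9, underused: float = 0.5) -> str:
    """Classify a build from average CPU and disk utilization (both in 0.0..1.0).

    The thresholds are illustrative assumptions, not measured values.
    """
    if io_util >= saturated and cpu_util <= underused:
        return "storage"  # CPUs are waiting on disk: faster storage should help
    if cpu_util >= saturated:
        return "cpu"      # cores are busy: more cores or fewer actions should help
    return "unclear"      # neither resource is pinned: look at the build graph


if __name__ == "__main__":
    rate = cache_hit_rate(hits=850, misses=150)
    print(f"hit rate: {rate:.0%}, meets 80% target: {rate > 0.8}")
    print("bottleneck:", likely_bottleneck(cpu_util=0.30, io_util=0.95))
```

An "unclear" result is itself informative: if neither CPU nor disk is saturated, the build is probably serialized by the dependency graph or the parse phase rather than limited by hardware.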

Conclusion: Navigating the Path Forward

The statement “AOSP build times are killing us” resonates deeply with our experience. The pressure to migrate to Bazel is immense, driven by industry trends and the allure of Google-scale engineering practices. However, we must resist the urge to view Bazel as a panacea. It is a powerful tool, but the cost of migration for a mature AOSP codebase is non-trivial, risky, and time-consuming.

We have a responsibility to evaluate the alternatives. Optimizing the existing Soong/Make build system, leveraging distributed compilation and caching tools like distcc and ccache, and improving remote caching can yield substantial improvements without a year-long migration project. We should consider a hybrid approach, introducing Bazel for specific modules while maintaining the stability of the existing system for the core.

Ultimately, the goal is to increase developer velocity, not just to adopt new technology. By focusing on metrics, investing in hardware, and optimizing our current tooling, we can likely achieve our performance goals faster and with less disruption. We should proceed with caution, data-driven decisions, and a clear focus on the end-user: the developer. The answer lies not in blindly following the crowd to Bazel, but in engineering the best solution for our specific constraints and resources. We must be the architects of our productivity, not the victims of our build times.