<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[P-Y's blog]]></title><description><![CDATA[P-Y's blog]]></description><link>https://blog.p-y.wtf</link><generator>RSS for Node</generator><lastBuildDate>Sat, 18 Apr 2026 09:43:29 GMT</lastBuildDate><atom:link href="https://blog.p-y.wtf/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Parallelism with Android SQLite]]></title><description><![CDATA[The SQLDelight documentation provides this example:
val players: Flow<List<HockeyPlayer>> = 
  playerQueries.selectAll()
    .asFlow()
    .mapToList(Dispatchers.IO)

This looks reasonable, right? In the Square Point Of Sale application, we recently ...]]></description><link>https://blog.p-y.wtf/parallelism-with-android-sqlite</link><guid isPermaLink="true">https://blog.p-y.wtf/parallelism-with-android-sqlite</guid><category><![CDATA[Android]]></category><category><![CDATA[performance]]></category><category><![CDATA[parallelism]]></category><category><![CDATA[SQLite]]></category><category><![CDATA[connection pool]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Wed, 05 Feb 2025 05:35:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1738733872150/b5ac6a91-e872-43f9-adf0-3de98583573d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <a target="_blank" href="https://sqldelight.github.io/sqldelight/2.0.2/android_sqlite/coroutines/">SQLDelight documentation</a> provides this example:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> players: Flow&lt;List&lt;HockeyPlayer&gt;&gt; = 
  playerQueries.selectAll()
    .asFlow()
    .mapToList(Dispatchers.IO)
</code></pre>
<p>This looks reasonable, right? In the Square Point Of Sale application, we recently discovered that using <code>Dispatchers.IO</code> to run SQLite queries could slow down other application tasks. Let’s dig into SQLite Android internals!</p>
<h1 id="heading-single-connection">Single connection</h1>
<p>On Android, <strong>by default</strong> SQLite can open <strong>at most one connection per database</strong> (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/sqlite/SQLiteConnectionPool.java;l=713;drc=05d9bfb652e9ec78502ba70515c49cf2eae3f988">SQLiteConnectionPool.java</a>).</p>
<p>This means that <strong>at most one thread at a time</strong> can run an SQL query.</p>
<p>If you run two SQLite queries on the same database in parallel using <code>Dispatchers.IO</code>, one <strong>thread will block</strong> while the other thread is running its query, and your application will consume two threads from the <code>Dispatchers.IO</code> pool. The first thread will be held for the duration of the first query. The second thread will be held for the <strong>duration of the first + the second query</strong>, as it will first wait for the first thread to release the SQLite connection before it can use it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738722147770/88663283-abd4-4ba6-8839-ef5b10576e51.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-dispatchersio-parallelism"><code>Dispatchers.IO</code> parallelism</h1>
<p>What’s the big deal? After all, if there’s a single connection at most, then it makes sense that all queries have to run one after the other.</p>
<p><code>Dispatchers.IO</code> dispatches coroutines on at most 64 threads in parallel (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:external/kotlinx.coroutines/kotlinx-coroutines-core/jvm/src/scheduling/Dispatcher.kt;l=67;drc=5a14ac176a9d56b989d046e46f3577db1edf8697">Dispatcher.kt</a>). As you increase the number of queries launched in parallel, the <code>Dispatchers.IO</code> threads will be held for longer &amp; longer, and spend most of their time blocking and waiting for the lock. Eventually, 63 of the 64 threads from the <code>Dispatchers.IO</code> pool will be blocked waiting for the connection lock, and <strong>new</strong> <code>Dispatchers.IO</code> <strong>tasks will wait in the dispatcher queue</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738722514497/ad97fd34-2fb0-4f36-9e11-2261c787ef47.png" alt class="image--center mx-auto" /></p>
<p>So, while it’s expected that all queries on the same database have to run one after another, if we spam <code>Dispatchers.IO</code> with queries we end up <strong>blocking other unrelated IO tasks that aren’t bound to the same constraint</strong>.</p>
<p>This is actually a common issue with <code>Dispatchers.IO</code>, and its <a target="_blank" href="https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/-dispatchers/-i-o.html">kdoc</a> outlines how that can be worked around:</p>
<h3 id="heading-elasticity-for-limited-parallelism">Elasticity for limited parallelism</h3>
<blockquote>
<p><code>Dispatchers.IO</code> has a unique property of elasticity: its views obtained with <code>CoroutineDispatcher.limitedParallelism</code> are not restricted by the <code>Dispatchers.IO</code> parallelism. Conceptually, there is a dispatcher backed by an unlimited pool of threads, and both <code>Dispatchers.IO</code> and views of <code>Dispatchers.IO</code> are actually views of that dispatcher. In practice this means that, despite not abiding by <code>Dispatchers.IO</code>'s parallelism restrictions, its views share threads and resources with it. In the following example:</p>
</blockquote>
<pre><code class="lang-kotlin"><span class="hljs-comment">// 100 threads for MySQL connection</span>
<span class="hljs-keyword">val</span> myMysqlDbDispatcher = Dispatchers.IO.limitedParallelism(<span class="hljs-number">100</span>)
<span class="hljs-comment">// 60 threads for MongoDB connection</span>
<span class="hljs-keyword">val</span> myMongoDbDispatcher = Dispatchers.IO.limitedParallelism(<span class="hljs-number">60</span>)
</code></pre>
<blockquote>
<p>the system may have up to <code>64 + 100 + 60</code> threads dedicated to blocking tasks during peak loads, but during its steady state there is only a small number of threads shared among <code>Dispatchers.IO</code>, <code>myMysqlDbDispatcher</code> and <code>myMongoDbDispatcher</code>.</p>
</blockquote>
<p>So here, given that we know we can run at most one query at a time, we could use a distinct <code>Dispatchers.IO.limitedParallelism(1)</code> dispatcher instance for each database we query. If we had N databases, we’d end up using <strong>at most</strong> N + 64 threads for blocking operations and we’d never starve the <code>Dispatchers.IO</code> thread pool.</p>
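<p>A minimal sketch of that idea (the database names here are made up for illustration):</p>
<pre><code class="lang-kotlin">// One serial view of Dispatchers.IO per database: queries for a given
// database queue up behind each other without starving other IO work.
val playerDbDispatcher = Dispatchers.IO.limitedParallelism(1)
val statsDbDispatcher = Dispatchers.IO.limitedParallelism(1)
</code></pre>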
<h1 id="heading-thread-hopping-amp-cpu-caches">Thread hopping &amp; CPU caches</h1>
<p>While <code>Dispatchers.IO.limitedParallelism(1)</code> guarantees that a single thread at a time will be running our SQLite database queries, it makes no guarantees on <em>which</em> thread. This means that each query could execute on a different thread. The OS scheduler tries to keep a thread running on the same CPU core when possible, to avoid constantly reloading CPU caches. So with <code>Dispatchers.IO.limitedParallelism(1)</code> we have high chances of blowing up CPU caches with every new query. We can instead use a <strong>single thread dispatcher per database</strong>:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> playerDbDispatcher = newSingleThreadContext(<span class="hljs-string">"player-db"</span>)
</code></pre>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> players: Flow&lt;List&lt;HockeyPlayer&gt;&gt; = 
  playerQueries.selectAll()
    .asFlow()
    .mapToList(playerDbDispatcher)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738722920325/9fdb603a-f910-4c86-9420-81b4a8f06d17.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-wal-moar-sqlite-connections">WAL: moar SQLite connections</h1>
<p>SQLite has a <a target="_blank" href="https://sqlite.org/wal.html">Write-Ahead Log</a> (<strong>WAL</strong>) option, which can lead to a significant improvement in performance and was <a target="_blank" href="https://source.android.com/docs/core/perf/compatibility-wal">enabled by default in Android 9</a>. WAL also supports more concurrency; however, that is <strong>not turned on by default</strong> on Android, to avoid breaking apps that rely on the previous behavior: a single connection means all operations run serially, and some apps depend on that ordering. To support additional concurrency, API level 11 added <a target="_blank" href="https://developer.android.com/reference/android/database/sqlite/SQLiteDatabase#enableWriteAheadLogging\(\)">SQLiteDatabase#enableWriteAheadLogging</a>:</p>
<blockquote>
<p>This method enables <strong>parallel execution of queries</strong> from multiple threads on the same database. It does this by opening multiple connections to the database and using a different database connection for each query. […] It is a good idea to enable write-ahead logging whenever a database will be concurrently accessed and modified by multiple threads at the same time. However, write-ahead logging uses significantly more memory than ordinary journaling because there are multiple connections to the same database. So if a database will only be used by a single thread, or if optimizing concurrency is not very important, then write-ahead logging should be disabled.</p>
</blockquote>
<p>After calling <code>SQLiteDatabase.enableWriteAheadLogging()</code> the database will support one <strong>primary connection</strong> and several <strong>secondary connections</strong>.</p>
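<p>For example (a sketch, assuming <code>db</code> is an open <code>SQLiteDatabase</code>):</p>
<pre><code class="lang-kotlin">// Opt into connection-pool concurrency on top of WAL.
// Returns true if write-ahead logging is enabled.
val walEnabled: Boolean = db.enableWriteAheadLogging()
</code></pre>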
<h2 id="heading-the-primary-connection">The primary connection</h2>
<p>The <strong>primary</strong> connection is used for queries that <strong>write</strong> to the database, as well as for <strong>any query executed as part of a transaction</strong> (i.e. after <code>SQLiteDatabase.beginTransaction()</code> , source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/sqlite/SQLiteDatabase.java;l=828-833;drc=05d9bfb652e9ec78502ba70515c49cf2eae3f988">SQLiteDatabase.java</a>).</p>
<p>When you execute a query, the Android Framework automatically figures out if the query is a <code>SELECT</code> or a write query (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/sqlite/SQLiteProgram.java;l=60;drc=05d9bfb652e9ec78502ba70515c49cf2eae3f988">SQLiteProgram.java</a>). Apparently that custom query parsing had bugs in older versions of Android, so in the Room library all write queries are automatically wrapped in a transaction to ensure they use the primary connection. <strong>Consider wrapping all your write queries in a transaction</strong>.</p>
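<p>With the framework API, that wrapping follows the usual transaction idiom (a sketch; <code>db</code>, the table and the column names are made up):</p>
<pre><code class="lang-kotlin">db.beginTransaction()
try {
  db.execSQL("UPDATE hockeyPlayer SET player_number = 35 WHERE full_name = 'Paul Martin'")
  // Mark the transaction as successful so endTransaction() commits instead of rolling back.
  db.setTransactionSuccessful()
} finally {
  db.endTransaction()
}
</code></pre>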
<p>Should you ever be using <code>SQLiteDatabase.beginTransaction()</code> for <strong>read only</strong> queries?</p>
<ul>
<li><p>Ideally, no, as starting a transaction means you’re now using the <strong>primary connection</strong>, so you can’t run concurrent queries.</p>
</li>
<li><p>However, the Android framework loads query results into a CursorWindow which has a max size defined by the private Android resource <code>R.integer.config_cursorWindowSize</code> (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/res/res/values/config.xml;l=2562;drc=26ab55038d82a6b9ff4d9080a3c8ab94eaa92e1c">config.xml</a>) and configurable by apps since API 28 (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/CursorWindow.java;l=163;drc=6266b5b1795fbf4a986dd01485f77120fee932a1">CursorWindow.java</a>).</p>
<ul>
<li><p>If you run a read query that loads a cursor window and starts reading from it, and meanwhile a write query runs, and then the cursor refreshes its window, the write might end up changing the number of rows. This could lead to inconsistent data or a crash: <code>java.lang.IllegalStateException: Couldn't read row 4247, col 0 from CursorWindow. Make sure the Cursor is initialized correctly before accessing data from it.</code></p>
</li>
<li><p><a target="_blank" href="https://developer.android.com/reference/android/database/sqlite/SQLiteDatabase#beginTransactionReadOnly\(\)">SQLiteDatabase#beginTransactionReadOnly</a> was added in API 35, presumably to support that use case and use secondary connections for read only transactions.</p>
</li>
<li><p>Before API 35, you can either use <code>SQLiteDatabase.beginTransaction()</code> and lose concurrency, or you can start paging your queries, loading small subsets of results that fit within the window. Paging correctly is really hard because rows can have a dynamic size (e.g. if including a blob or a string) so you can’t really predict the max number of rows to retrieve.</p>
</li>
<li><p>Read this to learn more: <a target="_blank" href="https://medium.com/androiddevelopers/large-database-queries-on-android-cb043ae626e8">Large Database Queries on Android</a></p>
</li>
</ul>
</li>
</ul>
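<p>The points above can be sketched as a small helper that picks the right transaction API at runtime (the framework methods are real; the helper itself is hypothetical):</p>
<pre><code class="lang-kotlin">fun SQLiteDatabase.beginReadTransaction() {
  if (Build.VERSION.SDK_INT &gt;= 35) {
    // API 35+: read-only transactions can use secondary connections.
    beginTransactionReadOnly()
  } else {
    // Older APIs: fall back to a regular transaction and lose read concurrency.
    beginTransaction()
  }
}
</code></pre>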
<h2 id="heading-secondary-connections">Secondary connections</h2>
<p>The <strong>secondary</strong> connections are used for <code>SELECT</code> queries (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/sqlite/SQLiteProgram.java;l=60;drc=05d9bfb652e9ec78502ba70515c49cf2eae3f988">SQLiteProgram.java</a>).</p>
<ul>
<li><p>The <a target="_blank" href="https://developer.android.com/reference/android/database/sqlite/SQLiteDatabase#enableWriteAheadLogging\(\)">documentation</a> claims that “the <strong>maximum number of connections</strong> used to execute queries in parallel is dependent upon the device memory and possibly other properties.”</p>
</li>
<li><p>In practice, the max number of connections is defined by the private Android resource <code>R.integer.db_connection_pool_size</code> (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/sqlite/SQLiteGlobal.java;l=143-148;drc=05d9bfb652e9ec78502ba70515c49cf2eae3f988">SQLiteGlobal.java</a>).</p>
</li>
<li><p>The default AOSP value for <code>R.integer.db_connection_pool_size</code> is <strong>4</strong> (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/res/res/values/config.xml;l=2485;drc=26ab55038d82a6b9ff4d9080a3c8ab94eaa92e1c">config.xml</a>)</p>
</li>
<li><p>For read-only transactions, <code>SQLiteConnectionPool.waitForConnection()</code> will try to acquire one of the secondary connections first, and otherwise fall back to the primary connection if it’s available (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/sqlite/SQLiteConnectionPool.java;l=713;drc=05d9bfb652e9ec78502ba70515c49cf2eae3f988">SQLiteConnectionPool.java</a>)</p>
<ul>
<li><p>The max number of connections includes the primary connection, so we effectively get <strong>1</strong> primary connection and <strong>3</strong> secondary connections (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/sqlite/SQLiteConnectionPool.java;l=1018-1024;drc=05d9bfb652e9ec78502ba70515c49cf2eae3f988">SQLiteConnectionPool.java</a>).</p>
</li>
<li><p>That means we can either have 4 <code>SELECT</code> queries running in parallel, or 1 write query and 3 <code>SELECT</code> queries running in parallel.</p>
</li>
</ul>
</li>
</ul>
<p>You can read <code>R.integer.db_connection_pool_size</code> at runtime with the following helper method:</p>
<pre><code class="lang-kotlin"><span class="hljs-comment">// Based on SQLiteGlobal.getWALConnectionPoolSize()</span>
<span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">getWALConnectionPoolSize</span><span class="hljs-params">()</span></span>: <span class="hljs-built_in">Int</span> {
  <span class="hljs-keyword">val</span> resources = Resources.getSystem()
  <span class="hljs-keyword">val</span> resId =
    resources.getIdentifier(<span class="hljs-string">"db_connection_pool_size"</span>, <span class="hljs-string">"integer"</span>, <span class="hljs-string">"android"</span>)
  <span class="hljs-keyword">return</span> <span class="hljs-keyword">if</span> (resId != <span class="hljs-number">0</span>) {
    max(<span class="hljs-number">2</span>, resources.getInteger(resId))
  } <span class="hljs-keyword">else</span> {
    <span class="hljs-number">2</span>
  }
}
</code></pre>
<h2 id="heading-non-exclusive-transactions">Non exclusive transactions</h2>
<p>The <a target="_blank" href="https://developer.android.com/reference/android/database/sqlite/SQLiteDatabase#enableWriteAheadLogging\(\)">SQLiteDatabase#enableWriteAheadLogging</a> documentation makes a surprising recommendation:</p>
<blockquote>
<p>Writers should use <code>beginTransactionNonExclusive()</code> to start a transaction. Non-exclusive mode allows database file to be in readable by other threads executing queries.</p>
</blockquote>
<p>In practice, this means <code>beginTransaction()</code> will execute <code>BEGIN EXCLUSIVE;</code> and <code>beginTransactionNonExclusive()</code> will execute <code>BEGIN IMMEDIATE;</code> (source: <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/database/sqlite/SQLiteSession.java;l=341-348;drc=05d9bfb652e9ec78502ba70515c49cf2eae3f988">SQLiteSession.java</a>).</p>
<p>However, the sqlite <a target="_blank" href="https://www.sqlite.org/lang_transaction.html#deferred_immediate_and_exclusive_transactions">documentation on transactions</a> calls out:</p>
<blockquote>
<p>EXCLUSIVE and IMMEDIATE are the same in WAL mode, but in other journaling modes, EXCLUSIVE prevents other database connections from reading the database while the transaction is underway.</p>
</blockquote>
<p>I’m not sure why the documentation recommends <code>beginTransactionNonExclusive()</code> when WAL is enabled, as it’s effectively the same as <code>beginTransaction()</code>; that seems like a mistake.</p>
<h2 id="heading-wal-amp-dispatchers">WAL &amp; Dispatchers</h2>
<p>We can now leverage concurrent reads &amp; writes with WAL by using a serial dispatcher for writes and a concurrent dispatcher for reads with the appropriate elasticity, to increase concurrency without blocking more threads than necessary:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> dbWriteDispatcher = Dispatchers.IO.limitedParallelism(<span class="hljs-number">1</span>)

<span class="hljs-keyword">val</span> connectionPoolSize = getWALConnectionPoolSize()
<span class="hljs-keyword">val</span> dbReadDispatcher = Dispatchers.IO.limitedParallelism(connectionPoolSize)
</code></pre>
<p>Or, if you'd rather use dedicated thread pools:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> dbWriteDispatcher = newSingleThreadContext(<span class="hljs-string">"<span class="hljs-variable">$dbName</span>-writes"</span>)

<span class="hljs-keyword">val</span> connectionPoolSize = getWALConnectionPoolSize()
<span class="hljs-keyword">val</span> dbReadDispatcher = newFixedThreadPoolContext(connectionPoolSize, <span class="hljs-string">"<span class="hljs-variable">$dbName</span>-reads"</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738724932706/486d56a0-3536-475c-b199-bd193ee015aa.png" alt class="image--center mx-auto" /></p>
<p>With both of the previous approaches, we end up with <strong>5 threads dedicated to doing work for a pool of 4 connections</strong>, so there will be times when 1 thread is blocked. Room took a different, more efficient approach: use a 4 thread pool for both reads and writes, but control the write tasks to ensure only one runs at a time, in serial order (source: <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:room/room-runtime/src/androidMain/kotlin/androidx/room/TransactionExecutor.android.kt;drc=067ed409b9bc2dcc59b93c8e9ab8d092beeff3a5">TransactionExecutor.kt</a>).</p>
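<p>That serializing trick can be sketched as a plain <code>Executor</code> decorator (a simplified illustration of the idea, not Room’s actual code):</p>
<pre><code class="lang-kotlin">import java.util.ArrayDeque
import java.util.concurrent.Executor

// At most one task submitted to this executor runs at a time, but no thread
// of the delegate pool is parked waiting while a task is queued here.
class SerialExecutor(private val delegate: Executor) : Executor {
  private val queue = ArrayDeque&lt;Runnable&gt;()
  private var active: Runnable? = null

  @Synchronized
  override fun execute(task: Runnable) {
    queue.offer(Runnable {
      try {
        task.run()
      } finally {
        // Once this task finishes, hand the next queued task to the pool.
        scheduleNext()
      }
    })
    if (active == null) scheduleNext()
  }

  @Synchronized
  private fun scheduleNext() {
    active = queue.poll()
    active?.let(delegate::execute)
  }
}
</code></pre>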
<h1 id="heading-the-elephant-in-the-room">The elephant in the Room</h1>
<p>How does all this apply to <a target="_blank" href="https://developer.android.com/jetpack/androidx/releases/room">Room</a>?</p>
<h2 id="heading-ballroom-wal-room-dance"><s>Ballroom</s> WAL Room dance</h2>
<p>When using Room, WAL with support for concurrent connections is enabled by default on API 16+, unless the device is a low-RAM device (source: <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:room/room-runtime/src/androidMain/kotlin/androidx/room/RoomDatabase.android.kt;l=800;drc=8bd971c6e30756ad97e8d3e4634f5a87167a256b">RoomDatabase.kt</a>).</p>
<p>Room can be configured to use custom executors (<code>RoomDatabase.Builder.setQueryExecutor()</code> &amp; <code>RoomDatabase.Builder.setTransactionExecutor()</code>, or <code>RoomDatabase.Builder.setQueryCoroutineContext()</code>).</p>
<p>The <code>RoomDatabase.Builder.setQueryExecutor</code> <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:room/room-runtime/src/androidMain/kotlin/androidx/room/RoomDatabase.android.kt;l=1138-1194;drc=8bd971c6e30756ad97e8d3e4634f5a87167a256b">kdoc</a> mentions that the query executor should have a bounded number of threads — without specifying how many — and that the <strong>transaction executor</strong> will execute <strong>at most one query at a time</strong>. The <code>@Transaction</code> <a target="_blank" href="https://developer.android.com/reference/androidx/room/Transaction">kdoc</a> mentions writes: “Insert, Update or Delete methods are always run inside a transaction”.</p>
<p>The doc also mentions that <em>"when both the transaction executor and query executor are unset, then a</em> <strong><em>default</em></strong> <code>Executor</code> <em>will be used that allocates and shares threads amongst Architecture Components libraries"</em>.</p>
<p>That default executor is actually <code>ArchTaskExecutor.getIOThreadExecutor()</code> (source: <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:room/room-runtime/src/androidMain/kotlin/androidx/room/RoomDatabase.android.kt;l=1604-1605;drc=8bd971c6e30756ad97e8d3e4634f5a87167a256b">RoomDatabase.android.kt</a>):</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">abstract</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">RoomDatabase</span> </span>{
  <span class="hljs-keyword">open</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Builder</span>&lt;<span class="hljs-type">T : RoomDatabase</span>&gt; </span>{
    <span class="hljs-keyword">open</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">build</span><span class="hljs-params">()</span></span>: T {
      <span class="hljs-keyword">if</span> (queryExecutor == <span class="hljs-literal">null</span> &amp;&amp; transactionExecutor == <span class="hljs-literal">null</span>) {
        transactionExecutor = ArchTaskExecutor.getIOThreadExecutor()
        queryExecutor = transactionExecutor
      }
      <span class="hljs-comment">// ...</span>
</code></pre>
<p><code>ArchTaskExecutor.getIOThreadExecutor()</code> delegates to <code>DefaultTaskExecutor.mDiskIO</code> which is a <strong>fixed thread pool of size 4</strong> (source: <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:arch/core/core-runtime/src/main/java/androidx/arch/core/executor/DefaultTaskExecutor.java;l=42;drc=5a10386f8cb40c82abb5bbee4556caaf64699571">DefaultTaskExecutor.java</a>). That thread pool is also used as the default fetch scheduler for Rx paging (source: <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:paging/paging-rxjava3/src/main/java/androidx/paging/rxjava3/RxPagedListBuilder.kt;l=299;drc=0ee887e405f72333e89394886bdf2d49de22f70f">RxPagedListBuilder.kt</a>) and the default executor for <code>ComputableLiveData</code> (source: <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:lifecycle/lifecycle-livedata/src/main/java/androidx/lifecycle/ComputableLiveData.kt;l=44;drc=18b3e6dc4422300577c60324286c1bbdcecab607">ComputableLiveData.kt</a>).</p>
<p>This design isn’t ideal, as that thread pool ends up being shared across unrelated work. Even if it were only used by Room, it’s still not ideal as all databases end up sharing the same global thread pool and starving it. If you have more than one database managed by Room you should <strong>provide your own executors</strong>.</p>
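<p>For instance (the builder methods are real Room APIs; the database class, file name and pool sizes are illustrative assumptions):</p>
<pre><code class="lang-kotlin">// Dedicated executors for this database, sized to match the default WAL pool.
val queryExecutor = Executors.newFixedThreadPool(4)
val transactionExecutor = Executors.newSingleThreadExecutor()

val db = Room.databaseBuilder(context, PlayerDatabase::class.java, "player.db")
  .setQueryExecutor(queryExecutor)
  .setTransactionExecutor(transactionExecutor)
  .build()
</code></pre>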
<h2 id="heading-bundled-sqlite">Bundled SQLite</h2>
<p>It’s worth noting that Room now offers a <strong>bundled version of SQLite</strong> instead of the default Android one. This increases APK size (~1 MB) but means you get the latest SQLite, you get the exact same behavior on all Android versions, and you don’t end up using any of the Android Framework SQLite code. To learn more, I recommend watching <a target="_blank" href="https://www.droidcon.com/2024/10/17/bundle-up-and-save-powering-room-with-bundled-sqlite/">Powering Room with Bundled SQLite</a>.</p>
<p>Room with bundled SQLite also offers proper support for coroutines, i.e. no more thread blocking while we’re waiting for an SQLite connection thanks to a suspending connection pool (source: <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:room/room-runtime/src/commonMain/kotlin/androidx/room/coroutines/ConnectionPoolImpl.kt;l=93;drc=62aa237a71dbedd8c696f68d9e8961e185f2e159">ConnectionPoolImpl.kt</a>). This means threads only block when they’re actually running a query, and makes it less likely that you’ll end up starving a thread pool due to blocked threads waiting for connections.</p>
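<p>As I understand the recent Room APIs, opting into the bundled driver looks roughly like this (a sketch; <code>PlayerDatabase</code> and the dispatcher choice are assumptions):</p>
<pre><code class="lang-kotlin">val db = Room.databaseBuilder(context, PlayerDatabase::class.java, "player.db")
  // Use the bundled SQLite library instead of the Android framework one.
  .setDriver(BundledSQLiteDriver())
  // Room then uses its suspending connection pool on this context.
  .setQueryCoroutineContext(Dispatchers.IO)
  .build()
</code></pre>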
<h1 id="heading-take-aways">Takeaways</h1>
<ul>
<li><p><strong>Paginate queries</strong> to avoid returning results larger than the Cursor window size (that can be hard to tune right), or use <code>beginTransactionReadOnly()</code> on API 35+ and otherwise fall back to <code>beginTransaction()</code>.</p>
</li>
<li><p>If you’re <strong>using Room</strong></p>
<ul>
<li><p>If you have <strong>more than one Room database</strong></p>
<ul>
<li><p>By default, all Room databases share the same fixed thread pool of size 4, when instead you should be able to use up to 4 connections concurrently per database.</p>
</li>
<li><p>Call <code>RoomDatabase.Builder.setQueryCoroutineContext()</code> or <code>RoomDatabase.Builder.setQueryExecutor()</code> with a bounded executor to ensure each database can query on up to 4 different threads concurrently.</p>
</li>
</ul>
</li>
<li><p>Consider <strong>using Room with Bundled SQLite</strong> for performance wins, less maintenance pain, and to eliminate the risk of blocking threads while waiting for a connection.</p>
</li>
</ul>
</li>
<li><p>If you’re <strong>not using Room</strong></p>
<ul>
<li><p>Wrap all your write queries in a transaction.</p>
</li>
<li><p>Use bounded executors for your read and write queries, a different set for each database.</p>
</li>
<li><p>Consider using <strong>Bundled SQLite</strong> (I have not yet looked into the practical details of that).</p>
</li>
</ul>
</li>
</ul>
<p>I would be remiss not to mention that your Android apps should probably not try to execute a bajillion SQL queries in parallel. You might have a “<em>N + 1 query problem</em>” (look that up!), or simply too much async code firing up.</p>
<p>Huge thanks to Shane Tang for spelunking the SQLite docs, Yiğit Boyar and Dany Santiago for sharing a ton of insights on Room &amp; SQLite internals, and Zach Klippenstein &amp; Jesse Wilson for proof reading!</p>
]]></content:encoded></item><item><title><![CDATA[Rendering the Java heap as a Treemap]]></title><description><![CDATA[Exploring heap dumps
I have investigated many heap dumps over the years, and I usually switch back and forth between two tools:

YourKit Java Profiler to poke around the heap and look for interesting things (ask your company to buy a license!)

Shark...]]></description><link>https://blog.p-y.wtf/rendering-the-java-heap-as-a-treemap</link><guid isPermaLink="true">https://blog.p-y.wtf/rendering-the-java-heap-as-a-treemap</guid><category><![CDATA[Computer Science]]></category><category><![CDATA[graph theory]]></category><category><![CDATA[Java]]></category><category><![CDATA[Kotlin]]></category><category><![CDATA[Android]]></category><category><![CDATA[memory-management]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Wed, 25 Sep 2024 04:37:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727238961293/6b0fb63f-22a1-4134-9778-d34031711f16.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-exploring-heap-dumps">Exploring heap dumps</h1>
<p>I have investigated many heap dumps over the years, and I usually switch back and forth between two tools:</p>
<ul>
<li><p><a target="_blank" href="https://www.yourkit.com/">YourKit Java Profiler</a> to poke around the heap and look for interesting things (ask your company to buy a license!)</p>
</li>
<li><p><a target="_blank" href="https://square.github.io/leakcanary/shark/">Shark</a> when I'm trying to perform more complex aggregation questions or queries that YourKit can't answer easily.</p>
</li>
</ul>
<h1 id="heading-side-quest-shark-graphviz">Side quest: Shark + Graphviz</h1>
<p>I wanted to show off how the Shark APIs allow you to explore the heap graph in any way you want, so I wrote a Kotlin script that explores &amp; renders the view hierarchy of any activity in the heap. It leverages Shark to locate activity instances, traverses all view &amp; view array references via a <a target="_blank" href="https://en.wikipedia.org/wiki/Breadth-first_search">Breadth First Traversal</a>, then generates a <a target="_blank" href="https://graphviz.org/">Graphviz</a> <code>.dot</code> file and executes a Graphviz command.</p>
<p>I shared it as a gist, it’s fairly short, go <a target="_blank" href="https://gist.github.com/pyricau/ecd450b73bfffe80a7e7b0f005351dfa">check it out</a>!</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> visitedObjectIds = traverseReferenceGraph(traversalRoots) { sourceObject -&gt;
  referenceReader.read(sourceObject).mapNotNull { reference -&gt;
    <span class="hljs-keyword">val</span> targetObject = graph.findObjectById(reference.valueObjectId)
    <span class="hljs-keyword">val</span> isView = targetObject <span class="hljs-keyword">is</span> HeapInstance &amp;&amp;
      targetObject instanceOf <span class="hljs-string">"android.view.View"</span>
    <span class="hljs-keyword">val</span> isViewArray = targetObject <span class="hljs-keyword">is</span> HeapObjectArray &amp;&amp;
      targetObject.arrayClassName == <span class="hljs-string">"android.view.View[]"</span>
    <span class="hljs-keyword">if</span> (isView || isViewArray) {
      visitedReferences += ObjectReference(
        sourceObjectId = sourceObject.objectId,
        targetObjectId = reference.valueObjectId,
        referenceName = reference.lazyDetailsResolver.resolve().name
      )
      targetObject
    } <span class="hljs-keyword">else</span> {
      <span class="hljs-literal">null</span>
    }
  }
}
</code></pre>
<p>Here’s the result from a heap dump of the <a target="_blank" href="https://github.com/android/compose-samples/tree/main/JetNews">JetNews</a> app:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727200288642/3975cfd5-9606-4890-a248-a16ac0199f6e.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-whats-eating-my-java-heap">What's eating my Java heap?</h1>
<p>I frequently ask myself "Why is the heap so large?" and "Are there areas where we're using too much memory?". In a Java heap, we often talk about two types of object sizes, <strong>shallow size</strong> and <strong>retained size</strong>. Here’s a <a target="_blank" href="https://www.jetbrains.com/help/idea/read-the-memory-snapshot.html">definition from JetBrains</a>:</p>
<ul>
<li><p><strong>Shallow size</strong>: the amount of memory allocated to store the object itself. It does not include the sizes of the objects that are referenced by this object.</p>
</li>
<li><p><strong>Retained size</strong>: the sum of the object's shallow size and the shallow sizes of its retained objects (objects that are only referenced from this object). In other words, the retained size is <strong>the amount of memory that can be reclaimed by garbage-collecting this object</strong>.</p>
</li>
</ul>
<p>In YourKit, we can look at objects grouped by class to get a rough idea of what’s using memory.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727202473943/ed79b21c-b06e-40e1-a6a5-2abd1d958e6c.png" alt class="image--center mx-auto" /></p>
<p><strong>Shallow size</strong> is interesting but not terribly useful. We expect some big objects, but it’s more interesting to know what’s holding on to them.</p>
<p>To talk about <strong>retained size</strong>, we need to talk about <strong>dominators</strong> first.</p>
<h1 id="heading-dominators">Dominators</h1>
<p>Let’s go over some definitions!</p>
<p>An object B is said to be <strong>reachable</strong> from another object A if there is a path of references from A to B, and otherwise B is <strong>unreachable</strong> from A.</p>
<p>A Garbage Collection root (<strong>gc root</strong>) is a reference to an object that tells the VM that object must not be garbage collected. There are many different kinds of GC Roots, but folks are usually familiar with system classes (loaded by the system class loader) and JNI references.</p>
<p>An object B is said to be <strong>reachable from GC roots</strong> if there is a path of references from at least one object referenced by a GC root to B. Otherwise B is said to be <strong>unreachable from GC roots</strong>. Objects that are unreachable from GC roots will be <strong>garbage collected</strong>. We often abbreviate this as “B is reachable / retained” or “B is unreachable / garbage collectable”.</p>
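<p>Reachability is just graph traversal. Here’s a toy Kotlin sketch (made-up node names, not the Shark API) that computes the set of objects reachable from GC roots with a breadth-first traversal; anything the traversal never visits is garbage collectable.</p>
<pre><code class="lang-kotlin">// Toy model: the heap as an adjacency map from object to referenced objects.
// Anything not returned by this function is unreachable from GC roots.
fun reachableFromRoots(
  edges: Map&lt;String, List&lt;String&gt;&gt;,
  gcRoots: Set&lt;String&gt;
): Set&lt;String&gt; {
  val visited = gcRoots.toMutableSet()
  val queue = ArrayDeque(gcRoots)
  while (queue.isNotEmpty()) {
    val node = queue.removeFirst()
    for (target in edges[node].orEmpty()) {
      // add() returns false if already visited, so each object is enqueued once.
      if (visited.add(target)) queue.addLast(target)
    }
  }
  return visited
}
</code></pre>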
<p>If there is a path of references from an object A to an object B, and if removing a specific node C on that path makes B unreachable from A, then C is called a <strong>dominator</strong>, as it holds a position of control on any attempt at traversing the graph from A to B.</p>
<p>In the context of a heap dump, dominators are objects that sit on the paths from GC roots to other objects: the dominator A of an object B, if removed, would make B unreachable. If you sum up the sizes of all the nodes that a dominator dominates, you get the amount of memory that would be freed if that dominator became unreachable. This is the <strong>retained size</strong>.</p>
<p>Let’s look at an example directed graph:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727216962334/7b42fef9-cd04-4def-9c50-92206f919ac8.png" alt class="image--center mx-auto" /></p>
<p>Here’s the resulting dominator graph:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727217048399/f4263957-3219-4f07-a39d-3b30454c4a3a.png" alt class="image--center mx-auto" /></p>
<p>Let’s look at what the dominator tree looks like for our previous heap dump of the JetNews app. Here we can see the activity is a dominator for all views, and every view group dominates all its descendant views.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727200288642/3975cfd5-9606-4890-a248-a16ac0199f6e.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-treemaps">Treemaps</h1>
<p>Wikipedia on <a target="_blank" href="https://en.wikipedia.org/wiki/Treemapping">Treemapping</a>:</p>
<blockquote>
<p>Treemapping is a method for displaying hierarchical data using nested figures, usually rectangles.</p>
<p>Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node's rectangle has an area proportional to a specified dimension of the data.[1] Often the leaf nodes are colored to show a separate dimension of the data.</p>
</blockquote>
<p>A classic usage is visualizing <strong>disk space usage</strong> with <a target="_blank" href="https://www.derlien.com/">Disk Inventory X</a>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727194929931/fea7abf4-d48c-407d-a2c6-a2f6fd1003ce.jpeg" alt class="image--center mx-auto" /></p>
<h1 id="heading-treemaps-amp-dominator-tree">Treemaps &amp; Dominator Tree</h1>
<p>If a treemap can be used to visualize disk space usage, we should be able to use it to visualize memory usage. However, a tree-map can only display tree-structured data, and the <strong>Java heap is a graph, not a tree</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727216962334/7b42fef9-cd04-4def-9c50-92206f919ac8.png" alt class="image--center mx-auto" /></p>
<p>As we saw earlier, dominators form a tree: each node has an immediate dominator, which in turn has its own immediate dominator, recursively up to the root.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727217293904/b8158e2e-8db7-40ba-a0ce-dceb06501eaa.png" alt class="image--center mx-auto" /></p>
<p>Once you have the dominator tree, you can compute the shallow size of each object in the graph, then for each dominator you can compute its retained size as the sum of the shallow size of each object it dominates (including itself).</p>
<p>So you get a tree where each node has a "retained size" which represents how much memory could be reclaimed if that node became unreachable.</p>
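<p>To make this concrete, here’s a toy Kotlin sketch (made-up node names and sizes, not the Shark API): given each node’s immediate dominator and its shallow size, walking every node up its dominator chain and adding its shallow size along the way produces all retained sizes in one pass.</p>
<pre><code class="lang-kotlin">// Toy sketch, not the Shark API: a node's retained size is its shallow
// size plus the shallow sizes of every node it (transitively) dominates.
fun retainedSizes(
  immediateDominator: Map&lt;String, String?&gt;, // null = root, no dominator
  shallowSize: Map&lt;String, Int&gt;
): Map&lt;String, Int&gt; {
  val retained = shallowSize.toMutableMap()
  for (node in shallowSize.keys) {
    var dominator = immediateDominator[node]
    while (dominator != null) {
      retained[dominator] = retained.getValue(dominator) + shallowSize.getValue(node)
      dominator = immediateDominator[dominator]
    }
  }
  return retained
}
</code></pre>
<p>With an activity dominating a view group that in turn dominates two views, the activity’s retained size ends up being the sum of all four shallow sizes.</p>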
<p>This seems to make a ton of sense! Can we build that? We’re going to need an algorithm for computing the dominator tree.</p>
<h1 id="heading-side-quest-leakcanarys-retained-size">Side quest: LeakCanary’s retained size</h1>
<p>LeakCanary shows leak traces, and to help figure out which leaks have the highest impact on memory it also displays the retained size from the references that are likely creating the leak. That means I had to write code that computes the dominator tree. I read a couple of papers, thought about my use case, realized I didn’t need the entire dominator tree, and took some shortcuts to make it run faster.</p>
<p>Well, as I read that code again for this exploration, I <a target="_blank" href="https://androiddev.social/@py/113176281321505098">realized I was very wrong</a> (<a target="_blank" href="https://github.com/square/leakcanary/issues/2715">square/leakcanary/issues#2715</a>). Until I reimplement this, retained size will often be overestimated.</p>
<h1 id="heading-linkevaldominators">LinkEvalDominators</h1>
<p>Google has engineers who are really good at reading computer science papers. Through a search on <a target="_blank" href="https://cs.android.com">cs.android.com</a>, I stumbled upon <a target="_blank" href="https://cs.android.com/android-studio/platform/tools/base/+/mirror-goog-studio-main:perflib/src/main/java/com/android/tools/perflib/heap/analysis/LinkEvalDominators.kt;l=36;drc=499fa43009666c0f0a686d8e21722dbea8b2ecf0">LinkEvalDominators.kt</a>, which is an agnostic Kotlin implementation of a dominator algorithm. Copy pasta, fix a few things, and voilà, I have a dominator tree based on a Shark graph.</p>
<p>Now I can connect that to my previous code that was traversing the view hierarchy:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> (sortedHeapNodes, immediateDominators) = with(LinkEvalDominators()) {
    computeDominators(traversalRoots) { (sourceObjectId) -&gt;
        <span class="hljs-keyword">val</span> sourceObject = graph.findObjectById(sourceObjectId)
        referenceReader.read(sourceObject).mapNotNull { reference -&gt;
            <span class="hljs-keyword">val</span> targetObject = graph.findObjectById(reference.valueObjectId)
            <span class="hljs-keyword">val</span> isView = targetObject <span class="hljs-keyword">is</span> HeapInstance &amp;&amp;
                    targetObject instanceOf <span class="hljs-string">"android.view.View"</span>
            <span class="hljs-keyword">val</span> isViewArray = targetObject <span class="hljs-keyword">is</span> HeapObjectArray &amp;&amp;
                    targetObject.arrayClassName == <span class="hljs-string">"android.view.View[]"</span>
            <span class="hljs-keyword">if</span> (isView || isViewArray) {
                HeapNode(targetObject.objectId)
            } <span class="hljs-keyword">else</span> {
                <span class="hljs-literal">null</span>
            }
        }
    }
}
</code></pre>
<h1 id="heading-quick-treemap">Quick Treemap</h1>
<p>Did you know that Google Spreadsheet supports displaying sheet columns as a Treemap chart? All you need is 3 columns: child, parent and value (here, the retained size).</p>
<p>So let’s generate a CSV:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> csv = File(heapDumpFile.parent, <span class="hljs-string">"<span class="hljs-subst">${heapDumpFile.nameWithoutExtension}</span>.csv"</span>)
csv.printWriter().use { writer -&gt;
    with(writer) {
        println(<span class="hljs-string">"\"Child\",\"Parent\",\"Value\""</span>)
        rootDominators.forEach { rootDominator -&gt;
            printDominatorCsvRow(rootDominator)
        }
    }
}

<span class="hljs-function"><span class="hljs-keyword">fun</span> PrintWriter.<span class="hljs-title">printDominatorCsvRow</span><span class="hljs-params">(node: <span class="hljs-type">DominatorObject</span>)</span></span> {
    <span class="hljs-keyword">val</span> parent = node.parent
    <span class="hljs-keyword">if</span> (parent == <span class="hljs-literal">null</span>) {
        println(<span class="hljs-string">"\"<span class="hljs-subst">${node.name}</span>\",\"\",\"<span class="hljs-subst">${node.retainedSize}</span>\""</span>)
    } <span class="hljs-keyword">else</span> {
        println(<span class="hljs-string">"\"<span class="hljs-subst">${node.name}</span>\",\"<span class="hljs-subst">${parent.name}</span>\",\"<span class="hljs-subst">${node.retainedSize}</span>\""</span>)
    }
    <span class="hljs-keyword">for</span> (child <span class="hljs-keyword">in</span> node.dominatedNodes) {
        printDominatorCsvRow(child)
    }
}
</code></pre>
<p>Now we can import it into Google Spreadsheet:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727207229607/aff3575e-43e8-4606-b5fa-5185c1cc800d.png" alt class="image--center mx-auto" /></p>
<p>And we have our Treemap, configured to display all levels at once!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727207260071/b4fb527a-d126-4393-a865-a87c538906ee.png" alt class="image--center mx-auto" /></p>
<p>There’s a bit more code involved in that one, <a target="_blank" href="https://gist.github.com/pyricau/ffeebb7ee32aa8d3bdbe67a156f0bdd7">check out the gist!</a></p>
<h1 id="heading-side-quest-compose-treemap">Side-Quest: Compose Treemap</h1>
<p>Last year, I started exploring this as a LeakCanary feature. I didn’t get very far but I did port to Compose the TreeMap rendering implementation from <a target="_blank" href="https://github.com/d3/d3-hierarchy">d3-hierarchy</a> (see <a target="_blank" href="https://github.com/square/leakcanary/blob/02d0d8b6ebfe8de55c109b904d7b526063f3f852/leakcanary/leakcanary-app/src/main/java/org/leakcanary/screens/TreemapLayout.kt">TreemapLayout.kt</a> and <a target="_blank" href="https://github.com/square/leakcanary/blob/02d0d8b6ebfe8de55c109b904d7b526063f3f852/leakcanary/leakcanary-app/src/main/java/org/leakcanary/screens/TreeMapScreen.kt">TreeMapScreen.kt</a>). Let’s copy the contents of the previous gist straight into that code. Then we can see the results immediately with the Compose Preview!</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Composable</span>
<span class="hljs-meta">@Preview(device = Devices.FOLDABLE)</span>
<span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">RealHprofHeapTreemapPreview</span><span class="hljs-params">()</span></span> {
  <span class="hljs-keyword">val</span> heapDumpFile = File(<span class="hljs-string">"/Users/py/Desktop/memory-20240919T161101.hprof"</span>)

  <span class="hljs-keyword">val</span> root = heapDumpFile.openHeapGraph().use { graph -&gt;
    ... <span class="hljs-comment">// Same code as in previous gist.</span>

    <span class="hljs-function"><span class="hljs-keyword">fun</span> DominatorObject.<span class="hljs-title">mapToTreeMapNode</span><span class="hljs-params">()</span></span>: NodeValue&lt;String&gt; {
      <span class="hljs-keyword">val</span> children = dominatedNodes.map { it.mapToTreeMapNode() }
      <span class="hljs-keyword">return</span> NodeValue(retainedSize, name, children)
    }


    NodeValue(
      value = rootDominators.sumOf { it.retainedSize },
      content = <span class="hljs-string">"root"</span>,
      children = rootDominators.map { it.mapToTreeMapNode() }
    )
  }

  Treemap(root) { it }
}
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727210492820/fb057509-5537-4ca5-bf39-d93ef30b511b.png" alt class="image--center mx-auto" /></p>
<p>This is fairly ugly and probably buggy, but hey, a Compose Treemap, how cool is that?</p>
<h1 id="heading-entire-heap-as-a-treemap">Entire heap as a Treemap?</h1>
<p>So far we only looked at a small subset of objects, the view instances. My original goal was to visualize how memory is used for the entire heap. So, I removed the code that filters only on view instances:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> traversalRoots = graph.gcRoots
  <span class="hljs-comment">// Exclude Java local references</span>
  .filter { it !<span class="hljs-keyword">is</span> JavaFrame }
  .map { HeapNode(it.id) }.toSet()

<span class="hljs-keyword">val</span> (sortedHeapNodes, immediateDominators) = with(LinkEvalDominators()) {
  computeDominators(traversalRoots) { (sourceObjectId) -&gt;
    <span class="hljs-keyword">val</span> sourceObject = graph.findObjectById(sourceObjectId)
    referenceReader.read(sourceObject).map { reference -&gt;
      HeapNode(reference.valueObjectId)
    }
  }
}
</code></pre>
<p><a target="_blank" href="https://gist.github.com/pyricau/a67e4759cfc3a521205d6bed4a8a4c58">Check out the gist!</a></p>
<p>This generates a 29 MB CSV with 293K rows, which I imported in Google Spreadsheet to turn into the following Treemap:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727220069323/a70f98af-d441-4b90-bfb1-b5e185df9e12.png" alt class="image--center mx-auto" /></p>
<p>That’s unexpected! I configured the chart to only draw the top level, so why am I seeing all these root nodes?</p>
<p>The heap dump has 25002 GC roots, yet we end up with 56387 root dominators. How come?</p>
<h1 id="heading-dominator-roots">Dominator roots</h1>
<p>Let’s look at a simple example: say we have 2 classes that both hold a reference to the <code>android.app.Application</code> singleton instance:</p>
<pre><code class="lang-java"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">A</span> </span>{
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">final</span> Context sContext = retrieveAppContext();
}

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">B</span> </span>{
  <span class="hljs-keyword">static</span> <span class="hljs-keyword">final</span> Application sApplication = retrieveAppContext();
}
</code></pre>
<p>In the heap dump we will see two system class GC Roots:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727221944055/673d4cdb-a9b9-44d5-8e19-3264d94ec839.png" alt class="image--center mx-auto" /></p>
<p>Remember the dominator definition?</p>
<blockquote>
<p>A dominator node D of a node A is a node that if removed would make A unreachable.</p>
</blockquote>
<p>Here, there isn’t a single node that could be removed to make the <code>android.app.Application</code> instance unreachable: if we remove one GC root, the other GC root will still reference the <code>android.app.Application</code> instance.</p>
<p>So here’s the corresponding dominator graph:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727222116283/e757fe1a-bab3-4bc5-934f-4e1afbdf3464.png" alt class="image--center mx-auto" /></p>
<p>Even though we had 2 GC roots, we ended up with 3 dominator roots.</p>
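<p>We can check this with the textbook iterative data-flow algorithm for dominator sets (a much simpler cousin of LinkEvalDominators). The sketch below is a toy with made-up node names: a virtual root references the two GC roots, which both reference the application instance.</p>
<pre><code class="lang-kotlin">// Toy sketch: dom(n) = {n} ∪ intersection of dom(p) over all predecessors p,
// iterated to a fixed point. Every non-root node here has at least one predecessor.
fun dominatorSets(
  edges: Map&lt;String, List&lt;String&gt;&gt;,
  root: String
): Map&lt;String, Set&lt;String&gt;&gt; {
  val nodes = edges.keys + edges.values.flatten()
  val preds = nodes.associateWith { n -&gt; edges.filterValues { n in it }.keys }
  // dom(root) = {root}; every other node starts at "all nodes" and shrinks.
  val dom = nodes.associateWith { nodes }.toMutableMap()
  dom[root] = setOf(root)
  var changed = true
  while (changed) {
    changed = false
    for (n in nodes - root) {
      val newDom = setOf(n) + preds.getValue(n)
        .map { dom.getValue(it) }
        .reduce { a, b -&gt; a intersect b }
      if (newDom != dom[n]) {
        dom[n] = newDom
        changed = true
      }
    }
  }
  return dom
}
</code></pre>
<p>For that graph, <code>dominatorSets</code> reports the application instance as dominated only by itself and the virtual root, so it hangs directly off the root of the dominator tree: 2 GC roots, 3 dominator roots.</p>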
<p>Unfortunately, this also applies to instances that are deeper in the graph, without any additional GC roots. Let’s introduce a Dagger component:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727222878010/81937a4d-0278-41fb-8e42-db3ebf418ae4.png" alt class="image--center mx-auto" /></p>
<p>Once again, there is not a single node that could be removed to make the <code>com.example.DaggerMyComponent</code> instance unreachable, so our 2 GC roots now lead to 4 dominators:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727222884901/629b8c95-4c2a-492b-944f-55fc0a9f7e09.png" alt class="image--center mx-auto" /></p>
<p>Beyond dominator roots, this happens at every layer of the graph: the more connected the graph, the shallower the dominator tree.</p>
<p>I captured the depth of every node in the dominator tree of the JetNews example heap dump, and generated a histogram of that depth:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727223376480/0b8efada-be69-447a-9693-cb42acd9cdd6.png" alt class="image--center mx-auto" /></p>
<p>This shows that most nodes sit pretty high in the dominator tree, much higher than I originally anticipated.</p>
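<p>Deriving such a histogram is cheap once you know every node’s immediate dominator. Here’s a minimal Kotlin sketch (a hypothetical helper, not part of the gists above): a node’s depth is its immediate dominator’s depth plus one, with dominator roots at depth 0.</p>
<pre><code class="lang-kotlin">// Hypothetical helper: memoized depth computation over the dominator tree,
// then a count of nodes per depth.
fun depthHistogram(immediateDominator: Map&lt;String, String?&gt;): Map&lt;Int, Int&gt; {
  val depthCache = mutableMapOf&lt;String, Int&gt;()
  fun depth(node: String): Int = depthCache.getOrPut(node) {
    immediateDominator[node]?.let { depth(it) + 1 } ?: 0
  }
  return immediateDominator.keys.groupingBy { depth(it) }.eachCount()
}
</code></pre>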
<h1 id="heading-conclusion">Conclusion</h1>
<p>A Treemap generated from the dominator tree of a heap dump is less useful and intuitive than I expected.</p>
<p>In software engineering, we usually have a nice hierarchical mental model of our app components (e.g. an activity owns a view hierarchy, views own drawables, drawables own bitmaps, etc).</p>
<p>Unfortunately that nice hierarchy cannot be inferred automatically at runtime, because there are a lot of additional references that end up hiding what we expect to be the natural dominators.</p>
<p>Hope you enjoyed this ride!</p>
<p>Many thanks to <a target="_blank" href="https://blog.droidchef.dev/">Ishan Khanna</a> for mentioning <a target="_blank" href="https://github.com/jQAssistant">jQAssistant</a> and starting a conversation about heap dump visualization tools, and <a target="_blank" href="https://publicobject.com/">Jesse Wilson</a> for letting me spam his DMs and helping me think about this problem space differently.</p>
]]></content:encoded></item><item><title><![CDATA[Cutting some Slack, for leaks and giggles]]></title><description><![CDATA[In this article I run the new LeakCanary toolkit against the Slack Android app. Read on to learn a bunch!
A new LeakCanary toolkit
In two months, I will give a talk at Droidcon SF: Cutting Edges: universal heap trimming with LeakCanary 3

At Square, ...]]></description><link>https://blog.p-y.wtf/cutting-some-slack-for-leaks-and-giggles</link><guid isPermaLink="true">https://blog.p-y.wtf/cutting-some-slack-for-leaks-and-giggles</guid><category><![CDATA[Android]]></category><category><![CDATA[performance]]></category><category><![CDATA[Memory Leak]]></category><category><![CDATA[slack]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Tue, 07 May 2024 00:47:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1715042624291/88c20426-ecdf-4d1c-8ac7-426930b69bf2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article I run the new LeakCanary toolkit against the Slack Android app. Read on to learn a bunch!</p>
<h2 id="heading-a-new-leakcanary-toolkit">A new LeakCanary toolkit</h2>
<p>In two months, I will give a talk at Droidcon SF: <a target="_blank" href="https://sf.droidcon.com/pierre-yves-ricau/">Cutting Edges: universal heap trimming with LeakCanary 3</a></p>
<blockquote>
<p>At Square, we scaled our LeakCanary usage over the last nine years by running it on all UI tests on every pull request, uploading leaks detected in debug builds, and triaging leaks weekly. This works: we fixed thousands of leaks (in our apps, third-party libraries, and the Android Framework), and we're now finding fewer and fewer new leaks!</p>
<p>Unfortunately, we sometimes see the heap size grow over time without LeakCanary finding any issue. For example, constantly appending string logs to a collection would not trigger LeakCanary but would still lead to ANRs and OOMEs when the app eventually runs out of memory.</p>
<p>Inspired by the BLeak paper and the work of the Android Studio team, I built a new toolkit in LeakCanary that performs repeated heap dump diffs and detects objects with a constantly increasing number of outgoing edges (for example, a list that keeps growing).</p>
<p>Come learn how this works; together, we can fix all the leaks!</p>
</blockquote>
<p>I have two months to turn a prototype into a real tool! I just shipped a preview in <a target="_blank" href="https://square.github.io/leakcanary/changelog/#version-30-alpha-4-2024-05-10">LeakCanary 3.0 alpha 4</a>. I have been testing it on Square apps, and now I want to see if it's useful for other apps, especially complex apps. Please try it out!</p>
<p>In the meantime, I can do the work myself with other apps. Let's pick an app I use a lot... Slack Android!</p>
<h2 id="heading-installing-slack-android-on-an-emulator">Installing Slack Android on an emulator</h2>
<p>LeakCanary works with heap dumps. The new heap growth detection toolkit can run as a UI Automator test, invoking <code>am dumpheap</code> to dump the heap of another app. Unfortunately, on normal Android OS builds, this only works for apps that are debuggable or profileable as shell, which obviously isn't the case for the Slack Android production app. Fortunately, these restrictions don't apply to Android <code>userdebug</code> OS builds, and non-Play-Store emulator images are <code>userdebug</code> builds.</p>
<p>Let's download the APKs from my phone:</p>
<pre><code class="lang-bash">$ adb shell pm path com.Slack

package:/data/app/com.Slack/base.apk
package:/data/app/com.Slack/split_config.arm64_v8a.apk
package:/data/app/com.Slack/split_config.xxhdpi.apk

$ adb pull /data/app/com.Slack/base.apk
$ adb pull /data/app/com.Slack/split_config.arm64_v8a.apk
$ adb pull /data/app/com.Slack/split_config.xxhdpi.apk
</code></pre>
<p>I can then create an emulator similar to my phone and install them:</p>
<pre><code class="lang-bash">$ adb install-multiple base.apk split_config.arm64_v8a.apk split_config.xxhdpi.apk
</code></pre>
<h2 id="heading-ui-automator-test">UI Automator test</h2>
<p>First let's get our Gradle setup right:</p>
<pre><code class="lang-kotlin">dependencies {
  androidTestImplementation <span class="hljs-string">'com.squareup.leakcanary:leakcanary-android-uiautomator:3.0-alpha-4'</span>
  androidTestImplementation libs.assertjCore
  androidTestImplementation libs.junit
  androidTestImplementation libs.androidX.test.runner
}

android {
  defaultConfig {
    testInstrumentationRunner <span class="hljs-string">"androidx.test.runner.AndroidJUnitRunner"</span>
  }
}
</code></pre>
<p>Here's what the test looks like, repeatedly switching back and forth between two workspaces:</p>
<pre><code class="lang-kotlin">  <span class="hljs-meta">@Test</span>
  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">switching_workspaces_repeatedly_should_not_grow_heap</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">val</span> heapDiff = detector.findRepeatedlyGrowingObjects {
      device.openWorkspaceDrawer()
      device.selectWorkspace(<span class="hljs-string">"Android Study Group"</span>)
      device.openWorkspaceDrawer()
      device.selectWorkspace(<span class="hljs-string">"droidcon"</span>)
    }

    assertThat(heapDiff.growingObjects).isEmpty()
  }
</code></pre>
<p>I can then run the test:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1715039106181/f85e8884-5e8f-4cdd-a8e5-a4d4ff9822f8.gif" alt class="image--center mx-auto" /></p>
<p>Here's the full test class with setup code and helper functions:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">import</span> androidx.test.platform.app.InstrumentationRegistry
<span class="hljs-keyword">import</span> androidx.test.uiautomator.By
<span class="hljs-keyword">import</span> androidx.test.uiautomator.UiDevice
<span class="hljs-keyword">import</span> androidx.test.uiautomator.Until
<span class="hljs-keyword">import</span> org.assertj.core.api.Assertions.assertThat
<span class="hljs-keyword">import</span> org.junit.Before
<span class="hljs-keyword">import</span> org.junit.Test
<span class="hljs-keyword">import</span> shark.ObjectGrowthDetector
<span class="hljs-keyword">import</span> shark.forAndroidHeap

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SlackTest</span> </span>{

  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> device = UiDevice.getInstance(InstrumentationRegistry.getInstrumentation())!!

  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> detector = ObjectGrowthDetector.forAndroidHeap().repeatingUiAutomatorScenario(
    dumpedAppPackageName = SLACK_PKG,
    maxHeapDumps = <span class="hljs-number">10</span>,
    scenarioLoopsPerDump = <span class="hljs-number">10</span>
  )

  <span class="hljs-meta">@Before</span>
  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">setUp</span><span class="hljs-params">()</span></span> {
    device.restartSlack()
  }

  <span class="hljs-meta">@Test</span>
  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">switching_workspaces_repeatedly_should_not_grow_heap</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">val</span> heapDiff = detector.findRepeatedlyGrowingObjects {
      device.openWorkspaceDrawer()
      device.selectWorkspace(<span class="hljs-string">"Android Study Group"</span>)
      device.openWorkspaceDrawer()
      device.selectWorkspace(<span class="hljs-string">"droidcon"</span>)
    }

    assertThat(heapDiff.growingObjects).isEmpty()
  }

  <span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> UiDevice.<span class="hljs-title">restartSlack</span><span class="hljs-params">()</span></span> {
    executeShellCommand(<span class="hljs-string">"am force-stop <span class="hljs-variable">$SLACK_PKG</span>"</span>)
    wait(Until.gone(By.pkg(SLACK_PKG)), <span class="hljs-number">5_000</span>)
    executeShellCommand(<span class="hljs-string">"am start <span class="hljs-variable">$SLACK_PKG</span>"</span>)
    wait(Until.findObject(WORKSPACE_DRAWER_ICON_BUTTON), <span class="hljs-number">5_000</span>)
  }

  <span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> UiDevice.<span class="hljs-title">openWorkspaceDrawer</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">val</span> teamAvatarButton = findObject(WORKSPACE_DRAWER_ICON_BUTTON)!!
    teamAvatarButton.click()
    wait(Until.findObject(WORKSPACE_NAME_ROW), <span class="hljs-number">5_000</span>)
  }

  <span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> UiDevice.<span class="hljs-title">selectWorkspace</span><span class="hljs-params">(name: <span class="hljs-type">String</span>)</span></span> {
    <span class="hljs-keyword">val</span> group = findObject(By.text(name))!!
    group.click()
    wait(Until.gone(WORKSPACE_NAME_ROW), <span class="hljs-number">5_000</span>)
  }

  <span class="hljs-keyword">companion</span> <span class="hljs-keyword">object</span> {
    <span class="hljs-keyword">const</span> <span class="hljs-keyword">val</span> SLACK_PKG = <span class="hljs-string">"com.Slack"</span>
    <span class="hljs-keyword">val</span> WORKSPACE_NAME_ROW = By.res(SLACK_PKG, <span class="hljs-string">"workspace_name"</span>)!!
    <span class="hljs-keyword">val</span> WORKSPACE_DRAWER_ICON_BUTTON = By.res(SLACK_PKG, <span class="hljs-string">"team_avatar_button"</span>)!!
  }
}
</code></pre>
<p>If anyone wants to try running <code>findRepeatedlyGrowingObjects()</code> with <a target="_blank" href="https://maestro.mobile.dev/">Maestro</a>, be my guest!</p>
<h2 id="heading-results">Results</h2>
<p>I shared the results with the team at Slack. I want to showcase just one of the results, as it's interesting:</p>
<pre><code class="lang-bash">There was 1 failure:
1) switching_workspaces_repeatedly_should_not_grow_heap(SlackTest)
java.lang.AssertionError:
Expecting empty but was:&lt;[
┬───
│ GcRoot(ThreadObject) (372 objects)
    Retained size: 289 KB
    Retained objects: 7609
    Children:
    372 objects (20 new): INSTANCE_FIELD Thread.blockerLock -&gt; instance of java.lang.Object
    372 objects (20 new): INSTANCE_FIELD Thread.inheritedAccessControlContext -&gt; instance of java.security.AccessControlContext
    371 objects (20 new): INSTANCE_FIELD Thread.lock -&gt; instance of java.lang.Object
    201 objects (20 new): INSTANCE_FIELD Thread.target -&gt; instance of slack.app.SlackAppProdImpl$<span class="hljs-variable">$ExternalSyntheticLambda2</span>
,

...
</code></pre>
<p>Here I see an increase of 20 threads between 2 heap dumps. I ran the scenario 10 times in between heap dumps, and each scenario iteration switched workspaces twice, so that's 20 workspace switches: one new thread per workspace switch.</p>
<p>Let's figure out what these new threads are. I can dump the heap from adb:</p>
<pre><code class="lang-bash">$ adb shell am dumpheap -g com.Slack /data/<span class="hljs-built_in">local</span>/tmp/slack.hprof
$ adb pull /data/<span class="hljs-built_in">local</span>/tmp/slack.hprof
</code></pre>
<p>Then I can write a Kotlin script that parses the heap dump, groups threads by name and counts them:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">#!/usr/bin/env kotlin</span>

<span class="hljs-meta">@file:DependsOn</span>(<span class="hljs-string">"com.squareup.leakcanary:shark:3.0-alpha-2"</span>)

<span class="hljs-keyword">import</span> java.io.File
<span class="hljs-keyword">import</span> shark.HprofHeapGraph.Companion.openHeapGraph

<span class="hljs-keyword">val</span> hprofFile = File(<span class="hljs-string">"./slack.hprof"</span>)

<span class="hljs-keyword">val</span> threadCounts = hprofFile.openHeapGraph().use { graph -&gt;
  graph.findClassByName(Thread::<span class="hljs-keyword">class</span>.java.name)!!.instances
    <span class="hljs-comment">// group by thread name</span>
    .groupingBy { threadInstance -&gt;
      threadInstance[Thread::<span class="hljs-keyword">class</span>.java.name, <span class="hljs-string">"name"</span>]!!.value.readAsJavaString()
    }
    .eachCount()
    .toList()
    <span class="hljs-comment">// sort by count</span>
    .sortedBy { it.second }
}

println(threadCounts.joinToString(<span class="hljs-string">"\n"</span>) {
  <span class="hljs-string">"\"<span class="hljs-subst">${it.first}</span>\": <span class="hljs-subst">${it.second}</span>"</span>
})
</code></pre>
<pre><code class="lang-bash">...
<span class="hljs-string">"ms-event-dispatcher-1"</span>: 2
<span class="hljs-string">"OkHttp TaskRunner"</span>: 2
<span class="hljs-string">"NewSqlTransactionMonitor"</span>: 4
<span class="hljs-string">"file-upload-manager"</span>: 201
</code></pre>
<p>So there's a thread named <code>file-upload-manager</code> being created on every workspace switch and never shut down. Not to worry though, I'm told this will be fixed in the near future.</p>
<p>I was really excited to show you how you can write a Kotlin script to analyze a heap dump, but in this case it would have been much easier to go with a thread dump:</p>
<pre><code class="lang-bash">$ adb shell ps -T | grep <span class="hljs-variable">$SLACK_PID</span> | awk <span class="hljs-string">'{print $10}'</span> | sort | 
uniq -c | sort

...
   2 ms-event-dispat
   2 OkHttp TaskRunn 
   4 NewSqlTransacti
 201 file-upload-man
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>I hope this convinced you to try out the new <a target="_blank" href="https://square.github.io/leakcanary/changelog/#version-30-alpha-4-2024-05-10">heap growth detection toolkit</a> in LeakCanary 3. You can use it with JVM Unit tests, Espresso, UI Automator, and even directly from the command line. Let me know what you think!</p>
]]></content:encoded></item><item><title><![CDATA[A weirder HashMap]]></title><description><![CDATA[Today I was mob programming with Square's Mobile & Performance Reliability team and we toyed with an interesting idea.
Our codebase has classes that represent screens a user can navigate to. These classes are defined in modules, and these modules hav...]]></description><link>https://blog.p-y.wtf/a-weirder-hashmap</link><guid isPermaLink="true">https://blog.p-y.wtf/a-weirder-hashmap</guid><category><![CDATA[Android]]></category><category><![CDATA[performance]]></category><category><![CDATA[Internals]]></category><category><![CDATA[Java]]></category><category><![CDATA[Kotlin]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 15 Feb 2024 03:59:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1707969524455/c4f665bd-0362-4712-aa30-b425007399d9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today I was mob programming with Square's Mobile &amp; Performance Reliability team and we toyed with an interesting idea.</p>
<p>Our codebase has <strong>classes that represent screens</strong> a user can navigate to. These classes are defined in modules, and these <strong>modules have an owner team</strong> defined.</p>
<p>When navigating to a screen, we wanted to have the owner team information available, at runtime. We created a build tool that looks at about 1000 Screen classes, determines the owner team, and generates a class to do the lookup at runtime.</p>
<p>The generated code looked like this:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">object</span> ScreenOwnership {
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> classNameToOwner = mapOf(
    <span class="hljs-string">"com.example.feature1.Screen1"</span> to <span class="hljs-string">"Team A"</span>,
    <span class="hljs-string">"com.example.feature2.Screen2"</span> to <span class="hljs-string">"Team A"</span>,
    <span class="hljs-string">"com.example.feature3.Screen3"</span> to <span class="hljs-string">"Team B"</span>,
    <span class="hljs-comment">// ... etc. for a thousand screens</span>
  )

  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">ownerOf</span><span class="hljs-params">(screenClass: <span class="hljs-type">KClass</span>&lt;<span class="hljs-type">Screen</span>&gt;)</span></span>: String {
    <span class="hljs-keyword">return</span> classNameToOwner.getValue(screenClass.java.name)
  }
}
</code></pre>
<p>This works, but it feels a bit wasteful. Let's explore why.</p>
<blockquote>
<p>Yes, I'm aware, "Premature optimization is the root of all evil". Our goal here was to think through the impact of this implementation and come up with an alternative implementation, for fun. Don't be a killjoy.</p>
</blockquote>
<h1 id="heading-mapof-and-pairs">mapOf and pairs</h1>
<p><code>mapOf(vararg pairs: Pair&lt;K, V&gt;)</code> is a nice utility to create a map (more specifically, a <code>LinkedHashMap</code>) but using that syntax leads to the creation of a temporary vararg array of size 1000, as well as 1000 temporary <code>Pair</code> instances.</p>
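<p>A quick sketch of an alternative that skips those allocations: <code>buildMap</code> from the Kotlin standard library puts entries straight into the backing map, with no vararg array and no <code>Pair</code> per entry (the class names below are the same placeholder examples as above):</p>
<pre><code class="lang-kotlin">// mapOf allocates a vararg array plus one temporary Pair per entry:
val withPairs = mapOf(
  "com.example.feature1.Screen1" to "Team A",
  "com.example.feature2.Screen2" to "Team A",
)

// buildMap puts entries directly into the backing map,
// with no intermediate Pair or vararg allocations:
val withoutPairs = buildMap {
  put("com.example.feature1.Screen1", "Team A")
  put("com.example.feature2.Screen2", "Team A")
}
</code></pre>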
<h1 id="heading-memory-hoarding">Memory hoarding</h1>
<p>Let's look at the retained size of the map we just created:</p>
<ul>
<li><p>~30 characters per class name * 2 bytes per character = 60 bytes per entry</p>
</li>
<li><p>Each entry is stored as a <code>LinkedHashMapEntry</code>, which adds 2 references to a <code>HashMap.Node</code> that itself holds 3 references and 1 int. On a 64-bit VM that's 5 references * 8 bytes, plus 4 bytes for the int: 44 bytes per entry.</p>
</li>
<li><p>So for the entries alone we're at (60 + 44) * 1000 = 104 KB.</p>
</li>
<li><p>The default load factor is 75%, which means the number of entries can be at most 75% of the size of the array backing the hashmap. The array size also has to be a power of 2. So, for 1000 entries, that's an object array of size 2048: 2048 * 8 = 16,384 bytes.</p>
</li>
</ul>
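<p>The arithmetic above can be sketched out in a few lines, using this article's per-entry estimates (the 60 and 44 byte figures are rough estimates, not measured values):</p>
<pre><code class="lang-kotlin">// Smallest power-of-two table size that keeps the entry count
// at or below 75% of the table size, like HashMap does.
fun tableSizeFor(entries: Int, loadFactor: Double = 0.75): Int {
  var size = 1
  while (size * loadFactor &lt; entries) size *= 2
  return size
}

val entries = 1000
val stringBytes = 60 // ~30 chars * 2 bytes per entry key
val entryBytes = 44  // 5 references * 8 bytes + 4 bytes for the int
val tableSize = tableSizeFor(entries) // 2048 for 1000 entries
val totalBytes = (stringBytes + entryBytes) * entries + tableSize * 8
// totalBytes is 120384, i.e. ~120 KB
</code></pre>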
<p>The total retained size of the map is <strong>~120 KB</strong>. Can we do better?</p>
<p>Could we make it... 0?</p>
<h1 id="heading-100-code-based-map">100% code-based map</h1>
<p>What if we generate code that returns the right team for a given screen, instead of creating a map?</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">object</span> ScreenOwnership {
  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">ownerOf</span><span class="hljs-params">(screenClass: <span class="hljs-type">KClass</span>&lt;<span class="hljs-type">Screen</span>&gt;)</span></span>: String {
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">when</span>(screenClass.java.name) {
      <span class="hljs-string">"com.example.feature1.Screen1"</span> -&gt; <span class="hljs-string">"Team A"</span>
      <span class="hljs-string">"com.example.feature2.Screen2"</span> -&gt; <span class="hljs-string">"Team B"</span>
      <span class="hljs-string">"com.example.feature3.Screen3"</span> -&gt; <span class="hljs-string">"Team C"</span>
      <span class="hljs-comment">// ... etc for a thousand screens</span>
      <span class="hljs-keyword">else</span> -&gt; error(<span class="hljs-string">"Unknown screen class <span class="hljs-subst">${screenClass.java.name}</span>"</span>)
    }
  }
}
</code></pre>
<p>The Kotlin compiler is smart and actually generates code that checks the string against its hashcode first, which looks like this:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">object</span> ScreenOwnership {
  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">ownerOf</span><span class="hljs-params">(screenClass: <span class="hljs-type">KClass</span>&lt;<span class="hljs-type">Screen</span>&gt;)</span></span>: String {
    <span class="hljs-keyword">val</span> screenClassName = screenClass.java.name

    <span class="hljs-keyword">return</span> <span class="hljs-keyword">when</span> (screenClassName.hashCode()) {
      <span class="hljs-number">1179818499</span> -&gt; <span class="hljs-keyword">if</span> (screenClassName == <span class="hljs-string">"com.example.feature1.Screen1"</span>)
        <span class="hljs-string">"Team A"</span>
      <span class="hljs-keyword">else</span>
        <span class="hljs-literal">null</span>

      -<span class="hljs-number">627635963</span> -&gt; <span class="hljs-keyword">if</span> (screenClassName == <span class="hljs-string">"com.example.feature2.Screen2"</span>)
        <span class="hljs-string">"Team B"</span>
      <span class="hljs-keyword">else</span>
        <span class="hljs-literal">null</span>

      -<span class="hljs-number">627635962</span> -&gt; <span class="hljs-keyword">if</span> (screenClassName == <span class="hljs-string">"com.example.feature3.Screen3"</span>)
        <span class="hljs-string">"Team C"</span>
      <span class="hljs-keyword">else</span>
        <span class="hljs-literal">null</span>
      <span class="hljs-comment">// ... etc for a thousand screens</span>
      <span class="hljs-keyword">else</span> -&gt; <span class="hljs-literal">null</span>
    } ?: error(<span class="hljs-string">"Unknown screen class <span class="hljs-variable">$screenClassName</span>"</span>)
  }
}
</code></pre>
<p>Since we know the full list of screen classes, we can check ahead of time whether there's any hashcode conflict, and if not, we can generate code that directly associates the hashcode of the screen class name to the corresponding team:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">object</span> ScreenOwnership {
  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">ownerOf</span><span class="hljs-params">(screenClass: <span class="hljs-type">KClass</span>&lt;<span class="hljs-type">Screen</span>&gt;)</span></span>: String {
    <span class="hljs-keyword">val</span> screenClassName = screenClass.java.name

    <span class="hljs-keyword">return</span> <span class="hljs-keyword">when</span> (screenClassName.hashCode()) {
      <span class="hljs-number">1179818499</span> -&gt; <span class="hljs-string">"Team A"</span>
      -<span class="hljs-number">627635963</span> -&gt; <span class="hljs-string">"Team B"</span>
      -<span class="hljs-number">627635962</span> -&gt; <span class="hljs-string">"Team C"</span>
      <span class="hljs-comment">// ... etc for a thousand screens</span>
      <span class="hljs-keyword">else</span> -&gt; error(<span class="hljs-string">"Unknown screen class <span class="hljs-variable">$screenClass</span>"</span>)
    }
  }
}
</code></pre>
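<p>The ahead-of-time collision check itself is tiny. A sketch of what the build tool could do (hypothetical helper, not the actual Square tooling):</p>
<pre><code class="lang-kotlin">// At code generation time: if any two screen class names share a
// hashCode, fall back to generating string comparisons instead.
fun hasHashCollision(screenNames: List&lt;String&gt;): Boolean {
  val hashes = screenNames.map { it.hashCode() }
  return hashes.toSet().size != hashes.size
}

val screenNames = listOf(
  "com.example.feature1.Screen1",
  "com.example.feature2.Screen2",
  "com.example.feature3.Screen3",
)
// "Aa" and "BB" famously share a hashCode, so a list containing
// both would force the string-comparison fallback.
check(!hasHashCollision(screenNames))
check(hasHashCollision(listOf("Aa", "BB")))
</code></pre>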
<h1 id="heading-linear-scan">Linear scan?</h1>
<p><code>HashMap.get(key)</code> calls <code>key.hashCode()</code> , wrangles that hashcode modulo the size of the backing array, which gives it an index to look up a linked list of entries in the backing array. The higher the load factor, the more hash collisions and the larger the linked lists. In other words, assuming a reasonable load factor, <code>HashMap.get(key)</code> has constant time performance - O(1) time complexity - to locate the entry, however it does need to follow references which might require loading a different page of memory.</p>
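<p>As a concrete sketch of that lookup, here is the simplified index computation (real <code>HashMap</code> additionally spreads the high bits of the hash before masking):</p>
<pre><code class="lang-kotlin">// With a power-of-two table size, "hash modulo size" is a bit mask.
val tableSize = 2048
val key = "com.example.feature1.Screen1"
val bucketIndex = key.hashCode() and (tableSize - 1)
// bucketIndex selects a bucket; the entries chained in that bucket
// are then walked, comparing keys with equals(), to find the match.
check(bucketIndex in 0 until tableSize)
</code></pre>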
<p>Contrast this with our <code>when</code> based implementation:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">when</span> (screenClassName.hashCode()) {
  <span class="hljs-number">1179818499</span> -&gt; <span class="hljs-string">"Team A"</span>
  -<span class="hljs-number">627635963</span> -&gt; <span class="hljs-string">"Team B"</span>
  -<span class="hljs-number">627635962</span> -&gt; <span class="hljs-string">"Team C"</span>
}
</code></pre>
<p>This code seems to imply that we need to check the hashcode against every single value. That's a linear scan, O(n) time complexity. Oh no!</p>
<p>Fortunately for us, an integer <code>when</code> like this compiles down to a sparse switch with sorted keys, which the Android ART runtime resolves with a binary search (<a target="_blank" href="https://cs.android.com/android/platform/superproject/+/android-8.0.0_r1:art/runtime/interpreter/interpreter_common.h;l=455-464">source</a>) to jump to the correct instruction:</p>
<pre><code class="lang-c"><span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">inline</span> <span class="hljs-keyword">int32_t</span> <span class="hljs-title">DoSparseSwitch</span><span class="hljs-params">(<span class="hljs-keyword">const</span> Instruction* inst, <span class="hljs-keyword">const</span> ShadowFrame&amp; shadow_frame,
                                     <span class="hljs-keyword">uint16_t</span> inst_data)</span>
    </span>{
  <span class="hljs-keyword">const</span> <span class="hljs-keyword">uint16_t</span>* switch_data = <span class="hljs-keyword">reinterpret_cast</span>&lt;<span class="hljs-keyword">const</span> <span class="hljs-keyword">uint16_t</span>*&gt;(inst) + inst-&gt;VRegB_31t();
  <span class="hljs-keyword">int32_t</span> test_val = shadow_frame.GetVReg(inst-&gt;VRegA_31t(inst_data));
  <span class="hljs-keyword">uint16_t</span> size = switch_data[<span class="hljs-number">1</span>];
  <span class="hljs-comment">// Return length of SPARSE_SWITCH if size is 0.</span>
  <span class="hljs-keyword">if</span> (size == <span class="hljs-number">0</span>) {
    <span class="hljs-keyword">return</span> <span class="hljs-number">3</span>;
  }
  <span class="hljs-keyword">const</span> <span class="hljs-keyword">int32_t</span>* keys = <span class="hljs-keyword">reinterpret_cast</span>&lt;<span class="hljs-keyword">const</span> <span class="hljs-keyword">int32_t</span>*&gt;(&amp;switch_data[<span class="hljs-number">2</span>]);
  <span class="hljs-keyword">const</span> <span class="hljs-keyword">int32_t</span>* entries = keys + size;
  <span class="hljs-keyword">int</span> lo = <span class="hljs-number">0</span>;
  <span class="hljs-keyword">int</span> hi = size - <span class="hljs-number">1</span>;
  <span class="hljs-keyword">while</span> (lo &lt;= hi) {
    <span class="hljs-keyword">int</span> mid = (lo + hi) / <span class="hljs-number">2</span>;
    <span class="hljs-keyword">int32_t</span> foundVal = keys[mid];
    <span class="hljs-keyword">if</span> (test_val &lt; foundVal) {
      hi = mid - <span class="hljs-number">1</span>;
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (test_val &gt; foundVal) {
      lo = mid + <span class="hljs-number">1</span>;
    } <span class="hljs-keyword">else</span> {
      <span class="hljs-keyword">return</span> entries[mid];
    }
  }
  <span class="hljs-comment">// No corresponding value: move forward by 3 (size of SPARSE_SWITCH).</span>
  <span class="hljs-keyword">return</span> <span class="hljs-number">3</span>;
}
</code></pre>
<p>With a binary search, we're looking at O(log n) time complexity. Not bad!</p>
<h1 id="heading-scatter-map">Scatter Map</h1>
<p>I would be remiss if I did not mention Romain Guy's recent article, <a target="_blank" href="https://www.romainguy.dev/posts/2024/a-better-hashmap/">A Better Hash Map</a>, which inspired this article. <code>ScatterMap</code> significantly improves the memory footprint &amp; memory cache behavior over <code>HashMap</code>. It's very cool! Not as cool as generating a code-based map though 😜.</p>
]]></content:encoded></item><item><title><![CDATA[DIY: your own Dependency Injection library!]]></title><description><![CDATA[Dependency Injection libraries are powerful tools, but they're often also intimidating & confusing.
When that happens to me, I find that understanding how a tool works helps me get over the initial scare of the dark magic internals.
In this article, ...]]></description><link>https://blog.p-y.wtf/diy-your-own-dependency-injection-library</link><guid isPermaLink="true">https://blog.p-y.wtf/diy-your-own-dependency-injection-library</guid><category><![CDATA[dependency injection]]></category><category><![CDATA[Android]]></category><category><![CDATA[Java]]></category><category><![CDATA[Kotlin]]></category><category><![CDATA[dagger-hilt]]></category><category><![CDATA[dagger]]></category><category><![CDATA[Internals]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 18 Jan 2024 20:08:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1705600821196/d9c26d7d-b262-4a69-8cec-480ad1cb5dc8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Dependency Injection libraries are powerful tools, but they're often also intimidating &amp; confusing.</p>
<p>When that happens to me, I find that understanding how a tool works helps me get over the initial scare of the dark magic internals.</p>
<p>In this article, I'll walk you through how to implement your own dependency injection library. Starting with manual dependency injection, we'll progressively build a simplistic version of <a target="_blank" href="https://github.com/google/guice">Google Guice</a>, then <a target="_blank" href="https://github.com/square/dagger">Dagger 1</a> and eventually <a target="_blank" href="https://github.com/google/dagger">Dagger 2</a>.</p>
<p>By the end of this article, I'm hoping you'll have built up a good intuition for how all these libraries work under the hood. You'll be the life of the party when you casually drop with a straight face: <em>"oh yeah, a dependency injection library is mostly just a map of types to factories"</em>.</p>
<p>The code presented here is available at <a target="_blank" href="https://github.com/pyricau/diy">github.com/pyricau/diy</a>.</p>
<h2 id="heading-manual-dependency-injection">Manual Dependency Injection</h2>
<p><a target="_blank" href="https://en.wikipedia.org/wiki/Dependency_injection">Dependency Injection</a> is a pattern, so starting with no library is helpful. Manual dependency injection typically requires creating instances <strong>in the right order</strong>, in a <strong>dedicated configuration place</strong> in the code.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🙅</div>
<div data-node-type="callout-text">A common anti-pattern that is <strong>not</strong> manual dependency injection (looking at some of my iOS developer friends 😘) is having objects be in charge of creating their own collaborators and being passed in the dependencies of these collaborators. If adding a new dependency requires passing it through 10 classes, you're doing it wrong.</div>
</div>

<h3 id="heading-coffee-example">Coffee Example</h3>
<p>We want to create a <code>CoffeeMaker</code>, which needs a <code>CoffeeLogger</code>, a <code>Heater</code> and a <code>Pump</code>. For the <code>Heater</code> we'll use an <code>ElectricHeater</code> which also needs a <code>CoffeeLogger</code>, and for the <code>Pump</code> we'll use a <code>Thermosiphon</code> which needs a <code>CoffeeLogger</code> and a <code>Heater</code>.</p>
<p>Here's what that looks like in Kotlin:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CoffeeLogger</span></span>

<span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">Heater</span></span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ElectricHeater</span></span>(logger: CoffeeLogger) : Heater

<span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">Pump</span></span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Thermosiphon</span></span>(logger: CoffeeLogger, heater: Heater) : Pump

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CoffeeMaker</span></span>(logger: CoffeeLogger, heater: Heater, pump: Pump)
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">🤯</div>
<div data-node-type="callout-text"><strong>Thermowhat?!</strong> This DI example <a target="_blank" href="https://github.com/square/dagger/blob/master/examples/simple/src/main/java/coffee/Thermosiphon.java">comes from Dagger 1</a>, and many folks found it to be a confusing example: they were learning DI, learning a new DI library, and the example didn't map to something they knew how to build in real life. I asked <a target="_blank" href="https://publicobject.com/">Jesse Wilson</a> why he chose that example, he said: <em>"I was reading about coffee machines and learned about how Mr Coffee doesn’t have a pump, just a heater"</em>. To make coffee, you need to pour water on ground beans. To move that water towards the beans, you can use a mechanical pump. But <a target="_blank" href="https://en.wikipedia.org/wiki/Mr._Coffee">Mr Coffee</a> uses a Thermosiphon instead (thermo = hot, siphon = tube), which is, <a target="_blank" href="https://en.wikipedia.org/wiki/Thermosiphon">according to Wikipedia</a>, <em>"a method of passive heat exchange, based on natural convection, which circulates a fluid without the necessity of a mechanical pump"</em>. Now you know how to make coffee!</div>
</div>

<p>We can represent the <code>CoffeeMaker</code> and its dependencies as a directed graph:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705686995499/df992516-f355-4348-911d-ece9a0bd8341.png" alt class="image--center mx-auto" /></p>
<p>A dependency graph is actually a <a target="_blank" href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed Acyclic Graph</a> aka DAG (hence the name Dagger!) as there cannot be cycles between dependencies.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">You can support dependency cycles by using lazy or setter injection, which breaks up the resolving of dependencies into several rounds. Each round then resolves a DAG of dependencies with no cycle.</div>
</div>
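<p>A minimal sketch of what breaking a cycle with lazy injection looks like (hypothetical <code>A</code>/<code>B</code> classes, not part of the coffee example):</p>
<pre><code class="lang-kotlin">// A and B depend on each other. Deferring B's access to A behind a
// lambda breaks the cycle: B can be constructed before A exists.
class A(val b: B)
class B(val a: () -&gt; A)

lateinit var a: A
val b = B { a }  // B only dereferences A later, after construction
a = A(b)

check(a.b.a() === a)
</code></pre>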

<p><code>CoffeeMaker</code> here is called an <em>Entry Point</em>: it's the thing we want to build and the root of our dependency graph.</p>
<p>Let's create a <code>CoffeeMaker</code> and brew!</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> logger = CoffeeLogger()
<span class="hljs-keyword">val</span> heater: Heater = ElectricHeater(logger)
<span class="hljs-keyword">val</span> pump: Pump = Thermosiphon(logger, heater)
<span class="hljs-keyword">val</span> coffeeMaker = CoffeeMaker(logger, heater, pump)

coffeeMaker.brew()
</code></pre>
<p>With manual Dependency Injection, creating dependencies in the right order quickly becomes a problem as the number of collaborators increases. To avoid these issues, we need a Dependency Injection library!</p>
<h2 id="heading-concepts">Concepts</h2>
<p>Let's first introduce a few API contracts that will be useful throughout this article.</p>
<h3 id="heading-objectgraph">ObjectGraph</h3>
<p>The <code>ObjectGraph</code> is our entry point into a DI library. It's also known as <code>Injector</code>, <code>Container</code>, or <code>Component</code>. It's what our application code uses to get started with doing things, and its main job is to provide instances of a requested type:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ObjectGraph</span> </span>{
  <span class="hljs-keyword">operator</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;T&gt;</span> <span class="hljs-title">get</span><span class="hljs-params">(requestedType: <span class="hljs-type">Class</span>&lt;<span class="hljs-type">T</span>&gt;)</span></span>: T
}
</code></pre>
<p>The API is straightforward:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> coffeeMaker = objectGraph.<span class="hljs-keyword">get</span>(CoffeeMaker::<span class="hljs-keyword">class</span>.java)

<span class="hljs-comment">// Or using the get() operator overload:</span>
<span class="hljs-keyword">val</span> coffeeMaker = objectGraph[CoffeeMaker::<span class="hljs-keyword">class</span>.java]
</code></pre>
<p>We can write a reified extension function to leverage the power of the Kotlin compiler:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">inline</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;<span class="hljs-keyword">reified</span> T&gt;</span> ObjectGraph.<span class="hljs-title">get</span><span class="hljs-params">()</span></span> = <span class="hljs-keyword">get</span>(T::<span class="hljs-keyword">class</span>.java)

<span class="hljs-comment">// Thank you Kotlin compiler!</span>
<span class="hljs-keyword">val</span> coffeeMaker = objectGraph.<span class="hljs-keyword">get</span>&lt;CoffeeMaker&gt;()
</code></pre>
<h3 id="heading-factory">Factory</h3>
<p>A <code>Factory</code> knows how to create instances of a particular type. It can leverage the <code>ObjectGraph</code> to retrieve the dependencies needed to create a collaborator.</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-keyword">interface</span> Factory<span class="hljs-type">&lt;T&gt;</span> {</span>
  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">get</span><span class="hljs-params">(objectGraph: <span class="hljs-type">ObjectGraph</span>)</span></span>: T
}
</code></pre>
<p>The <code>Factory</code> for <code>CoffeeMaker</code> could be implemented as:</p>
<pre><code class="lang-kotlin">  <span class="hljs-keyword">val</span> coffeeMakerFactory = Factory { objectGraph -&gt;
    CoffeeMaker(objectGraph.<span class="hljs-keyword">get</span>(), objectGraph.<span class="hljs-keyword">get</span>(), objectGraph.<span class="hljs-keyword">get</span>())
  }
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">We don't have to write <code>CoffeeMaker(logger, heater, pump)</code> here and can just repeatedly call the <code>reified</code> function <code>ObjectGraph.get()</code>, the Kotlin compiler will then pass in the right <code>Class</code> objects.</div>
</div>

<h3 id="heading-module">Module</h3>
<p>A <code>Module</code> knows how to create a factory for a specific type. <code>Module.get()</code> might return null if a given module doesn't know how to create a factory for that requested type.</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">Module</span> </span>{
  <span class="hljs-keyword">operator</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;T&gt;</span> <span class="hljs-title">get</span><span class="hljs-params">(requestedType: <span class="hljs-type">Class</span>&lt;<span class="hljs-type">T</span>&gt;)</span></span>: Factory&lt;T&gt;?
}
</code></pre>
<p>At this point, we start seeing how these concepts connect: when calling <code>ObjectGraph.get()</code>, the object graph will leverage its list of <code>Module</code> to find a suitable <code>Factory</code> for that type and then use the <code>Factory</code> to create the instance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705560646489/6b167978-b233-4ba8-b3bc-f6cac3b0b292.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-factoryholdermodule">FactoryHolderModule</h2>
<p>Our initial <code>Module</code> implementation is <code>FactoryHolderModule</code>, which holds a map of types to their associated <code>Factory</code>. We call <code>FactoryHolderModule.install(type, factory)</code> to add a new factory, and <code>FactoryHolderModule.get(type)</code> to retrieve it:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">FactoryHolderModule</span> : <span class="hljs-type">Module {</span></span>
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> factories = mutableMapOf&lt;Class&lt;<span class="hljs-keyword">out</span> Any?&gt;, Factory&lt;<span class="hljs-keyword">out</span> Any?&gt;&gt;()

  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> &lt;T&gt; <span class="hljs-keyword">get</span>(requestedType: Class&lt;T&gt;): Factory&lt;T&gt;? =</span>
    factories[requestedType] <span class="hljs-keyword">as</span> Factory&lt;T&gt;?

  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;T&gt;</span> <span class="hljs-title">install</span><span class="hljs-params">(
    requestedType: <span class="hljs-type">Class</span>&lt;<span class="hljs-type">T</span>&gt;,
    factory: <span class="hljs-type">Factory</span>&lt;<span class="hljs-type">T</span>&gt;
  )</span></span> {
    factories[requestedType] = factory
  }
}
</code></pre>
<p>Here's how we would add the <code>CoffeeMaker</code> factory to a <code>FactoryHolderModule</code>:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> module = FactoryHolderModule()
module.install(CoffeeMaker::<span class="hljs-keyword">class</span>.java) { objectGraph -&gt;
  CoffeeMaker(objectGraph.<span class="hljs-keyword">get</span>(), objectGraph.<span class="hljs-keyword">get</span>(), objectGraph.<span class="hljs-keyword">get</span>())
}
</code></pre>
<p>Let's make this API nicer! We don't like having to pass in <code>CoffeeMaker::class.java</code>. Also, repeating <code>objectGraph</code> is annoying: could we use a lambda with receiver instead?</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">inline</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;<span class="hljs-keyword">reified</span> T&gt;</span> FactoryHolderModule.<span class="hljs-title">install</span><span class="hljs-params">(
  <span class="hljs-keyword">noinline</span> factory: <span class="hljs-type">ObjectGraph</span>.() -&gt; <span class="hljs-type">T</span>
)</span></span> = install(T::<span class="hljs-keyword">class</span>.java) { objectGraph -&gt; objectGraph.factory() }

<span class="hljs-comment">// Nicer!</span>
<span class="hljs-keyword">val</span> module = FactoryHolderModule()
module.install {
  CoffeeMaker(<span class="hljs-keyword">get</span>(), <span class="hljs-keyword">get</span>(), <span class="hljs-keyword">get</span>())
}
</code></pre>
<h2 id="heading-objectgraph-implementation"><code>ObjectGraph</code> implementation</h2>
<p>Our <code>ObjectGraph</code> takes in a list of <code>Module</code> instances that know how to create factories. <code>ObjectGraph.get()</code> retrieves the factory from the modules and then calls <code>Factory.get(ObjectGraph)</code>:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ObjectGraph</span></span>(<span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> modules: List&lt;Module&gt;) {

  <span class="hljs-keyword">constructor</span>(<span class="hljs-keyword">vararg</span> modules: Module) : <span class="hljs-keyword">this</span>(modules.asList())

  <span class="hljs-keyword">operator</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;T&gt;</span> <span class="hljs-title">get</span><span class="hljs-params">(requestedType: <span class="hljs-type">Class</span>&lt;<span class="hljs-type">T</span>&gt;)</span></span>: T {
    <span class="hljs-keyword">val</span> factory = modules
      .firstNotNullOf { module -&gt; module[requestedType] }
    <span class="hljs-keyword">return</span> factory.<span class="hljs-keyword">get</span>(<span class="hljs-keyword">this</span>)
  }
}
</code></pre>
<p>Delegating to the provided modules on every call to <code>ObjectGraph.get()</code> could be wasteful, so we can leverage <code>FactoryHolderModule</code> to add a caching layer in <code>ObjectGraph</code> for the factories:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ObjectGraph</span></span>(<span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> modules: List&lt;Module&gt;) {

  <span class="hljs-keyword">constructor</span>(<span class="hljs-keyword">vararg</span> modules: Module) : <span class="hljs-keyword">this</span>(modules.asList())

  <span class="hljs-comment">// Cache of factories already retrieved from modules.</span>
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> factoryHolder = FactoryHolderModule()

  <span class="hljs-keyword">operator</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;T&gt;</span> <span class="hljs-title">get</span><span class="hljs-params">(requestedType: <span class="hljs-type">Class</span>&lt;<span class="hljs-type">T</span>&gt;)</span></span>: T {
    <span class="hljs-keyword">val</span> knownFactoryOrNull = factoryHolder[requestedType]
    <span class="hljs-keyword">val</span> factory = knownFactoryOrNull ?: modules
      .firstNotNullOf { module -&gt; module[requestedType] }
      .also { factory -&gt;
        factoryHolder.install(requestedType, factory)
      }
    <span class="hljs-keyword">return</span> factory.<span class="hljs-keyword">get</span>(<span class="hljs-keyword">this</span>)
  }
}
</code></pre>
<h2 id="heading-putting-it-all-together">Putting it all together</h2>
<p>Let's create a <code>CoffeeMaker</code> and brew! We can install our factories on a <code>FactoryHolderModule</code>, then create an <code>ObjectGraph</code> with that module and ask it for a <code>CoffeeMaker</code> instance.</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> module = FactoryHolderModule()
module.install {
  CoffeeLogger()
}
module.install {
  ElectricHeater(<span class="hljs-keyword">get</span>())
}
module.install {
  Thermosiphon(<span class="hljs-keyword">get</span>(), <span class="hljs-keyword">get</span>())
}
module.install {
  CoffeeMaker(<span class="hljs-keyword">get</span>(), <span class="hljs-keyword">get</span>(), <span class="hljs-keyword">get</span>())
}
<span class="hljs-keyword">val</span> objectGraph = ObjectGraph(module)
<span class="hljs-keyword">val</span> coffeeMaker = objectGraph.<span class="hljs-keyword">get</span>&lt;CoffeeMaker&gt;()

coffeeMaker.brew()
</code></pre>
<p>Unfortunately, this doesn't work! <code>CoffeeMaker</code> needs a <code>Heater</code> and a <code>Pump</code>. We've added a factory for <code>Thermosiphon</code>, which is a <code>Pump</code>, and for <code>ElectricHeater</code>, which is a <code>Heater</code>, but we didn't connect the interfaces with their implementations.</p>
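<p>Concretely, when <code>ObjectGraph.get()</code> recursively asks the modules for a <code>Heater</code> factory, none of them has one installed, so <code>firstNotNullOf</code> finds no match and throws. A sketch of the failure:</p>
<pre><code class="lang-kotlin">// CoffeeMaker's factory calls get() for a Heater, but no factory
// was ever installed for the Heater interface itself:
val coffeeMaker = objectGraph.get&lt;CoffeeMaker&gt;()
// throws NoSuchElementException from firstNotNullOf
</code></pre>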
<h3 id="heading-bind">Bind</h3>
<p>Let's introduce a <code>bind()</code> function that associates a requested type to a factory of a provided subtype:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">inline</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;<span class="hljs-keyword">reified</span> REQUESTED, <span class="hljs-keyword">reified</span> PROVIDED : REQUESTED&gt;</span></span>
    FactoryHolderModule.bind() {

  install(REQUESTED::<span class="hljs-keyword">class</span>.java) { objectGraph -&gt;
    objectGraph[PROVIDED::<span class="hljs-keyword">class</span>.java]
  }
}

<span class="hljs-comment">// Nice!</span>
module.bind&lt;Heater, ElectricHeater&gt;()
module.bind&lt;Pump, Thermosiphon&gt;()
</code></pre>
<h3 id="heading-singletons">Singletons</h3>
<p><code>CoffeeMaker</code> and <code>Thermosiphon</code> both need a <code>Heater</code>. The <code>CoffeeMaker</code> turns the <code>Heater</code> on, and the <code>Thermosiphon</code> starts pumping if the <code>Heater</code> is hot. For things to work correctly, <code>CoffeeMaker</code> and <code>Thermosiphon</code> should use the same <code>Heater</code> instance. We need singleton support!</p>
<p>Let's create a function that transforms any <code>Factory</code> into a caching factory that will reuse the instance after the first call:</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;T&gt;</span> <span class="hljs-title">singleton</span><span class="hljs-params">(factory: <span class="hljs-type">Factory</span>&lt;<span class="hljs-type">T</span>&gt;)</span></span>: Factory&lt;T&gt; {
  <span class="hljs-keyword">var</span> instance: Any? = UNINITIALIZED
  <span class="hljs-keyword">return</span> Factory { linker -&gt;
    <span class="hljs-keyword">if</span> (instance === UNINITIALIZED) {
      instance = factory.<span class="hljs-keyword">get</span>(linker)
    }
    instance <span class="hljs-keyword">as</span> T
  }
}

<span class="hljs-keyword">val</span> UNINITIALIZED = Any()
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">This code isn't thread-safe! For a thread-safe implementation, see Dagger's <a target="_blank" href="https://github.com/google/dagger/blob/69ac5d8ea7ed8e296f83c3eb399e84814403eca8/java/dagger/internal/DoubleCheck.java#L41-L56">DoubleCheck</a>.</div>
</div>
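<p>A quick sanity check of the caching behavior: the wrapped factory runs once, and every later call returns the same instance (this snippet assumes an <code>objectGraph</code> is in scope):</p>
<pre><code class="lang-kotlin">val cachedFactory = singleton(Factory { CoffeeLogger() })

val first = cachedFactory.get(objectGraph)
val second = cachedFactory.get(objectGraph)
check(first === second) // same CoffeeLogger instance
</code></pre>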

<p>We already have a nice <code>install</code> function that takes a lambda with receiver; let's create a variant for singletons:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">inline</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;<span class="hljs-keyword">reified</span> T&gt;</span> FactoryHolderModule.<span class="hljs-title">installSingleton</span><span class="hljs-params">(
  <span class="hljs-keyword">noinline</span> factory: <span class="hljs-type">ObjectGraph</span>.() -&gt; <span class="hljs-type">T</span>
)</span></span> {
  install(T::<span class="hljs-keyword">class</span>.java, singleton(factory))
}
</code></pre>
<h3 id="heading-it-works">It works!</h3>
<p>We've connected interfaces to implementations and added singletons:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> module = FactoryHolderModule()
module.bind&lt;Heater, ElectricHeater&gt;()
module.bind&lt;Pump, Thermosiphon&gt;()
module.installSingleton {
  CoffeeLogger()
}
module.installSingleton {
  ElectricHeater(<span class="hljs-keyword">get</span>())
}
module.install {
  Thermosiphon(<span class="hljs-keyword">get</span>(), <span class="hljs-keyword">get</span>())
}
module.install {
  CoffeeMaker(<span class="hljs-keyword">get</span>(), <span class="hljs-keyword">get</span>(), <span class="hljs-keyword">get</span>())
}
<span class="hljs-keyword">val</span> objectGraph = ObjectGraph(module)
<span class="hljs-keyword">val</span> coffeeMaker = objectGraph.<span class="hljs-keyword">get</span>&lt;CoffeeMaker&gt;()

coffeeMaker.brew()
</code></pre>
<p>Ugh, that's a lot more boilerplate than our manual DI:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> logger = CoffeeLogger()
<span class="hljs-keyword">val</span> heater: Heater = ElectricHeater(logger)
<span class="hljs-keyword">val</span> pump: Pump = Thermosiphon(logger, heater)
<span class="hljs-keyword">val</span> coffeeMaker = CoffeeMaker(logger, heater, pump)

coffeeMaker.brew()
</code></pre>
<p>Can we get rid of the boilerplate?</p>
<h2 id="heading-reflectivemodule-guice-style"><code>ReflectiveModule</code> - Guice style</h2>
<p>What if we used reflection to figure out how to create object instances?</p>
<h3 id="heading-inject"><code>@Inject</code></h3>
<p>First, we need a way to indicate which constructor to call, and convey which instances should be singletons. We can leverage the <code>javax.inject</code> library, which provides the <code>@Inject</code> and <code>@Singleton</code> annotations:</p>
<pre><code class="lang-kotlin">dependencies {
    <span class="hljs-comment">// ...</span>
    api(<span class="hljs-string">"javax.inject:javax.inject:1"</span>)
}
</code></pre>
<p>Let's sprinkle our annotations:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">import</span> javax.inject.Inject
<span class="hljs-keyword">import</span> javax.inject.Singleton

<span class="hljs-meta">@Singleton</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ElectricHeater</span> <span class="hljs-meta">@Inject</span> <span class="hljs-keyword">constructor</span></span>(
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> logger: CoffeeLogger
) : Heater {
  <span class="hljs-comment">// ...</span>
}
</code></pre>
<h3 id="heading-injected-constructor">Injected constructor</h3>
<p>For a given class to inject, we use reflection to find the constructor annotated with <code>@Inject</code>:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> requestedType: Class&lt;T&gt; = <span class="hljs-comment">//...</span>
<span class="hljs-keyword">val</span> injectConstructor = requestedType.constructors.single {
  it.isAnnotationPresent(Inject::<span class="hljs-keyword">class</span>.java)
}
</code></pre>
<p>We extract the types of the constructor parameters, ask the <code>ObjectGraph</code> for an instance of each parameter type, then pass these instances to the constructor:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> objectGraph: ObjectGraph = <span class="hljs-comment">// ...</span>
<span class="hljs-keyword">val</span> parameters = injectConstructor.parameterTypes.map { paramType -&gt;
  objectGraph[paramType]
}.toTypedArray()
<span class="hljs-keyword">val</span> instance = injectConstructor.newInstance(*parameters)
</code></pre>
<h3 id="heading-reflectivefactory">ReflectiveFactory</h3>
<p>All together, we get a <code>ReflectiveFactory</code>:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ReflectiveFactory</span>&lt;<span class="hljs-type">T</span>&gt;</span>(
  requestedType: Class&lt;T&gt;
) : Factory&lt;T&gt; {
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> injectConstructor = requestedType.constructors.single {
    it.isAnnotationPresent(Inject::<span class="hljs-keyword">class</span>.java)
  } <span class="hljs-keyword">as</span> Constructor&lt;T&gt;

  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">get</span><span class="hljs-params">(objectGraph: <span class="hljs-type">ObjectGraph</span>)</span></span>: T {
    <span class="hljs-keyword">val</span> parameters = injectConstructor.parameterTypes.map { paramType -&gt;
      objectGraph[paramType]
    }.toTypedArray()
    <span class="hljs-keyword">return</span> injectConstructor.newInstance(*parameters)
  }
}
</code></pre>
<p>Then we create a <code>ReflectiveModule</code> that creates the right <code>ReflectiveFactory</code> for each requested type. It also checks if the class is annotated with <code>@Singleton</code>, in which case it wraps the factory in a caching factory:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ReflectiveModule</span> : <span class="hljs-type">Module {</span></span>
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;T&gt;</span> <span class="hljs-title">get</span><span class="hljs-params">(requestedType: <span class="hljs-type">Class</span>&lt;<span class="hljs-type">T</span>&gt;)</span></span>: Factory&lt;T&gt; {
    <span class="hljs-keyword">val</span> reflectiveFactory = ReflectiveFactory(requestedType)
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">if</span> (requestedType.isAnnotationPresent(Singleton::<span class="hljs-keyword">class</span>.java)) {
      singleton(reflectiveFactory)
    } <span class="hljs-keyword">else</span> {
      reflectiveFactory
    }
  }
}
</code></pre>
<h3 id="heading-less-boilerplate">Less boilerplate!</h3>
<p>Our coffee example looks a lot nicer; it's similar to how <a target="_blank" href="https://github.com/google/guice/wiki/GettingStarted">Google Guice</a> works:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> bindingModule = FactoryHolderModule().apply {
  bind&lt;Heater, ElectricHeater&gt;()
  bind&lt;Pump, Thermosiphon&gt;()
}

<span class="hljs-keyword">val</span> objectGraph = ObjectGraph(
  bindingModule,
  ReflectiveModule()
)
<span class="hljs-keyword">val</span> coffeeMaker = objectGraph.<span class="hljs-keyword">get</span>&lt;CoffeeMaker&gt;()

coffeeMaker.brew()
</code></pre>
<p>This works well, but object creation is done through reflection, which is slow. Could we generate code instead?</p>
<h2 id="heading-injectprocessor-dagger-1-style"><code>InjectProcessor</code> <strong>—</strong> Dagger-1 style</h2>
<h3 id="heading-generated-factories">Generated factories</h3>
<p>What if we generated the factory for each injected object at compile time:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Thermosiphon_Factory</span> : <span class="hljs-type">Factory</span>&lt;<span class="hljs-type">Thermosiphon</span>&gt; </span>{
    <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">get</span><span class="hljs-params">(objectGraph: <span class="hljs-type">ObjectGraph</span>)</span></span> = Thermosiphon(
      objectGraph.<span class="hljs-keyword">get</span>(),
      objectGraph.<span class="hljs-keyword">get</span>()
    )
}
</code></pre>
<p>We'd also need to implement singleton support in the generated factories, leveraging the <code>singleton</code> function we defined earlier that transforms any <code>Factory</code> into a caching factory:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ElectricHeater_Factory</span> : <span class="hljs-type">Factory</span>&lt;<span class="hljs-type">ElectricHeater</span>&gt; </span>{
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> singletonFactory = singleton { objectGraph -&gt;
        ElectricHeater(
          objectGraph.<span class="hljs-keyword">get</span>()
        )
    }

    <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">get</span><span class="hljs-params">(objectGraph: <span class="hljs-type">ObjectGraph</span>)</span></span> = singletonFactory
      .<span class="hljs-keyword">get</span>(objectGraph)
}
</code></pre>
<h3 id="heading-injectprocessor"><code>InjectProcessor</code></h3>
<p>To generate the factory classes, we can use <a target="_blank" href="https://kotlinlang.org/docs/ksp-overview.html">KSP</a>. This article is already long, so I won't bore you with all the details (<a target="_blank" href="https://github.com/pyricau/diy/blob/main/diy-processor/src/main/kotlin/InjectProcessor.kt">read the source</a>). Here's how we generate the factory classes:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> className = <span class="hljs-string">"<span class="hljs-subst">${injectedClassSimpleName}</span>_Factory"</span>

ktFile.appendLine(<span class="hljs-string">"class <span class="hljs-variable">$className</span> : Factory&lt;<span class="hljs-variable">$injectedClassSimpleName</span>&gt; {"</span>)

<span class="hljs-keyword">val</span> constructorInvocation =
  <span class="hljs-string">"<span class="hljs-subst">${injectedClassSimpleName}</span>("</span> + function.parameters.joinToString(<span class="hljs-string">", "</span>) {
    <span class="hljs-string">"objectGraph.get()"</span>
  } + <span class="hljs-string">")"</span>

<span class="hljs-keyword">if</span> (injectedClass.isAnnotationPresent(Singleton::<span class="hljs-class"><span class="hljs-keyword">class</span>)) </span>{
  ktFile.appendLine(<span class="hljs-string">"    private val singletonFactory = singleton { objectGraph -&gt;"</span>)
  ktFile.appendLine(<span class="hljs-string">"        <span class="hljs-variable">$constructorInvocation</span>"</span>)
  ktFile.appendLine(<span class="hljs-string">"    }"</span>)
  ktFile.appendLine()
  ktFile.appendLine(
    <span class="hljs-string">"    override fun get(objectGraph: ObjectGraph) = singletonFactory.get(objectGraph)"</span>
  )
} <span class="hljs-keyword">else</span> {
  ktFile.appendLine(
    <span class="hljs-string">"    override fun get(objectGraph: ObjectGraph) = <span class="hljs-variable">$constructorInvocation</span>"</span>
  )
}
ktFile.appendLine(<span class="hljs-string">"}"</span>)
</code></pre>
<h3 id="heading-injectprocessormodule"><code>InjectProcessorModule</code></h3>
<p>We still need to use reflection to create an instance for each generated factory class:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">InjectProcessorModule</span> : <span class="hljs-type">Module {</span></span>
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-type">&lt;T&gt;</span> <span class="hljs-title">get</span><span class="hljs-params">(requestedType: <span class="hljs-type">Class</span>&lt;<span class="hljs-type">T</span>&gt;)</span></span> : Factory&lt;T&gt; {
    <span class="hljs-keyword">val</span> factoryClass = Class.forName(<span class="hljs-string">"<span class="hljs-subst">${requestedType.name}</span>_Factory"</span>)
    <span class="hljs-keyword">val</span> factoryConstructor = factoryClass.getDeclaredConstructor()
    <span class="hljs-keyword">return</span> factoryConstructor.newInstance() <span class="hljs-keyword">as</span> Factory&lt;T&gt;
  }
}
</code></pre>
<p>The generated factory will create objects without any reflection involved.</p>
<h3 id="heading-less-reflection">Less reflection!</h3>
<p>Our coffee example runs faster, and the setup is almost identical, although we have to enable KSP:</p>
<pre><code class="lang-kotlin">plugins {
    id(<span class="hljs-string">"com.google.devtools.ksp"</span>)
    kotlin(<span class="hljs-string">"jvm"</span>)
}

dependencies {
  <span class="hljs-comment">// ...</span>
  ksp(project(<span class="hljs-string">":diy-processor"</span>))
}
</code></pre>
<p>The result is similar to how Dagger 1 works:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> objectGraph = ObjectGraph(
  FactoryHolderModule().apply {
    bind&lt;Heater, ElectricHeater&gt;()
    bind&lt;Pump, Thermosiphon&gt;()
  },
  InjectProcessorModule()
)
<span class="hljs-keyword">val</span> coffeeMaker = objectGraph.<span class="hljs-keyword">get</span>&lt;CoffeeMaker&gt;()

coffeeMaker.brew()
</code></pre>
<p>Could we remove the last remaining use of reflection, get rid of the map of factories, and just invoke the right generated code as needed?</p>
<h2 id="heading-componentprocessor-dagger-2-style"><code>ComponentProcessor</code> <strong>—</strong> Dagger-2 style</h2>
<p>We want to generate code that doesn't use reflection at all. To do this, we need a way to define which instances our object graph should be able to provide. We really only care about one thing: retrieving <code>CoffeeMaker</code> instances.</p>
<p>The dependency graph is a Directed Acyclic Graph, and <code>CoffeeMaker</code> is its root, which we call an entry point. We can resolve the entire dependency graph by looking at <code>CoffeeMaker</code> dependencies and then recursively looking at the dependencies of these dependencies. And we can do all that at compile time!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705687755566/f0ea174c-a65a-4c96-97b8-2017286adb52.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-component-interface"><code>@Component</code> interface</h3>
<p>Let's define an interface that provides <code>CoffeeMaker</code> instances:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Component</span>
<span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">CoffeeComponent</span> </span>{
  <span class="hljs-keyword">val</span> coffeeMaker: CoffeeMaker
}
</code></pre>
<h3 id="heading-ksp-componentprocessor">KSP <code>ComponentProcessor</code></h3>
<p>We then create a KSP <code>ComponentProcessor</code> to find this interface at compile time:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> symbols = resolver.getSymbolsWithAnnotation(Component::<span class="hljs-keyword">class</span>.java.name)
<span class="hljs-keyword">val</span> componentInterfaces = symbols
      .filterIsInstance&lt;KSClassDeclaration&gt;()
      .filter { it.validate() &amp;&amp; it.classKind == INTERFACE }
</code></pre>
<p>We can look for properties on that interface, which we'll call entry points:</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">readEntryPoints</span><span class="hljs-params">(classDeclaration: <span class="hljs-type">KSClassDeclaration</span>)</span></span> =
   classDeclaration.getDeclaredProperties().map { property -&gt;
     <span class="hljs-keyword">val</span> resolvedPropertyType = property.type.resolve().declaration
     EntryPoint(property, resolvedPropertyType)
   }.toList()
</code></pre>
<p>For each of these entry point classes, we look for an <code>@Inject</code> constructor, list the constructor parameters, then look for <code>@Inject</code> constructors for those parameters as well, and so on. It might seem like a lot of code, but at its core it's a while loop that explores the dependency graph from the entry points:</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">traverseDependencyGraph</span><span class="hljs-params">(factoryEntryPoints: <span class="hljs-type">List</span>&lt;<span class="hljs-type">KSDeclaration</span>&gt;)</span></span>:
  List&lt;ComponentFactory&gt; {
  <span class="hljs-keyword">val</span> typesToProcess = mutableListOf&lt;KSDeclaration&gt;()
  typesToProcess += factoryEntryPoints

  <span class="hljs-keyword">val</span> factories = mutableListOf&lt;ComponentFactory&gt;()
  <span class="hljs-keyword">val</span> typesVisited = mutableListOf&lt;KSDeclaration&gt;()
  <span class="hljs-keyword">while</span> (typesToProcess.isNotEmpty()) {
    <span class="hljs-keyword">val</span> visitedClassDeclaration = typesToProcess.removeFirst()
      <span class="hljs-keyword">as</span> KSClassDeclaration
    <span class="hljs-keyword">if</span> (visitedClassDeclaration !<span class="hljs-keyword">in</span> typesVisited) {
      typesVisited += visitedClassDeclaration
      <span class="hljs-keyword">val</span> injectConstructors = visitedClassDeclaration.getConstructors()
        .filter { it.isAnnotationPresent(Inject::<span class="hljs-class"><span class="hljs-keyword">class</span>) }</span>
        .toList()
      check(injectConstructors.size &lt; <span class="hljs-number">2</span>) {
        <span class="hljs-string">"There should be at most one @Inject constructor"</span>
      }
      <span class="hljs-keyword">if</span> (injectConstructors.isNotEmpty()) {
        <span class="hljs-keyword">val</span> injectConstructor = injectConstructors.first()
        <span class="hljs-keyword">val</span> constructorParams = injectConstructor.parameters.map {
          it.type.resolve().declaration
        }
        typesToProcess += constructorParams
        <span class="hljs-keyword">val</span> isSingleton = visitedClassDeclaration
          .isAnnotationPresent(Singleton::<span class="hljs-class"><span class="hljs-keyword">class</span>)</span>
        factories += ComponentFactory(
          visitedClassDeclaration,
          constructorParams,
          isSingleton
        )
      }
    }
  }
  <span class="hljs-keyword">return</span> factories
}
</code></pre>
<h3 id="heading-binds"><code>@Binds</code></h3>
<p>While the above code takes care of classes annotated with <code>@Inject</code> and <code>@Singleton</code>, remember that we also need a way to bind <code>Heater</code> to <code>ElectricHeater</code> and <code>Pump</code> to <code>Thermosiphon</code>. But we don't have a module to call methods on anymore; all we have is our component interface.</p>
<p>So we'll do the same weird trick that Dagger 2 did: define a new interface that will <strong>never be implemented</strong>, and only exists to hold <strong>methods that will never be invoked</strong>. These methods are our <strong>compile time API</strong> for defining an association between an interface and its implementation:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Component(modules = [CoffeeBindsModule::class])</span>
<span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">CoffeeComponent</span> </span>{
  <span class="hljs-keyword">val</span> coffeeMaker: CoffeeMaker
}

<span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">CoffeeBindsModule</span> </span>{
  <span class="hljs-meta">@Binds</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">bindHeater</span><span class="hljs-params">(heater: <span class="hljs-type">ElectricHeater</span>)</span></span>: Heater
  <span class="hljs-meta">@Binds</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">bindPump</span><span class="hljs-params">(pump: <span class="hljs-type">Thermosiphon</span>)</span></span>: Pump
}
</code></pre>
<p>We can now read the binding modules at compile time in our <code>ComponentProcessor</code> and build a map of requested types to provided types:</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">readBinds</span><span class="hljs-params">(componentAnnotation: <span class="hljs-type">KSAnnotation</span>)</span></span>:
  Map&lt;KSDeclaration, KSDeclaration&gt; {
  <span class="hljs-keyword">val</span> bindModules = componentAnnotation
    .getArgument(<span class="hljs-string">"modules"</span>)
    .value <span class="hljs-keyword">as</span> List&lt;KSType&gt;
  <span class="hljs-keyword">val</span> binds = bindModules
    .map { it.declaration <span class="hljs-keyword">as</span> KSClassDeclaration }
    .flatMap { it.getDeclaredFunctions() }
    .filter { it.isAnnotationPresent(Binds::<span class="hljs-class"><span class="hljs-keyword">class</span>) }</span>
    .associate { function -&gt;
      <span class="hljs-keyword">val</span> resolvedReturnType = function.returnType!!
        .resolve().declaration
      <span class="hljs-keyword">val</span> resolvedParamType = function.parameters
        .single().type.resolve().declaration
      resolvedReturnType to resolvedParamType
    }
  <span class="hljs-keyword">return</span> binds
}
</code></pre>
<h3 id="heading-generating-the-component-implementation">Generating the component implementation</h3>
<p>All that's left for us is to generate the component implementation:</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">generateComponent</span><span class="hljs-params">(
  model: <span class="hljs-type">ComponentModel</span>,
  ktFile: <span class="hljs-type">OutputStream</span>
)</span></span> {
  with(model) {
    ktFile.appendLine(<span class="hljs-string">"package <span class="hljs-variable">$packageName</span>"</span>)
    ktFile.appendLine()

    imports.forEach { <span class="hljs-keyword">import</span> -&gt;
      ktFile.appendLine(<span class="hljs-string">"import <span class="hljs-variable">$import</span>"</span>)
    }

    ktFile.appendLine()
    ktFile.appendLine(<span class="hljs-string">"class <span class="hljs-variable">$className</span> : <span class="hljs-variable">$componentInterfaceName</span> {"</span>)

    factories.forEach { (classDeclaration, parameterDeclarations, isSingleton) -&gt;
      <span class="hljs-keyword">val</span> name = classDeclaration.simpleName.asString()
      <span class="hljs-keyword">val</span> parameters = parameterDeclarations.map { requestedType -&gt;
        <span class="hljs-keyword">val</span> providedType = binds[requestedType] ?: requestedType
        providedType.simpleName.asString()
      }
      <span class="hljs-keyword">val</span> singleton = <span class="hljs-keyword">if</span> (isSingleton) <span class="hljs-string">"componentSingleton "</span> <span class="hljs-keyword">else</span> <span class="hljs-string">""</span>

      ktFile.appendLine(<span class="hljs-string">"    private val provide<span class="hljs-variable">$name</span> = <span class="hljs-variable">$singleton</span>{"</span>)
      ktFile.appendLine(
        <span class="hljs-string">"        <span class="hljs-variable">$name</span>(<span class="hljs-subst">${parameters.joinToString(<span class="hljs-string">", "</span>) { <span class="hljs-string">"provide<span class="hljs-variable">$it</span>()"</span> }</span>})"</span>
      )
      ktFile.appendLine(<span class="hljs-string">"    }"</span>)
    }

    entryPoints.forEach { (propertyDeclaration, type) -&gt;
      <span class="hljs-keyword">val</span> name = propertyDeclaration.simpleName.asString()
      <span class="hljs-keyword">val</span> typeSimpleName = type.simpleName.asString()
      ktFile.appendLine(<span class="hljs-string">"    override val <span class="hljs-variable">$name</span>: <span class="hljs-variable">$typeSimpleName</span>"</span>)
      ktFile.appendLine(<span class="hljs-string">"      get() = provide<span class="hljs-variable">$typeSimpleName</span>()"</span>)
    }
    ktFile.appendLine(<span class="hljs-string">"}"</span>)
  }
}
</code></pre>
<p>This generates the following <code>CoffeeComponent</code> implementation:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">package</span> coffee

<span class="hljs-keyword">import</span> diy.componentSingleton

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">GeneratedCoffeeComponent</span> : <span class="hljs-type">CoffeeComponent {</span></span>
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> provideCoffeeMaker = {
        CoffeeMaker(
          provideCoffeeLogger(),
          provideElectricHeater(),
          provideThermosiphon()
        )
    }
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> provideElectricHeater = componentSingleton {
        ElectricHeater(provideCoffeeLogger())
    }
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> provideThermosiphon = {
        Thermosiphon(provideCoffeeLogger(), provideElectricHeater())
    }
    <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> provideCoffeeLogger = componentSingleton {
        CoffeeLogger()
    }
    <span class="hljs-keyword">override</span> <span class="hljs-keyword">val</span> coffeeMaker: CoffeeMaker
      <span class="hljs-keyword">get</span>() = provideCoffeeMaker()
}
</code></pre>
<p>Notice how there's no mention of <code>Pump</code> or <code>Heater</code> in this code; instead, the factories directly retrieve the appropriate implementation.</p>
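<p>The generated code imports <code>diy.componentSingleton</code>, a zero-argument sibling of the <code>singleton</code> function we wrote earlier. A minimal sketch of what it could look like (not thread-safe, and reusing the <code>UNINITIALIZED</code> sentinel from before):</p>
<pre><code class="lang-kotlin">// Wraps a () -&gt; T so the factory runs once and the
// instance is reused on every subsequent invocation.
fun &lt;T&gt; componentSingleton(factory: () -&gt; T): () -&gt; T {
  var instance: Any? = UNINITIALIZED
  return {
    if (instance === UNINITIALIZED) {
      instance = factory()
    }
    instance as T
  }
}
</code></pre>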
<h3 id="heading-no-more-reflection">No more reflection!</h3>
<p>Let's create a <code>CoffeeMaker</code> and brew!</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> component = GeneratedCoffeeComponent()
<span class="hljs-keyword">val</span> coffeeMaker = component.coffeeMaker

coffeeMaker.brew()
</code></pre>
<h2 id="heading-a-different-approach-to-binding-types">A different approach to binding types</h2>
<p>After reading this article, <a target="_blank" href="https://discuss.systems/@manusridharan">Manu Sridharan</a> reached out with some feedback (thanks!) and an interesting question: <em>"Regarding the "weird trick" for bindings from Dagger 2, it might be interesting to suggest why they did it this way. My guess is because of limitations in what types of arguments you can pass into a Java annotation."</em></p>
<p>I'm not sure why the Dagger 2 team decided to use abstract methods to define bindings, but I thought it'd be an interesting experiment to try an alternative approach.</p>
<h3 id="heading-repeatable-annotation">Repeatable annotation</h3>
<p>First, I defined a new <code>@Bind</code> annotation (to differentiate from <code>@Binds</code>, no <code>s</code>) and decided each annotation would define a single binding from a requested type to a provided type. Since we'll need more than one binding, I made it <a target="_blank" href="https://kotlinlang.org/docs/annotations.html#repeatable-annotations">repeatable</a>:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Repeatable</span>
<span class="hljs-meta">@Target(CLASS)</span>
<span class="hljs-meta">@Retention(SOURCE)</span>
<span class="hljs-keyword">annotation</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Bind</span></span>(
  <span class="hljs-keyword">val</span> requested: KClass&lt;*&gt;,
  <span class="hljs-keyword">val</span> provided: KClass&lt;*&gt;
)
</code></pre>
<p>Our <code>CoffeeBindsModule</code> can now be updated:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Bind(
  requested = Heater::class,
  provided = ElectricHeater::class
)</span>
<span class="hljs-meta">@Bind(
  requested = Pump::class,
  provided = Thermosiphon::class
)</span>
<span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">CoffeeBindsModule</span></span>
</code></pre>
<p>While the contract is a lot clearer, that feels more verbose than the previous approach:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">CoffeeBindsModule</span> </span>{
  <span class="hljs-meta">@Binds</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">bindHeater</span><span class="hljs-params">(heater: <span class="hljs-type">ElectricHeater</span>)</span></span>: Heater
  <span class="hljs-meta">@Binds</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">bindPump</span><span class="hljs-params">(pump: <span class="hljs-type">Thermosiphon</span>)</span></span>: Pump
}
</code></pre>
<h3 id="heading-annotation-type-parameters">Annotation type parameters?</h3>
<p>Then I thought: it'd be nice if I could enforce that <code>provided</code> has to extend <code>requested</code>. I wonder if annotations can have type parameters?</p>
<p>Turns out, they can!</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Repeatable</span>
<span class="hljs-meta">@Target(CLASS)</span>
<span class="hljs-meta">@Retention(SOURCE)</span>
<span class="hljs-keyword">annotation</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Bind</span>&lt;<span class="hljs-type">REQUESTED : Any, PROVIDED : REQUESTED</span>&gt;</span>(
  <span class="hljs-keyword">val</span> requested: KClass&lt;REQUESTED&gt;,
  <span class="hljs-keyword">val</span> provided: KClass&lt;PROVIDED&gt;
)
</code></pre>
<p>Here's our updated module:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Bind</span>&lt;Heater, ElectricHeater&gt;(
  requested = Heater::<span class="hljs-class"><span class="hljs-keyword">class</span>,
  <span class="hljs-type">provided = ElectricHeater::class</span></span>
)
<span class="hljs-meta">@Bind</span>&lt;Pump, Thermosiphon&gt;(
  requested = Pump::<span class="hljs-class"><span class="hljs-keyword">class</span>,
  <span class="hljs-type">provided = Thermosiphon::class</span></span>
)
<span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">CoffeeBindsModule</span></span>
</code></pre>
<h3 id="heading-only-type-parameters">Only type parameters</h3>
<p>But wait a minute: if I'm providing type arguments to the annotation, then I can read those types at compile time, and I don't need the annotation arguments!</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Repeatable</span>
<span class="hljs-meta">@Target(CLASS)</span>
<span class="hljs-meta">@Retention(SOURCE)</span>
<span class="hljs-keyword">annotation</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Bind</span>&lt;<span class="hljs-type">REQUESTED : Any, PROVIDED : REQUESTED</span>&gt;</span>
</code></pre>
<p>The result looks really nice:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Bind</span>&lt;Heater, ElectricHeater&gt;
<span class="hljs-meta">@Bind</span>&lt;Pump, Thermosiphon&gt;
<span class="hljs-class"><span class="hljs-keyword">interface</span> <span class="hljs-title">CoffeeBindsModule</span></span>
</code></pre>
<p>One nice benefit is that the IDE can now surface binding type errors as we type:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705948103892/68240b64-075b-4a23-a4d1-1a4c9bd0888c.png" alt class="image--center mx-auto" /></p>
<p>This new <code>@Bind</code> annotation would be so much better than <code>@Binds</code> from Dagger 2:</p>
<ul>
<li><p>Less boilerplate</p>
</li>
<li><p>Easier to use: the annotation takes exactly two type parameters, <code>REQUESTED</code> and <code>PROVIDED</code>, whose names make it clear which is which.</p>
</li>
<li><p>Less prone to errors: you can't get the types wrong or pass in too many parameters.</p>
</li>
<li><p>No more weird abstract methods that are never called or implemented.</p>
</li>
</ul>
<p>Unfortunately, Dagger 2 is a Java annotation processor and, unlike their Kotlin counterparts, Java annotations cannot have type parameters:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1705948411061/abf5e3a3-e947-4aa7-b99e-74245d33bec3.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-bind-implementation"><code>@Bind</code> Implementation</h3>
<p>The KSP implementation is straightforward: we update our annotation processor to read the annotation type arguments instead of the interface's declared methods:</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">readBinds</span><span class="hljs-params">(componentAnnotation: <span class="hljs-type">KSAnnotation</span>)</span></span>:
  Map&lt;KSDeclaration, KSDeclaration&gt; {
  <span class="hljs-keyword">val</span> bindModules = componentAnnotation
    .getArgument(<span class="hljs-string">"modules"</span>)
    .value <span class="hljs-keyword">as</span> List&lt;KSType&gt;
  <span class="hljs-keyword">val</span> binds = bindModules
    .map { it.declaration <span class="hljs-keyword">as</span> KSClassDeclaration }
    .flatMap { it.annotations }
    .filter { it.shortName.asString() == <span class="hljs-string">"Bind"</span> }
    .associate { <span class="hljs-keyword">annotation</span> -&gt;
      <span class="hljs-keyword">val</span> annotationArguments = <span class="hljs-keyword">annotation</span>
        .annotationType.resolve().arguments
      <span class="hljs-keyword">val</span> requested = annotationArguments.first()
        .type!!.resolve().declaration
      <span class="hljs-keyword">val</span> provided = annotationArguments.last()
        .type!!.resolve().declaration
      requested to provided
    }
  <span class="hljs-keyword">return</span> binds
}
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The code presented in this article is available at <a target="_blank" href="https://github.com/pyricau/diy">github.com/pyricau/diy</a>, feel free to experiment with it! Who knows, you might end up creating Dagger 3 (if you do, hit me up, I have feature requests 😉).</p>
]]></content:encoded></item><item><title><![CDATA[ANR internals: touch dispatching through the view hierarchy]]></title><description><![CDATA[I'm writing a blog series on ANR internals, where I'll use ANRs as an excuse to learn more about how various parts of Android work. This first article is focused on touch dispatching through the view hierarchy.

ANR triggers
How is an "Application No...]]></description><link>https://blog.p-y.wtf/anr-internals-touch-dispatching-through-the-view-hierarchy</link><guid isPermaLink="true">https://blog.p-y.wtf/anr-internals-touch-dispatching-through-the-view-hierarchy</guid><category><![CDATA[Android]]></category><category><![CDATA[anr]]></category><category><![CDATA[performance]]></category><category><![CDATA[Deep Dive]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 14 Sep 2023 16:32:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694709086979/731e9974-ccad-4a68-9633-a45b7f02b75c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>I'm writing a blog series on ANR internals, where I'll use ANRs as an excuse to learn more about how various parts of Android work. This first article is focused on touch dispatching through the view hierarchy.</p>
</blockquote>
<h1 id="heading-anr-triggers">ANR triggers</h1>
<p>How is an "Application Not Responding" (ANR) error triggered?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1690313666949/5ab1e3a7-d9c2-4bb0-ad43-9abe1e50964a.png" alt class="image--center mx-auto" /></p>
<p>According to the <a target="_blank" href="https://developer.android.com/topic/performance/vitals/anr">Android documentation on ANRs</a>:</p>
<blockquote>
<p>When the UI thread of an Android app is blocked for too long, an "Application Not Responding" (ANR) error is triggered.</p>
</blockquote>
<p>While blocking the UI thread is the cause of most ANRs, the Android OS doesn't care what your app's main thread is doing. Instead, it has expectations for how long apps should take to handle a few specific events. ANR means the application is not responding <strong>to the system</strong> (rather than to the user):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694453270629/17bf7da3-62bf-40be-b1b8-dc21982a1eda.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-input-dispatching-timed-out"><strong>Input dispatching timed out</strong></h1>
<p><strong>Input dispatching timed-out</strong> is the ANR trigger that Android developers are most familiar with:</p>
<blockquote>
<p>An ANR is triggered for your app when your app has not responded to an input event (such as key press or screen touch) within 5 seconds.</p>
</blockquote>
<p>To understand how these ANRs get triggered, it's helpful to understand how input dispatching works. In this article, we'll start by looking at touch dispatching through the view hierarchy.</p>
<h1 id="heading-view-touch-event-dispatching">View touch event dispatching</h1>
<p>To start, let's add a breakpoint in a <code>View.OnClickListener</code> to see what happens when we tap a button:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694461510330/610d0e30-dd65-414a-ba71-d44804e87358.png" alt class="image--center mx-auto" /></p>
<p><code>View.PerformClick</code> is a <code>Runnable</code> that invokes <code>View.OnClickListener#onClick</code>:</p>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">View</span> </span>{

  <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PerformClick</span> <span class="hljs-keyword">implements</span> <span class="hljs-title">Runnable</span> </span>{
    <span class="hljs-meta">@Override</span>
    <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">void</span> <span class="hljs-title">run</span><span class="hljs-params">()</span> </span>{
      performClick();
    }
  }

  <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">boolean</span> <span class="hljs-title">performClick</span><span class="hljs-params">()</span> </span>{
    ListenerInfo li = mListenerInfo;
    <span class="hljs-keyword">if</span> (li != <span class="hljs-keyword">null</span> &amp;&amp; li.mOnClickListener != <span class="hljs-keyword">null</span>) {
      li.mOnClickListener.onClick(<span class="hljs-keyword">this</span>);
      <span class="hljs-keyword">return</span> <span class="hljs-keyword">true</span>;
    } <span class="hljs-keyword">else</span> {
      <span class="hljs-keyword">return</span> <span class="hljs-keyword">false</span>;
    }
  }
}
</code></pre>
<p>(<a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/view/View.java;l=29486;drc=50c590e82c167a07f7fae84d81293fa52aad93af">source</a>)</p>
<p>The <code>View.PerformClick</code> runnable is <strong>posted</strong> to the main thread on <code>MotionEvent.ACTION_UP</code> in <code>View#onTouchEvent</code>, and runs later on as the main thread looper dequeues its messages.</p>
<pre><code class="lang-java"><span class="hljs-keyword">public</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">View</span> </span>{

  <span class="hljs-function"><span class="hljs-keyword">public</span> <span class="hljs-keyword">boolean</span> <span class="hljs-title">onTouchEvent</span><span class="hljs-params">(MotionEvent event)</span> </span>{
    <span class="hljs-keyword">final</span> <span class="hljs-keyword">int</span> action = event.getAction();
    <span class="hljs-keyword">switch</span> (action) {
      <span class="hljs-keyword">case</span> MotionEvent.ACTION_UP:
        <span class="hljs-comment">// Use a Runnable and post this rather than calling</span>
        <span class="hljs-comment">// performClick directly. This lets other visual state</span>
        <span class="hljs-comment">// of the view update before click actions start.</span>
        <span class="hljs-keyword">if</span> (mPerformClick == <span class="hljs-keyword">null</span>) {
          mPerformClick = <span class="hljs-keyword">new</span> PerformClick();
        }
        mHandler.post(mPerformClick);
    }
  }
}
</code></pre>
<p>(<a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/core/java/android/view/View.java;l=16494;drc=50c590e82c167a07f7fae84d81293fa52aad93af">source</a>)</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694554893786/d7745021-a717-4c1b-b777-fc1ae2a123cc.png" alt class="image--center mx-auto" /></p>
<p>Now let's add a breakpoint to <code>View#onTouchEvent</code> to understand where touch events come from:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694462134721/e9209802-4f99-4f2e-8a92-11d7c036de68.png" alt class="image--center mx-auto" /></p>
<p><code>MessageQueue#next</code> invokes <code>InputEventReceiver#dispatchInputEvent</code> from native code. The event goes through a chain of <code>ViewRootImpl.InputStage</code> delegates before getting dispatched through the view hierarchy via <code>ViewGroup#dispatchTouchEvent</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694555484529/588e05ef-c17f-4c6f-a0cc-a26b1dac816c.png" alt class="image--center mx-auto" /></p>
<p>Merging those two sequence diagrams gives us:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694555961491/e8187a07-5131-4438-af6b-4d218a1ae1c2.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-compose-touch-event-dispatching">Compose touch event dispatching</h1>
<p>Let's add a breakpoint in a Compose click lambda to understand how Compose handles taps:</p>
<pre><code class="lang-kotlin">Button(
    onClick = { <span class="hljs-comment">/* breakpoint here */</span> },
)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694462815084/420e6eaa-2c68-4a74-a281-b322a5748041.png" alt class="image--center mx-auto" /></p>
<p><code>MessageQueue#next</code> invokes <code>InputEventReceiver#dispatchInputEvent</code> from native code. The event goes through a chain of <code>ViewRootImpl.InputStage</code> delegates before getting dispatched through the view hierarchy and then through the Compose nodes, which eventually invoke the click lambda.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694556965358/cdc7ed4f-ba82-4e79-954e-b67234ec69fb.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-aside-smoke-amp-mirrors">Aside: smoke &amp; mirrors</h2>
<p>Notice how the view framework posts the invocation of the view listener, whereas Compose invokes the lambda immediately on <code>MotionEvent.ACTION_UP</code>. This is presumably more efficient (no delay in handling of taps). However, if your tap handling happens to be slow and blocks the main thread for a bit (e.g. updating the entire UI in response to a tap), the view posting allows the render thread to start a ripple animation on the button on <code>MotionEvent.ACTION_UP</code> and the ripple will animate while the main thread is blocked. I noticed this when, after migrating a button from views to compose, the interaction <em>felt</em> worse and slower even though the performance was similarly bad.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Today we saw that the Android framework has native code that invokes <code>InputEventReceiver#dispatchInputEvent</code>, which then dispatches touch events to the view hierarchy of the target window. With Compose, click listeners are invoked immediately on <code>MotionEvent.ACTION_UP</code>, whereas with views, click listeners are invoked from a main thread post enqueued on <code>MotionEvent.ACTION_UP</code>.</p>
<p>This article was a warm-up, next time we'll dig into something a little more interesting: Looper internals.</p>
]]></content:encoded></item><item><title><![CDATA[A script to compare two Macrobenchmarks runs]]></title><description><![CDATA[In Statistically Rigorous Android Macrobenchmarks, I laid out a methodology for rigorously comparing the outcome of two Jetpack Macrobenchmark runs. To summarize the article:

Remove sources of variations until the distribution fits a normal distribu...]]></description><link>https://blog.p-y.wtf/a-script-to-compare-two-macrobenchmarks-runs</link><guid isPermaLink="true">https://blog.p-y.wtf/a-script-to-compare-two-macrobenchmarks-runs</guid><category><![CDATA[Android]]></category><category><![CDATA[performance]]></category><category><![CDATA[Script]]></category><category><![CDATA[Benchmark]]></category><category><![CDATA[Kotlin]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 07 Sep 2023 16:36:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694104519520/876fbcb1-5452-4b34-a991-0e69e3540281.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a target="_blank" href="https://blog.p-y.wtf/statistically-rigorous-android-macrobenchmarks">Statistically Rigorous Android Macrobenchmarks</a>, I laid out a methodology for rigorously comparing the outcome of two <a target="_blank" href="https://developer.android.com/topic/performance/benchmarking/macrobenchmark-overview">Jetpack Macrobenchmark</a> runs. To summarize the article:</p>
<ul>
<li>Remove sources of variations until the distribution fits a normal distribution with a stable standard deviation.</li>
<li>Then compute the confidence interval for a difference between two means.</li>
</ul>
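<p>The second step boils down to a few lines of math. Here's an illustrative Kotlin sketch (not the script's actual code), assuming near-normal samples of at least 30 iterations so the 95% z-value of 1.96 applies:</p>

```kotlin
import kotlin.math.sqrt

// Illustrative sketch (not the script's actual code): compute a 95%
// confidence interval for the difference between two benchmark means.
fun meanDifferenceInterval95(a: List<Double>, b: List<Double>): Pair<Double, Double> {
    fun mean(xs: List<Double>) = xs.sum() / xs.size
    // Sample variance (n - 1 denominator).
    fun variance(xs: List<Double>): Double {
        val m = mean(xs)
        return xs.sumOf { (it - m) * (it - m) } / (xs.size - 1)
    }
    val diff = mean(b) - mean(a)
    // Standard error of the difference between two independent means.
    val stdError = sqrt(variance(a) / a.size + variance(b) / b.size)
    val margin = 1.96 * stdError
    return (diff - margin) to (diff + margin)
}
```

<p>If the resulting interval crosses 0, the change isn't statistically significant at the 95% confidence level.</p>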
<p>When I published the article, I also shared a <a target="_blank" href="https://docs.google.com/spreadsheets/d/1fqR99ROoSYtMrkoL5PWkUWDsW0E3JEXSTcXQqZJJCow/template/preview">Google Spreadsheet template</a> that did some of that work.</p>
<p>Later on, a colleague (thanks Aaron!) shared a <a target="_blank" href="https://github.com/kaushikgopal/kotlin-scripts">GitHub repo of kscripts</a> from Kaushik Gopal, and I realized I could easily turn my spreadsheet into a small Kotlin script to make it easier for other Square Android developers to play with comparing Macrobenchmark runs. So I did that.</p>
<p>Then I went on paternity leave and forgot about this, until recently when Saket Narayan reminded me that it could be worth sharing with the community.</p>
<p>Without further ado, <a target="_blank" href="https://gist.github.com/pyricau/07fd9598c5cdec0bc9f62505b6329df7">here's the script</a>.</p>
<p>You first need to install Kotlin, download the script, and make it executable:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Install kotlin</span>
brew install kotlin

<span class="hljs-comment"># Download the comparison script</span>
curl -O https://gist.githubusercontent.com/pyricau/07fd9598c5cdec0bc9f62505b6329df7/raw/977b2a84532758fd614f6cc44dab43a242922cdb/compare.benchmarks.main.kts
chmod u+x compare.benchmarks.main.kts
</code></pre>
<p>Then you can run the comparison script:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Compare the json output in run1 and run2 folders.</span>
./compare.benchmarks.main.kts run1/com.example.macrobenchmark-benchmarkData.json run2/com.example.macrobenchmark-benchmarkData.json

<span class="hljs-comment">###########################################################################</span>
Results <span class="hljs-keyword">for</span> com.example.InteractionLatencyBenchmarks<span class="hljs-comment">#openHomeScreen</span>
<span class="hljs-comment">##################################################</span>
NavigationMs
<span class="hljs-comment">#########################</span>
DATA CHECKS
✓ All checks passed, the comparison conclusion is meaningful.

Data checks <span class="hljs-keyword">for</span> Benchmark 1
- ✓ At least 30 iterations (100)
- ✓ CV (5.26) &lt;= 6%
- ✓ Latencies pass normality <span class="hljs-built_in">test</span>

Data checks <span class="hljs-keyword">for</span> Benchmark 2
- ✓ At least 30 iterations (100)
- ✓ CV (4.32) &lt;= 6%
- ✓ Latencies pass normality <span class="hljs-built_in">test</span>

- ✓ Variance less than doubles (0.66)
<span class="hljs-comment">#########################</span>
RESULT
Mean difference confidence interval at 95% confidence level:
The change yielded no statistical significance (the mean difference confidence interval crosses 0): from -6 ms (-2.36%) to 1 ms (0.3%).
<span class="hljs-comment">#########################</span>
MEDIANS
The median went from 259 ms to 231 ms.
DO NOT REPORT THE DIFFERENCE IN MEDIANS.
This data helps contextualize results but is not statistically meaningful.
<span class="hljs-comment">#########################</span>
</code></pre>
<p>While I'd love to get feedback and ideas for improvements (<a target="_blank" href="https://androiddev.social/@py">hit me up!</a>), I'm providing this script as is, with no guarantees and no intention to maintain it. Do whatever you want with it!</p>
]]></content:encoded></item><item><title><![CDATA[Freezes & ANRs? Check memory leaks!]]></title><description><![CDATA[In this article, I show how Android memory leaks lead to jank, freezes and ANRs more often than they lead to OutOfMemoryError crashes.
Navigation Latency
At Square, we've been tracking a User-Centric performance metric: Interaction Latency. We track ...]]></description><link>https://blog.p-y.wtf/freezes-anrs-check-memory-leaks</link><guid isPermaLink="true">https://blog.p-y.wtf/freezes-anrs-check-memory-leaks</guid><category><![CDATA[Android]]></category><category><![CDATA[performance]]></category><category><![CDATA[Memory Leak]]></category><category><![CDATA[outofmemory]]></category><category><![CDATA[jank]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 20 Jul 2023 22:09:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1689890892599/a05210f8-204a-4d8a-9506-2ea251f2e163.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, I show how Android memory leaks lead to jank, freezes and ANRs more often than they lead to <code>OutOfMemoryError</code> crashes.</p>
<h1 id="heading-navigation-latency">Navigation Latency</h1>
<p>At Square, we've been tracking a <a target="_blank" href="https://blog.p-y.wtf/user-centric-mobile-performance">User-Centric</a> performance metric: <strong>Interaction Latency</strong>. We track this on every app navigation (example implementation: <a target="_blank" href="https://dev.to/pyricau/tap-response-time-jetpack-navigation-4738">Tap Response Time: Jetpack Navigation 🗺</a>).</p>
<p>In other words, for every navigation, we report a latency metric that measures the duration from when the tap was received to when the display was updated, i.e. how much latency users perceive.</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> durationMillis = frameCommitted - actionUpMotionEvent.eventTime
analytics.logNavigation(
  sourceScreen,
  destinationScreen,
  durationMillis
)
</code></pre>
<h1 id="heading-memory-usage-on-navigation">Memory usage on navigation</h1>
<p>Resource consumption metrics like memory usage are often reported as time series, which isn't useful when trying to correlate app usage with memory leaks.</p>
<p>In January 2023, Pavlo Stavytskyi published <a target="_blank" href="https://eng.lyft.com/detecting-android-memory-leaks-in-production-29e9c97e2ba1">Detecting Android memory leaks in production</a> on the Lyft Engineering blog.</p>
<p>One interesting idea in the article was to report memory usage metrics on every screen navigation instead of as a time series because memory leaks tend to accumulate with app usage.</p>
<p>Let's update our navigation analytics to add memory usage:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> runtime = Runtime.getRuntime()
<span class="hljs-keyword">val</span> javaHeapUsage = runtime.totalMemory() - runtime.freeMemory()

analytics.logNavigation(
  sourceScreen,
  destinationScreen,
  durationMillis,
  javaHeapUsage
)
</code></pre>
<h1 id="heading-memory-limit">Memory limit</h1>
<p>If Android devices had infinite memory, memory leaks wouldn't be an issue. Android devices have limited RAM, every app is allowed to use only a subset of the device RAM for its Java heap, and memory leaks become an issue when memory usage is close to the limit. That limit is configured per device and can be queried with <a target="_blank" href="https://developer.android.com/reference/java/lang/Runtime#maxMemory()">Runtime.maxMemory()</a>:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> javaHeapLimit = Runtime.getRuntime().maxMemory()

analytics.logNavigation(
  sourceScreen,
  destinationScreen,
  durationMillis,
  javaHeapUsage,
  javaHeapLimit
)
</code></pre>
<h1 id="heading-example-leaky-session">Example leaky session</h1>
<p>We can now graph memory usage over time for a single session, where each data point is a single navigation. Here's a real example session with 1591 navigations where we see memory usage grow over time:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689871366438/f2a95720-a419-4b28-9137-abb5f0bbfe32.png" alt class="image--center mx-auto" /></p>
<p>Notice how Java heap usage is constantly jumping up &amp; down as the GC runs, but the trend is upward, which indicates a memory leak. Applying a linear regression shows a slope of +146 KB per navigation.</p>
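<p>The regression itself is plain least squares over the navigation index. Here's a hypothetical sketch (not our production code) of estimating the per-navigation leak rate:</p>

```kotlin
// Hypothetical sketch (not our production code): estimate the
// per-navigation leak rate as the least-squares slope of Java heap
// usage (in bytes) over the navigation index.
fun leakSlopeBytesPerNavigation(heapUsageBytes: List<Double>): Double {
    val n = heapUsageBytes.size
    val meanX = (n - 1) / 2.0 // Mean of indices 0..n-1.
    val meanY = heapUsageBytes.sum() / n
    var numerator = 0.0
    var denominator = 0.0
    for (x in 0 until n) {
        val dx = x - meanX
        numerator += dx * (heapUsageBytes[x] - meanY)
        denominator += dx * dx
    }
    return numerator / denominator
}
```

<p>A clearly positive slope across a long session is a strong signal that the session leaked memory.</p>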
<h1 id="heading-memory-usage-andamp-navigation-latency">Memory usage &amp; Navigation Latency</h1>
<p>Let's add Navigation Latency to the graph:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689873166318/c6f09086-aa10-499e-846f-e3c8161935b7.png" alt class="image--center mx-auto" /></p>
<p>Notice how Navigation Latency is fairly flat throughout the session until memory usage gets close to the memory limit, at which point Navigation Latency shoots up. We can zoom in on the last 200 navigations:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689873571402/0a9c8149-3920-4946-b63f-5942cc715e75.png" alt class="image--center mx-auto" /></p>
<p>In this example session, the UI freezes up for seconds at a time while the GC is blocking the main thread to reclaim memory. This starts happening when memory gets within 18 MB of the limit.</p>
<h1 id="heading-the-progressive-impact-of-memory-leaks">The progressive impact of memory leaks</h1>
<p>As Java heap memory gets close to the app memory limit, the impact of memory leaks is more and more noticeable.</p>
<ol>
<li><p>First, small GC pauses cause animation jank.</p>
</li>
<li><p>Then longer GC pauses cause increasingly longer UI freezes, for seconds at a time.</p>
</li>
<li><p>If the main thread freezes for more than 5 seconds while touch events are pending dispatch, the app triggers an Application Not Responding (ANR) error.</p>
</li>
<li><p>Eventually, there's so little memory left that we can't allocate new objects and the app crashes with an <code>OutOfMemoryError</code> exception.</p>
</li>
</ol>
<h1 id="heading-missing-the-real-impact-of-memory-leaks">Missing the real impact of memory leaks</h1>
<p>If you have crash reporting in place and a process to fix top crashes, well done! Unfortunately, you can't just look at <code>OutOfMemoryError</code> crashes to decide when to look into fixing Java memory leaks, for two reasons:</p>
<ul>
<li><p>Crash reporting tools typically group crashes by identical stack traces and provide a count by crash group. When memory is low an <code>OutOfMemoryError</code> can be thrown from anywhere in the app code, which means that every <code>OutOfMemoryError</code> potentially has a different stack trace. Instead of one crash entry with 1000 crashes, <code>OutOfMemoryError</code> crashes get reported as 1000 distinct crashes and hide in the long tail of low-occurring crashes.</p>
</li>
<li><p>As the app slows down and freezes for several seconds, mobile users will either stop using it, or kill it and restart it. So the app might never crash with <code>OutOfMemoryError</code> even though the customer impact is real.</p>
</li>
</ul>
<h1 id="heading-takeaways">Takeaways</h1>
<ul>
<li><p>Android memory leaks progressively lead to jank, then freezes, then ANRs and eventually <code>OutOfMemoryError</code> crashes (if the user hasn't already left or killed the app).</p>
</li>
<li><p>When an ANR report shows a stacktrace that doesn't seem like it could actually cause an ANR, you should look at memory usage and memory limit. If memory is close to the limit, the ANR is probably happening because the GC is blocking the main thread.</p>
</li>
<li><p>To avoid these performance issues, you should systematically fix all memory leaks surfaced by <a target="_blank" href="https://square.github.io/leakcanary">LeakCanary</a>.</p>
</li>
<li><p>By combining memory usage &amp; memory limit data with performance data in production, you can surface the relationship between memory leaks &amp; performance.</p>
<ul>
<li>While I can't share the actual numbers, we saw a direct correlation between user activity, leak rate, and freeze / ANR rate.</li>
</ul>
</li>
<li><p>A linear regression of memory usage over navigations per session can show whether a session has a memory leak, and how bad the leak is.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Tracking Android App Launch in production]]></title><description><![CDATA[👋 In this article, I summarize my findings on Android app launch after a few years of deep dives, with the hope that these guidelines help implementations track app launches correctly.

User-Centric App Launch analytics
In User-Centric Mobile Perfor...]]></description><link>https://blog.p-y.wtf/tracking-android-app-launch-in-production</link><guid isPermaLink="true">https://blog.p-y.wtf/tracking-android-app-launch-in-production</guid><category><![CDATA[Android]]></category><category><![CDATA[analytics]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 13 Jul 2023 00:57:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1689209807078/23e77547-6d1b-49c3-8c8b-bc743ec16cbb.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>👋 In this article, I summarize my findings on Android app launch after a few years of deep dives, with the hope that these guidelines help you implement app launch tracking correctly.</p>
</blockquote>
<h1 id="heading-user-centric-app-launch-analytics">User-Centric App Launch analytics</h1>
<p>In <a target="_blank" href="https://blog.p-y.wtf/user-centric-mobile-performance">User-Centric Mobile Performance</a>, I argued that mobile app teams should primarily track <strong>user-centric performance metrics</strong> that represent the experience of using the app, and that <strong>responsiveness thresholds should be context specific</strong> as user expectations vary based on what they are doing and what they're used to.</p>
<p>I used the example of app launches and a few days later <a target="_blank" href="https://hanson.wtf/2023/07/11/rethinking-the-app-startup-metric/">Hanson Ho published a blog post</a> arguing, amongst other things, that you should think about the percentage makeup of cold vs warm launches rather than just focusing on improving cold launches. Read it, it's a great perspective.</p>
<p>Hanson made a really good point: users don't really care about the specifics of whether an app is a cold or warm launch.</p>
<p>So what is it that users care about? As I highlighted above, user expectations vary based on what they are doing and what they're used to. We should put cold / hot / warm launch aside. Users can launch an app in different ways:</p>
<ul>
<li><p>Tapping on a <strong>launcher</strong> icon</p>
<ul>
<li>Users expect this to take a few seconds.</li>
</ul>
</li>
<li><p>Tapping on a <strong>notification</strong></p>
<ul>
<li>Recent notifications are expected to open fast, but it's fairly common for long-running notifications to take a while to open the app.</li>
</ul>
</li>
<li><p>Launch <strong>from another app</strong> (integration / chooser intent)</p>
<ul>
<li>This should be fairly fast, as users are in a flow trying to accomplish a task.</li>
</ul>
</li>
<li><p>Bringing it back from <strong>Recents</strong>.</p>
<ul>
<li>This is expected to be almost instant, as the OS gives the illusion of all these apps being available to go back to at any point in time.</li>
</ul>
</li>
</ul>
<p>The first 3 cases can be determined by looking at the intent of the launched activity, and the last case by looking at whether the activity is provided with a saved instance state.</p>
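<p>As a sketch, here's what that classification could look like, with the Android <code>Intent</code> reduced to plain values so the logic stands alone. The enum and the notification-extra convention are my own assumptions, not an official API:</p>

```java
import java.util.Set;

public class LaunchSource {
  enum Source { LAUNCHER, NOTIFICATION, FROM_OTHER_APP, RECENTS }

  // In a real app the inputs come from the launched activity:
  // intent.getAction(), intent.getCategories(), whether onCreate() received a
  // non-null savedInstanceState, and a custom extra the app sets on its own
  // notification PendingIntents (assumption: there's no standard marker).
  static Source classify(String action, Set<String> categories,
      boolean hasSavedInstanceState, boolean hasNotificationExtra) {
    if (hasSavedInstanceState) {
      // The task is being restored: the user came back via Recents.
      return Source.RECENTS;
    }
    if ("android.intent.action.MAIN".equals(action)
        && categories.contains("android.intent.category.LAUNCHER")) {
      return Source.LAUNCHER;
    }
    if (hasNotificationExtra) {
      return Source.NOTIFICATION;
    }
    // e.g. ACTION_VIEW / ACTION_SEND / chooser intents from another app.
    return Source.FROM_OTHER_APP;
  }
}
```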
<p>While it's useful to track whether a launch was cold / warm / hot, your primary dashboards and metrics should instead focus on the different scenarios under which users are launching the app, and target different thresholds for each.</p>
<p>For example, if 80% of app launches are from <em>Recents</em> (i.e. users constantly swap in &amp; out of the app), then you should focus on optimizing that. And if 50% of the launches from <em>Recents</em> are <em>cold launches</em>, dragging that metric down, then you should look into why the app keeps getting killed (e.g. crash or high memory usage).</p>
<h1 id="heading-app-launch-app-start-or-app-startup">App launch, app start or app startup?</h1>
<p>These words tend to be used interchangeably. However, I prefer <em>Launch</em> because it reminds me of the app launcher. <em>Start</em> can be confusing because an app launch can involve a <em>process start</em> or not. Also, in a <em>Hot Launch</em> the process isn't started at all: the stopped activity is simply started and resumed.</p>
<h1 id="heading-what-is-an-app-launch">What is an App Launch?</h1>
<p>An App Launch is when an app that <strong>had no visible activity</strong> now has visible activities.</p>
<p>In other words, an App Launch is when an app that had no activity in started or resumed state now has activities in started or resumed state.</p>
<ul>
<li><p>I focus on <strong>activities</strong> because Android places each process into an importance hierarchy where a process can have a foreground or visible importance yet have no created activity (see <a target="_blank" href="https://developer.android.com/guide/components/activities/process-lifecycle">Processes and app lifecycle</a>).</p>
</li>
<li><p>I focus on <strong>visible</strong> rather than <strong>foreground</strong> because I don't want to count as app launch the case where the app had an activity visible in the background of an activity from another app and then came to the foreground.</p>
</li>
</ul>
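<p>Under this definition, an App Launch can be detected from <code>ActivityLifecycleCallbacks</code> by counting activities in started state and reporting a launch when the count goes from 0 to 1. A sketch of the counting logic, stripped of the Android wiring:</p>

```java
public class AppLaunchDetector {
  private int startedActivities = 0;
  private int launches = 0;

  // Call from Application.ActivityLifecycleCallbacks.onActivityStarted().
  void onActivityStarted() {
    if (startedActivities == 0) {
      // The app had no visible activity and now has one: an App Launch.
      launches++;
    }
    startedActivities++;
  }

  // Call from Application.ActivityLifecycleCallbacks.onActivityStopped().
  void onActivityStopped() {
    startedActivities--;
  }

  int launchCount() {
    return launches;
  }
}
```

<p>A real implementation also has to ignore configuration changes, where the count can briefly drop to 0 without the app ever leaving the screen.</p>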
<h1 id="heading-app-launch-temperature">App Launch Temperature</h1>
<p>The <a target="_blank" href="https://developer.android.com/topic/performance/vitals/launch-time">App startup time</a> documentation does a great job of describing the difference between cold, warm, and hot launches.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689197067223/4ace5587-1944-4b04-a3f0-7047b23be968.png" alt="the various startup states" class="image--center mx-auto" /></p>
<p>However, there are different flavors of cold and warm launches, so it's worth digging into the details.</p>
<h2 id="heading-cold-launch">Cold Launch</h2>
<p>A Cold Launch is a launch where there was <strong>no existing app process</strong> and the system started the app process <strong>for the purpose of launching an activity</strong>. This is key: we need to check why the system decided to start the process. If the process was started for another reason and then, sometime during init, the system asked the app to launch an activity, then it's not a cold launch.</p>
<p>Some interesting attributes to track for cold launches:</p>
<ul>
<li>Is it the first launch after an install?<ul>
<li>First launch after install is critical for new users, yet it often triggers additional first time init work.</li>
</ul>
</li>
<li>Is it the first launch after an upgrade?<ul>
<li>Upgrades often trigger additional migration work on launch.</li>
</ul>
</li>
<li>Is it the first launch after clear data?<ul>
<li>This should be a similar launch to the first launch after install, but is interesting to track separately to detect when users clear data, which likely indicates an issue with the app.</li>
</ul>
</li>
<li>Did the launched activity have a saved instance state (process recreation when bringing the activity back from <em>Recents</em>)?</li>
</ul>
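<p>One way to derive these attributes is to persist the last-seen version code and compare it with <code>PackageInfo</code> data on each launch. A sketch with the Android inputs passed in as plain values (the enum names are mine):</p>

```java
public class ColdLaunchKind {
  enum Kind { NORMAL, FIRST_AFTER_INSTALL, FIRST_AFTER_UPGRADE, FIRST_AFTER_CLEAR_DATA }

  // storedVersionCode: version code persisted (e.g. in SharedPreferences) on
  // the previous launch, or -1 if absent. firstInstallTime / lastUpdateTime
  // come from PackageInfo.
  static Kind classify(long storedVersionCode, long currentVersionCode,
      long firstInstallTime, long lastUpdateTime) {
    if (storedVersionCode == -1) {
      // No persisted state: either a fresh install, or the user cleared data
      // (which also wipes SharedPreferences). An app that never upgraded has
      // firstInstallTime == lastUpdateTime.
      return firstInstallTime == lastUpdateTime
          ? Kind.FIRST_AFTER_INSTALL
          : Kind.FIRST_AFTER_CLEAR_DATA;
    }
    return storedVersionCode != currentVersionCode
        ? Kind.FIRST_AFTER_UPGRADE
        : Kind.NORMAL;
  }
}
```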
<h2 id="heading-hot-launch">Hot Launch</h2>
<p>A Hot Launch is a launch where there was an existing app process as well as an activity in <em>Created</em> state (i.e. stopped) that was then started.</p>
<h2 id="heading-warm-launch">Warm Launch</h2>
<p>A Warm Launch is any launch that isn't a cold launch or a hot launch:</p>
<ul>
<li>A launch where there was an existing app process but no activity. The activity might be created with a saved state to restore (brought back from <em>Recents</em>) or with no saved state.</li>
<li>A launch where the process was started for a reason not related to the launch and wasn't done with startup yet (i.e. <code>Application.onCreate()</code> hadn't finished running), then sometime during init the system asked the app to launch an activity. While this might look like a cold launch, users will actually not experience the full cold launch duration.</li>
</ul>
<h2 id="heading-pre-launch-state">Pre Launch state</h2>
<p>The launch temperature is determined by the state of the app before the launch. In <a target="_blank" href="https://github.com/square/papa/blob/main/papa/src/main/java/papa/PreLaunchState.kt">square/papa</a>, I defined the following list of pre-launch states:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">enum</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">PreLaunchState</span></span>(<span class="hljs-keyword">val</span> launchType: AppLaunchType) {
  <span class="hljs-comment">/**
   * This is typically referred to as a "cold start".
   * The process was started with a FOREGROUND importance and
   * the launched activity was created, started and resumed before our first post
   * ran.
   */</span>
  NO_PROCESS(COLD),

  <span class="hljs-comment">/**
   * Same as [NO_PROCESS] but this was the first launch ever,
   * which might trigger first launch additional work.
   */</span>
  NO_PROCESS_FIRST_LAUNCH_AFTER_INSTALL(COLD),

  <span class="hljs-comment">/**
   * Same as [NO_PROCESS] but this was the first launch after the app was upgraded, which might
   * trigger additional migration work. Note that if the upgrade is the first upgrade
   * that introduces this library, the value will be [NO_PROCESS_FIRST_LAUNCH_AFTER_CLEAR_DATA]
   * instead.
   */</span>
  NO_PROCESS_FIRST_LAUNCH_AFTER_UPGRADE(COLD),

  <span class="hljs-comment">/**
   * Same as [NO_PROCESS] but this was either the first launch after a clear data, or
   * this was the first launch after the upgrade that introduced this library.
   */</span>
  NO_PROCESS_FIRST_LAUNCH_AFTER_CLEAR_DATA(COLD),

  <span class="hljs-comment">/**
   * This is the coldest type of "warm start". The process was not started with
   * a FOREGROUND importance yet the launched activity was created, started and resumed
   * before our first post ran. This means that while the process was starting, the
   * system decided to launch the activity.
   */</span>
  PROCESS_WAS_LAUNCHING_IN_BACKGROUND(WARM),

  <span class="hljs-comment">/**
   * This is a "warm start" where the activity brought to the foreground had to be created,
   * started and resumed, and the task had no saved instance state bundle.
   */</span>
  NO_ACTIVITY_NO_SAVED_STATE(WARM),

  <span class="hljs-comment">/**
   * This is a "warm start" where the activity brought to the foreground had to be created,
   * started and resumed, and the task can benefit somewhat from the saved instance state bundle
   * passed into onCreate().
   */</span>
  NO_ACTIVITY_BUT_SAVED_STATE(WARM),

  <span class="hljs-comment">/**
   * This is a "hot start", the activity was already created and had been stopped when the app
   * went in background. Bringing it to the foreground means the activity was started and then
   * resumed. Note that there isn't an "ACTIVITY_WAS_PAUSED" entry here. We do not consider
   * going from PAUSE to RESUME to be a launch because the activity was still visible so there
   * is nothing to redraw on resume.
   */</span>
  ACTIVITY_WAS_STOPPED(HOT);
}
</code></pre>
<p>Here's the <a target="_blank" href="https://github.com/square/papa/blob/9005c4394de6802398d552063d42ed072f30e998/papa/src/main/java/papa/internal/Perfs.kt#L361C70-L361C70">code to compute the pre-launch state</a> which checks for cold starts by looking at <strong>process importance</strong> and whether the launched activity was created <strong>before the first post ran</strong> (as highlighted in <a target="_blank" href="https://dev.to/pyricau/android-vitals-why-did-my-process-start-4d0e">Why did my process start? 🌄</a>).</p>
<h1 id="heading-launch-start-timestamp">Launch start timestamp</h1>
<h2 id="heading-cold-launch-1">Cold Launch</h2>
<p>In <a target="_blank" href="https://dev.to/pyricau/android-vitals-when-did-my-app-start-24p4">When did my app start? ⏱</a> I concluded that you should use:</p>
<ul>
<li>Up to API 24: Use the class load time of a content provider with a high priority.</li>
<li>API 24 - API 28: Use <a target="_blank" href="https://developer.android.com/reference/android/os/Process.html#getStartUptimeMillis()">Process.getStartUptimeMillis()</a>.</li>
<li>API 28 and beyond: Use <a target="_blank" href="https://developer.android.com/reference/android/os/Process.html#getStartUptimeMillis()">Process.getStartUptimeMillis()</a> but filter out weird values (e.g. more than 1 min to get to Application.onCreate()) and fall back to the time of <code>ContentProvider.onCreate()</code>.</li>
</ul>
<p>Since then API 33 added <a target="_blank" href="https://developer.android.com/reference/android/os/Process#getStartRequestedUptimeMillis()">Process.getStartRequestedUptimeMillis()</a>, which is super promising, though I haven't tried using it yet.</p>
<ul>
<li><code>Process.getStartRequestedUptimeMillis()</code>: when the user started waiting for the app to launch</li>
<li><code>Process.getStartUptimeMillis()</code>: when the app started having an influence on launch time (when the APK starts loading)</li>
</ul>
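<p>These guidelines boil down to a small decision function. A sketch, with the Android values passed in as parameters (names and the 1 minute sanity cutoff follow the guidelines above; everything else is mine):</p>

```java
public class LaunchStartTime {
  // Picks the cold launch start timestamp. All values are uptimeMillis;
  // contentProviderLoadUptime is captured when a high-priority
  // ContentProvider class loads, processStartUptime comes from
  // Process.getStartUptimeMillis() (API 24+).
  static long coldStartUptime(int sdkInt, long processStartUptime,
      long contentProviderLoadUptime, long applicationOnCreateUptime) {
    if (sdkInt < 24) {
      // Process.getStartUptimeMillis() isn't available.
      return contentProviderLoadUptime;
    }
    if (sdkInt >= 28) {
      // Some devices report bogus process start times; fall back when the
      // value implies over a minute to reach Application.onCreate().
      long toOnCreate = applicationOnCreateUptime - processStartUptime;
      if (toOnCreate < 0 || toOnCreate > 60_000) {
        return contentProviderLoadUptime;
      }
    }
    return processStartUptime;
  }
}
```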
<h2 id="heading-warm-andamp-hot-launch">Warm &amp; Hot Launch</h2>
<p>As far as I know, there's no API that provides the launch intent timestamp. There should be! </p>
<p>So our best option here is to leverage ActivityLifecycleCallbacks to capture timestamps:</p>
<ul>
<li>Warm launch: <a target="_blank" href="https://developer.android.com/reference/android/app/Application.ActivityLifecycleCallbacks#onActivityPreCreated(android.app.Activity,%20android.os.Bundle)">onActivityPreCreated()</a> on API 29+ and <a target="_blank" href="https://developer.android.com/reference/android/app/Application.ActivityLifecycleCallbacks#onActivityCreated(android.app.Activity,%20android.os.Bundle)">onActivityCreated()</a> pre API 29.</li>
<li>Hot launch: <a target="_blank" href="https://developer.android.com/reference/android/app/Application.ActivityLifecycleCallbacks#onActivityPreStarted(android.app.Activity)">onActivityPreStarted()</a> on API 29+ and <a target="_blank" href="https://developer.android.com/reference/android/app/Application.ActivityLifecycleCallbacks#onActivityStarted(android.app.Activity)">onActivityStarted()</a> pre API 29.</li>
</ul>
<h1 id="heading-launch-end-timestamp">Launch end timestamp</h1>
<p>The end of a launch is when the window of the launched activity has been rendered into a frame and submitted to the swap chain.</p>
<p>In <a target="_blank" href="https://dev.to/pyricau/android-vitals-first-draw-time-m1d">First draw time 👩‍🎨</a> I showed how to access the decor view of the launched activity to add an <a target="_blank" href="https://developer.android.com/reference/android/view/ViewTreeObserver#addOnDrawListener(android.view.ViewTreeObserver.OnDrawListener)">OnDrawListener</a>.</p>
<p>From the <code>onDraw()</code> callback, we need to figure out when that frame traversal is submitted to the swap chain with:</p>
<ul>
<li><a target="_blank" href="https://developer.android.com/reference/android/view/ViewTreeObserver#registerFrameCommitCallback(java.lang.Runnable)">registerFrameCommitCallback()</a> on API 29+</li>
<li><a target="_blank" href="https://developer.android.com/reference/android/os/Handler#postAtFrontOfQueue(java.lang.Runnable)">Handler.postAtFrontOfQueue()</a> pre API 29.</li>
</ul>
<h2 id="heading-all-together-in-squarepapa">All together in <code>square/papa</code></h2>
<p>I'm hopeful that some day Google will release a new AndroidX library to track app launches (LaunchStats?). Until then, you can roll your own code following the guidelines I outlined, or leverage <a target="_blank" href="https://github.com/square/papa">square/papa</a>:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ExampleApplication</span> : <span class="hljs-type">Application</span></span>() {
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onCreate</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">super</span>.onCreate()

    PapaEventListener.install { event -&gt;
      <span class="hljs-keyword">when</span> (event) {
        <span class="hljs-keyword">is</span> AppLaunch -&gt; {
          Analytics.logAppLaunch(
            preLaunchState = event.preLaunchState,
            durationUptimeMillis = event.durationUptimeMillis
          )
        }
      }
    }
  }
}
</code></pre>
<p>I just realized that the <code>AppLaunch</code> event does not provide the intent and saved state details which are necessary to track User-Centric App Launch analytics as highlighted at the start of this article, so I filed <a target="_blank" href="https://github.com/square/papa/issues/57">square/papa#57</a> to fix that.</p>
<blockquote>
<p>Header image generated by DALL-E.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[User-Centric Mobile Performance]]></title><description><![CDATA[👋 Hi, this is P-Y, over the last three years I've been steering Square's focus on mobile performance and building a framework for thinking about it and prioritizing work. In this article I share my approach, let me know what you think!

Useful metri...]]></description><link>https://blog.p-y.wtf/user-centric-mobile-performance</link><guid isPermaLink="true">https://blog.p-y.wtf/user-centric-mobile-performance</guid><category><![CDATA[Android]]></category><category><![CDATA[performance]]></category><category><![CDATA[user experience]]></category><category><![CDATA[metrics]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 06 Jul 2023 20:15:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1688674392754/499a309c-b8d1-4e90-bb15-a39d0c80e54c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>👋 Hi, this is P-Y. Over the last three years I've been steering Square's focus on mobile performance and building a framework for thinking about it and prioritizing work. In this article I share my approach; let me know what you think!</p>
</blockquote>
<h1 id="heading-useful-metrics">Useful metrics</h1>
<p>As an app codebase and user base grows, it becomes harder for product teams to evaluate the performance of an app just by playing with it.</p>
<blockquote>
<p>When talking about performance, it's important to be precise and to refer to performance in terms of objective criteria that can be quantitatively measured. These criteria are known as <em>metrics</em>.</p>
<p>But just because a metric is based on objective criteria and can be quantitatively measured, it doesn't necessarily mean those measurements are <strong>useful</strong>.</p>
<p>Source: <a target="_blank" href="https://web.dev/user-centric-performance-metrics/">web.dev</a></p>
</blockquote>
<p>Useful for what? Product teams deal with competing priorities; they need a signal to know when and where to prioritize performance work, i.e. they need to know when poor performance is impacting their customers' experience and ultimately driving business metrics down.</p>
<p>Mobile performance metrics often take inspiration from the backend world and measure resource usage (CPU usage, memory usage) and workload durations (how long it takes to run a piece of code).</p>
<p>While <strong>app performance directly impacts User Experience</strong>, metrics surfacing high CPU usage or slow database reads are not useful measurements to convey the actual user experience. They're not <strong>user-centric</strong>.</p>
<p>For example, when exporting an Insta360 video, as a customer I want the app to maximize GPU and CPU usage so that the export goes faster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688663129684/06226897-b1cf-4f64-91f3-10aad94b0703.png" alt class="image--center mx-auto" /></p>
<p><em>Insta360 threads (in purple) scheduled on all CPUs while exporting a video.</em></p>
<h1 id="heading-user-centric-performance-metrics">User-centric performance metrics</h1>
<p>Mobile app teams should <strong>primarily</strong> track <strong>user-centric performance metrics</strong> that represent the experience of using the app.</p>
<p>This helps product teams make <strong>better performance investments</strong>. Unlike technical metrics, we can't dance around user-centric performance metrics: while we could argue on the right amount of memory usage or the right duration of a database query, we can't argue against a metric that demonstrates a bad customer experience.</p>
<p>Of course, app teams should still track resource usage and technical workload durations as <strong>secondary</strong> metrics, to help understand the components influencing user-centric performance metrics.</p>
<p>There are two broad categories of user-centric performance metrics:</p>
<ul>
<li><p><strong>Smoothness</strong> metrics, which track whether <strong>motion</strong> on screen is perceived as janky.</p>
</li>
<li><p><strong>Responsiveness</strong> metrics, which track the delay between a user action and a visible response from the system.</p>
</li>
</ul>
<h1 id="heading-human-based-thresholds">Human-based thresholds</h1>
<p><a target="_blank" href="https://www.tactuallabs.com/papers/howMuchFasterIsFastEnoughCHI15.pdf">User experience research</a> suggests that humans do not perceive latency improvements beyond specific thresholds: <strong>11 ms</strong> for the latency of on-screen drag motion (<strong>smoothness</strong>) and <strong>69 ms</strong> for the latency of on-screen tap interactions (<strong>responsiveness</strong>).</p>
<p>Apple's <a target="_blank" href="https://developer.apple.com/documentation/xcode/improving-app-responsiveness">app responsiveness documentation</a> makes a similar distinction:</p>
<blockquote>
<p>An app that responds instantly to users’ interactions gives an impression of supporting their workflow. When the app responds to gestures and taps in real time, it creates an experience for users that they’re directly manipulating the objects on the screen. Apps with a noticeable delay in user interaction (a hang) or movement on screen that appears to jump (a hitch), shatter that illusion. This leaves the user wondering whether the app is working correctly. To avoid hangs and hitches, keep the following rough thresholds in mind as you develop and test your app.</p>
<ul>
<li><p>&lt; 100 ms: Synchronous main thread work in response to a <strong>discrete user interaction</strong>.</p>
</li>
<li><p>&lt; 1 display refresh interval (8 or 17ms): Main thread work and work to handle <strong>continuous</strong> user interaction.</p>
</li>
</ul>
</blockquote>
<h1 id="heading-smoothness">Smoothness</h1>
<p>Any sort of screen motion requires frame updates to be synced with the display refresh rate, otherwise humans notice jank.</p>
<p><strong>Screen motion can be interactive or non-interactive</strong>. Interactive is when a finger is touching the display and the UI is following it, i.e. a drag (slow scrolling a list, or drag &amp; drop). Non-interactive can be animations or a fling based scroll.</p>
<p><strong>Jank on interactive screen motion has a stronger negative impact</strong> on user experience than jank on non-interactive screen motion.</p>
<p>Smoothness is typically measured by reporting <strong>frame rate</strong> and <strong>missed frames</strong>. Keep in mind that smoothness only matters when there's <strong>actual motion on screen</strong>, and matters more when that motion is interactive.</p>
<p>For example, it's actually ok to take 50 ms to update the UI in response to a tap. A tap isn't a drag, there's no motion. However, if after 50 ms we start an animation to open the new UI, that animation should render every frame.</p>
<h1 id="heading-responsiveness">Responsiveness</h1>
<p>A responsiveness metric is any metric that tracks the delay between a user action and a visible response from the system, i.e. an <strong>interaction</strong>. Here are a few examples:</p>
<ul>
<li><p>Launching an app by tapping its icon in the launcher.</p>
</li>
<li><p>Bringing an app back to the foreground from Recents.</p>
</li>
<li><p>Tapping a <em>Like</em> button and seeing a counter increase.</p>
</li>
<li><p>Tapping a list item to open up its details in a new screen.</p>
</li>
<li><p>Typing with a connected hardware keyboard.</p>
</li>
<li><p>Using a connected watch to trigger taking a picture from a phone.</p>
</li>
</ul>
<h1 id="heading-responsiveness-thresholds">Responsiveness thresholds</h1>
<p>We have <strong>different expectations</strong> for how long these interactions should take: we expect an app launch to take significantly longer than seeing a counter increase after tapping a <em>Like</em> button.</p>
<p>These different expectations come from our ability to do pattern matching. Humans are really good at picking up a trend and detecting outliers. If most apps launch in 2 seconds, users immediately notice apps that launch in 500 ms or 5 seconds.</p>
<p>This means we can have 2 responsiveness thresholds per metric:</p>
<ul>
<li><p>a low threshold below which the app is <em>"significantly better than most apps"</em></p>
</li>
<li><p>a high threshold above which the app is <em>"significantly worse than most apps"</em></p>
</li>
</ul>
<p>These thresholds can't be universal, they're highly context-specific (low-end vs high-end devices, which other apps the users are exposed to, etc).</p>
<p>A similar approach can be found in the <a target="_blank" href="https://web.dev/inp/">Interaction to Next Paint (INP)</a> documentation, where two duration thresholds (200 ms &amp; 500 ms) define a score of GOOD / NEEDS IMPROVEMENT / POOR for each measured sample.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688703032860/bfa7857d-4d64-4fb4-b4e4-0952b2006904.png" alt class="image--center mx-auto" /></p>
<p>I have used similar thresholds at Square but wasn't sure how I could turn these into a single performance number, so I mostly focused on the P90 instead.</p>
<p>Ty Smith just <a target="_blank" href="https://www.threads.net/t/CuYZWYKLmWA/">pointed me</a> to the <a target="_blank" href="https://en.wikipedia.org/wiki/Apdex">Apdex</a> score, which also defines two thresholds to split samples into 3 buckets (Satisfied, Tolerated, Frustrated) and then provides a score as a weighted average of the counts.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688703634699/4f2a13b6-4311-4d34-85a1-028be46ec620.png" alt="apdex process as described on apdex.org" class="image--center mx-auto" /></p>
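<p>The Apdex computation itself is simple: with a target threshold T, samples under T are Satisfied, samples under 4T are Tolerating (the 4T multiplier is the standard Apdex convention), and the rest are Frustrated. A sketch:</p>

```java
public class Apdex {
  // Apdex score for interaction latencies against a target threshold T (ms):
  // (satisfied + tolerating / 2) / total samples.
  static double score(long[] latenciesMs, long targetMs) {
    double satisfied = 0, tolerating = 0;
    for (long latency : latenciesMs) {
      if (latency <= targetMs) {
        satisfied++;
      } else if (latency <= 4 * targetMs) {
        tolerating++;
      }
      // Anything slower than 4 * targetMs counts as Frustrated.
    }
    return (satisfied + tolerating / 2) / latenciesMs.length;
  }

  public static void main(String[] args) {
    // 2 satisfied, 1 tolerating, 1 frustrated against a 100 ms target.
    System.out.println(score(new long[] {80, 95, 250, 900}, 100)); // prints 0.625
  }
}
```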
<h1 id="heading-critical-metrics">Critical metrics</h1>
<p>App launch time is critical when customers use an app for short amounts of time. App launch time is less critical when customers intend to use an app continuously for a long period of time (e.g. a Lyft Driver app, a point-of-sale app, a Check-In app for visitors, etc).</p>
<p>After taking your order in a restaurant, a waiter needs to be able to input it in the point-of-sale app really quickly without thinking and relies on muscle memory. This requires a predictable UI and consistent tap latency. In that context, tap latency is critical whereas app launch time (which happens once a day) is not.</p>
<p>Similarly, scroll smoothness is probably more critical for a feed than for a list of settings.</p>
<p>Product teams should identify which interactions are critical for their customers, then use responsiveness thresholds to prioritize performance work.</p>
<h1 id="heading-biais-in-aggregates">Bias in aggregates</h1>
<p>Production metrics yield large volumes of data, and product teams look at aggregate numbers. When applying uniform sampling or percentile-based aggregates, the resulting number will be biased toward the experience of high-activity users.</p>
<p>In a low-intent context where users don't need to stick around (e.g. Twitter), high performance typically correlates with high activity, so the metrics will not account for users that would have used the app if the performance was better. So a good performance number could hide really poor performance that led customers to not use the app. A possible solution to avoid this is to get a single number per device (or per user) and then aggregate that single number.</p>
<p>In a high-intent context where users absolutely need to use the app (e.g. to take a payment), aggregates are more likely to represent the full spectrum of users, and high activity often maps to customers that are more important to the business.</p>
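<p>That per-device approach can be sketched as: reduce each device to a single number (e.g. its median latency), then aggregate across devices so a high-activity device contributes one sample instead of thousands. A sketch with made-up data:</p>

```java
import java.util.Arrays;
import java.util.List;

public class PerDeviceAggregate {
  static double median(double[] values) {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
  }

  // One number per device first, then aggregate across devices, so a
  // high-activity device can't dominate the fleet-wide number.
  static double fleetMedianOfDeviceMedians(List<double[]> perDeviceLatencies) {
    double[] deviceMedians = new double[perDeviceLatencies.size()];
    for (int i = 0; i < deviceMedians.length; i++) {
      deviceMedians[i] = median(perDeviceLatencies.get(i));
    }
    return median(deviceMedians);
  }
}
```

<p>With one fast high-activity device (five 10 ms samples) and two slow low-activity devices, a pooled median would report 10 ms, while the per-device aggregate surfaces the slow devices.</p>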
<h1 id="heading-capturing-these-metrics-correctly-is-hard">Capturing these metrics correctly is hard</h1>
<p>Both Google &amp; Apple have failed to provide useful performance observability APIs that would allow app developers to easily track user-centric performance metrics.</p>
<p>Observability vendors do provide SDKs that track these for you, the problem is, they all do a terrible job of it. Seriously.</p>
<h1 id="heading-example-1-app-launch">Example 1: app launch</h1>
<p>The guidance and tooling for measuring app launch time on Android is lacking and inconsistent; let's dive into the details.</p>
<h2 id="heading-play-store-android-vitals">Play Store Android Vitals</h2>
<p>The Play Store Android Vitals provide <strong>sampled &amp; anonymous</strong> aggregate data on cold, warm &amp; hot launches.</p>
<ul>
<li><p>There's no ability to slice &amp; dice based on custom properties so it's really hard to use it to investigate or build a metric aligned with business needs.</p>
</li>
<li><p>There are no details on how the tracking is implemented. I had to look into the AOSP sources!</p>
</li>
</ul>
<p>The Play Store reads data from an internal log tracker (not logcat), which the <code>system_server</code> process writes to. The reported launch time is the same value as what's logged to logcat by <code>ActivityTaskManager</code> on startup:</p>
<pre><code class="lang-kotlin">I/ActivityTaskManager: Displayed com.example.logstartup/.MainActivity: +<span class="hljs-number">1</span>s185ms
</code></pre>
<p>I investigated the logged values in <a target="_blank" href="https://dev.to/pyricau/android-vitals-how-adb-measures-app-startup-5n7">How adb measures App Startup 🔎</a>. That investigation surfaced that:</p>
<ul>
<li><p>The measured <strong>start</strong> time is when <code>system_server</code> receives the activity start intent, which can add a long time (in my debug test: ~300 ms / 30% of app startup) before the point where app developers have an influence (which is when the APK file starts loading).</p>
</li>
<li><p>The measured <strong>end</strong> time is the timestamp of when the window of the first resumed activity is done drawing.</p>
</li>
</ul>
<h2 id="heading-jetpack-macrobenchmark">Jetpack Macrobenchmark</h2>
<p><a target="_blank" href="https://developer.android.com/topic/performance/benchmarking/macrobenchmark-overview">Macrobenchmark</a> pulls its launch time data from <a target="_blank" href="https://perfetto.dev/">Perfetto</a>, which computes it based on atrace logs. For example the <code>ActivityMetricsLogger.LaunchingState</code> constructor <a target="_blank" href="https://cs.android.com/android/platform/superproject/main/+/main:frameworks/base/services/core/java/com/android/server/wm/ActivityMetricsLogger.java;l=218;drc=853d0c6f52e3b815bd4e9061d7349d830a74359c">starts a trace</a>:</p>
<pre><code class="lang-java">        LaunchingState() {
            <span class="hljs-keyword">if</span> (!Trace.isTagEnabled(Trace.TRACE_TAG_ACTIVITY_MANAGER)) {
                <span class="hljs-keyword">return</span>;
            }
            <span class="hljs-comment">// Use an id because the launching app is not yet known before resolving intent.</span>
            sTraceSeqId++;
            mTraceName = <span class="hljs-string">"launchingActivity#"</span> + sTraceSeqId;
            Trace.asyncTraceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, mTraceName, <span class="hljs-number">0</span>);
        }
</code></pre>
<p>The trace is then <a target="_blank" href="https://cs.android.com/android/platform/superproject/+/master:external/perfetto/src/trace_processor/perfetto_sql/stdlib/android/startup/internal_startups_minsdk33.sql;l=24;drc=6e0c50cdf2f5f03271982e5b25689697edd7d0da">pulled by Perfetto</a>.</p>
<p>In summary, Jetpack Macrobenchmark, Perfetto, Play Store Android Vitals and logcat all report almost the same value (Perfetto traces are started with a small offset). That's great, though it should be more systematically documented.</p>
<p>Unfortunately, measuring app launch time in-app to report production analytics is a different story.</p>
<h2 id="heading-production-analytics">Production analytics</h2>
<p>Most implementations capture cold app launch by measuring:</p>
<ul>
<li><p>The <strong>start</strong> as a timestamp when the first class loads or in an <code>onCreate()</code> callback.</p>
</li>
<li><p>The <strong>end</strong> when the first activity is resumed.</p>
</li>
</ul>
<p>Both the measured start and end are incorrect, yet that's exactly what <a target="_blank" href="https://firebase.google.com/docs/perf-mon/app-start-foreground-background-traces?platform=android#app-start">Firebase Performance Monitoring is doing</a>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1689010200176/5fe58872-1330-46dc-a22a-1cf8365a87cd.png" alt class="image--center mx-auto" /></p>
<p>This approach is <strong>incorrect</strong> for several reasons:</p>
<ul>
<li><p>This doesn't track <em>hot launch</em> &amp; <em>warm launch</em>, which the Play Store Android Vitals reports.</p>
</li>
<li><p>This also counts as <em>cold launches</em> cases where the process is first started in the background for unrelated reasons and an activity is only launched a bit later (e.g. within a minute of process start). These fake cold starts inflate the aggregate cold launch metrics and should be filtered out, by looking at the process importance in <code>Application.onCreate()</code> and making sure that the first <code>Activity.onResume()</code> happens before a main thread post scheduled from <code>Application.onCreate()</code> (see <a target="_blank" href="https://dev.to/pyricau/android-vitals-why-did-my-process-start-4d0e">Android Vitals - Why did my process start? 🌄</a>).</p>
</li>
<li><p>The start time is entirely disconnected from what Play Store Android Vitals report: the loading of a random class or a <code>ContentProvider.onCreate()</code> call happens long after <code>system_server</code> receives the launch intent.</p>
</li>
<li><p><s>Unfortunately, there's no API that provides the launch intent timestamp. There should be!</s> Actually I just found out there's <a target="_blank" href="https://developer.android.com/reference/android/os/Process#getStartRequestedUptimeMillis()">Process.getStartRequestedUptimeMillis()</a> since API 33.</p>
</li>
<li><p><a target="_blank" href="https://developer.android.com/reference/android/os/Process.html#getStartUptimeMillis()">Process.getStartUptimeMillis()</a> (API 28+) is captured right before APK loading and is a much better option for reporting launch start than <code>ContentProvider.onCreate()</code> or class load time. Unfortunately real production data surfaced that on API 28+ we sometimes get start times hours in the past (see <a target="_blank" href="https://dev.to/pyricau/android-vitals-when-did-my-app-start-24p4">Android Vitals - When did my app start? ⏱</a>).</p>
</li>
<li><p>The <strong>end time</strong> should be when the first frame traversal after the first activity resume is committed (see <a target="_blank" href="https://developer.android.com/reference/android/view/ViewTreeObserver#registerFrameCommitCallback(java.lang.Runnable)">ViewTreeObserver.registerFrameCommitCallback</a>). The first traversal is likely to involve quite a bit of work, and capturing the end in <code>Activity.onResume()</code> ignores all of that work.</p>
</li>
</ul>
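<p>The duration arithmetic and a sanity check against those bogus start times can be sketched as pure logic, assuming the start comes from <code>Process.getStartUptimeMillis()</code> and the end from a frame commit callback (<code>launchDurationMillis</code> and the one-minute cap are hypothetical):</p>

```kotlin
// Hypothetical helper: computes a cold launch duration and rejects bogus values,
// such as the start times "hours in the past" seen in production on API 28+.
fun launchDurationMillis(
  processStartUptimeMillis: Long,
  firstFrameCommitUptimeMillis: Long,
  maxReasonableLaunchMillis: Long = 60_000L
): Long? {
  val duration = firstFrameCommitUptimeMillis - processStartUptimeMillis
  // A negative or absurdly long duration indicates an invalid start timestamp.
  return if (duration in 0L..maxReasonableLaunchMillis) duration else null
}
```

Durations outside the plausible range are dropped rather than reported, so a single corrupt timestamp can't skew the aggregate launch metrics.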
<p>I ended up writing a longer article on how to do this right: <a target="_blank" href="https://blog.p-y.wtf/tracking-android-app-launch-in-production">Tracking Android App Launch in production</a>.</p>
<h1 id="heading-example-2-tap-interaction-latency">Example 2: tap interaction latency</h1>
<p>Observability vendors provide APIs to record spans. It's tempting to use spans to record the latency of an interaction, for example here how long it took to navigate to the About screen when tapping on the About button:</p>
<pre><code class="lang-kotlin">showAboutScreenButton.setOnClickListener {
  <span class="hljs-keyword">val</span> span = tracer.buildSpan(<span class="hljs-string">"showAboutScreen"</span>).start()

  findNavController().navigate(R.id.about_screen)

  span.finish()
}
</code></pre>
<p>Unfortunately, this doesn't account for the actual customer experience, as there's a lot of work that happens between when the finger leaves the screen and when the click listener is invoked, as well as between when the view hierarchy is updated and when the change is visible on display. Here's a Perfetto trace that demonstrates this common mistake:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1688672245749/6dd74939-dc1d-401b-b706-6e6b921d28a7.png" alt class="image--center mx-auto" /></p>
<p>App developers often don't have the bandwidth to dive into this complexity and end up using incorrect but easier to implement measurements.</p>
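<p>A more faithful measurement would start the span at the input event timestamp (<code>MotionEvent.getEventTime()</code>) and end it when the frame showing the About screen is committed; the duration itself is then simple arithmetic (a sketch; <code>tapLatencyMillis</code> is a hypothetical helper):</p>

```kotlin
// Hypothetical helper: latency as perceived by the user, from the input
// event timestamp to the commit of the frame that displays the result.
fun tapLatencyMillis(
  motionEventUptimeMillis: Long,
  frameCommitUptimeMillis: Long
): Long = frameCommitUptimeMillis - motionEventUptimeMillis
```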
<h1 id="heading-a-better-way">A better way?</h1>
<h2 id="heading-app-launch">App launch</h2>
<p>Here's what we need from Google:</p>
<ul>
<li><p>Documentation that covers how all tools across the ecosystem (Play Store Android Vitals, Logcat, Perfetto, Macrobenchmark) currently measure app launch times (for cold, hot and warm start).</p>
</li>
<li><p><s>New Android APIs to access the timestamp of when the launch activity intent was received by <code>system_server</code></s>. This <a target="_blank" href="https://developer.android.com/reference/android/os/Process#getStartRequestedUptimeMillis()">exists now</a>, very cool!</p>
</li>
<li><p>An Android SDK that implements the correct way to track app launch as highlighted in the section above. This should include cold start, warm start and hot start. I followed up and wrote an in-depth article on how one might go about that: <a target="_blank" href="https://blog.p-y.wtf/tracking-android-app-launch-in-production">Tracking Android App Launch in production</a>.</p>
</li>
<li><p>Firebase (and 3rd party vendors, OpenTelemetry, etc) could then use that SDK.</p>
</li>
</ul>
<h2 id="heading-going-forward">Going forward</h2>
<p>My long term goal with these articles and the <a target="_blank" href="https://github.com/square/papa">square/papa</a> library is that either Apple &amp; Google step up and provide strong guidance and more useful observability APIs (e.g. <a target="_blank" href="https://developer.android.com/topic/performance/jankstats">JankStats</a>), or we come together as a community and build solid foundational open source measurement SDKs.</p>
<p>Let's start a conversation! You can reach me on <a target="_blank" href="https://androiddev.social/@py">Mastodon</a>, <a target="_blank" href="https://bsky.app/profile/p-y.wtf">Bluesky</a>, <a target="_blank" href="https://www.threads.net/@py.ricau">Threads</a> or <a target="_blank" href="https://twitter.com/Piwai">Twitter</a> 🙃.</p>
<blockquote>
<p>Header image generated by DALL-E.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Avoid Java double brace initialization]]></title><description><![CDATA[TL;DR
Avoid doing this:
new HashMap<String, String>() {{
  put("key", value);
}};

Leak Trace
I was recently looking at the following leak trace from LeakCanary:
┬───
│ GC Root: Global variable in native code
│
├─ com.bugsnag.android.AnrPlugin instan...]]></description><link>https://blog.p-y.wtf/avoid-java-double-brace-initialization</link><guid isPermaLink="true">https://blog.p-y.wtf/avoid-java-double-brace-initialization</guid><category><![CDATA[Android]]></category><category><![CDATA[Java]]></category><category><![CDATA[Memory Leak]]></category><category><![CDATA[patterns]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Tue, 27 Jun 2023 20:58:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1687899384200/0d27b831-a154-4a62-a3bb-aece042cdc82.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-tldr">TL;DR</h1>
<p>Avoid doing this:</p>
<pre><code class="lang-java"><span class="hljs-keyword">new</span> HashMap&lt;String, String&gt;() {{
  put(<span class="hljs-string">"key"</span>, value);
}};
</code></pre>
<h1 id="heading-leak-trace">Leak Trace</h1>
<p>I was recently looking at the following leak trace from <a target="_blank" href="https://square.github.io/leakcanary">LeakCanary</a>:</p>
<pre><code>┬───
│ GC Root: Global variable <span class="hljs-keyword">in</span> native code
│
├─ com.bugsnag.android.AnrPlugin instance
│    Leaking: UNKNOWN
│    ↓ AnrPlugin.client
│                ~~~~~~
├─ com.bugsnag.android.Client instance
│    Leaking: UNKNOWN
│    ↓ Client.breadcrumbState
│             ~~~~~~~~~~~~~~~
├─ com.bugsnag.android.BreadcrumbState instance
│    Leaking: UNKNOWN
│    ↓ BreadcrumbState.store
│                      ~~~~~
├─ com.bugsnag.android.Breadcrumb[] array
│    Leaking: UNKNOWN
│    ↓ Breadcrumb[<span class="hljs-number">494</span>]
│                ~~~~~
├─ com.bugsnag.android.Breadcrumb instance
│    Leaking: UNKNOWN
│    ↓ Breadcrumb.impl
│                 ~~~~
├─ com.bugsnag.android.BreadcrumbInternal instance
│    Leaking: UNKNOWN
│    ↓ BreadcrumbInternal.metadata
│                         ~~~~~~~~
├─ com.example.MainActivity$<span class="hljs-number">1</span> instance
│    Leaking: UNKNOWN
│    Anonymous subclass <span class="hljs-keyword">of</span> java.util.HashMap
│    ↓ MainActivity$<span class="hljs-number">1.</span>this$<span class="hljs-number">0</span>
│                     ~~~~~~
╰→ com.example.MainActivity instance
​     Leaking: YES (​Activity#mDestroyed is <span class="hljs-literal">true</span>)
</code></pre><p>When opening a leak trace, my first step is to look at the object at the bottom to understand what its lifecycle is, as that'll help me understand if other objects in the leak trace are expected to have the same lifecycle or not.</p>
<p>At the bottom, we see: </p>
<pre><code>╰→ com.example.MainActivity instance
​     Leaking: YES (​Activity#mDestroyed is <span class="hljs-literal">true</span>)
</code></pre><p>The activity is destroyed and should have been garbage collected, but it's held in memory.</p>
<p>So, at that point, I start looking for known types in the leak trace and try to figure out if they belong to the same destroyed scope (=&gt; they're leaking) or to a higher scope (=&gt; they're not leaking).</p>
<p>At the top, we see:</p>
<pre><code>├─ com.bugsnag.android.Client instance
│    Leaking: UNKNOWN
</code></pre><p>Our <a target="_blank" href="https://bugsnag.com/">BugSnag</a> client is a singleton that we use for crash reporting purposes: we create a single instance per app, so it's not leaking.</p>
<pre><code>├─ com.bugsnag.android.Client instance
│    Leaking: NO
</code></pre><p>So now we can update the leak trace to focus only on the section from the last <code>Leaking: NO</code> to the first <code>Leaking: YES</code>:</p>
<pre><code>…
├─ com.bugsnag.android.Client instance
│    Leaking: NO
│    ↓ Client.breadcrumbState
│             ~~~~~~~~~~~~~~~
├─ com.bugsnag.android.BreadcrumbState instance
│    Leaking: UNKNOWN
│    ↓ BreadcrumbState.store
│                      ~~~~~
├─ com.bugsnag.android.Breadcrumb[] array
│    Leaking: UNKNOWN
│    ↓ Breadcrumb[<span class="hljs-number">494</span>]
│                ~~~~~
├─ com.bugsnag.android.Breadcrumb instance
│    Leaking: UNKNOWN
│    ↓ Breadcrumb.impl
│                 ~~~~
├─ com.bugsnag.android.BreadcrumbInternal instance
│    Leaking: UNKNOWN
│    ↓ BreadcrumbInternal.metadata
│                         ~~~~~~~~
├─ com.example.MainActivity$<span class="hljs-number">1</span> instance
│    Leaking: UNKNOWN
│    Anonymous subclass <span class="hljs-keyword">of</span> java.util.HashMap
│    ↓ MainActivity$<span class="hljs-number">1.</span>this$<span class="hljs-number">0</span>
│                     ~~~~~~
╰→ com.example.MainActivity instance
​     Leaking: YES (​Activity#mDestroyed is <span class="hljs-literal">true</span>)
</code></pre><p>The BugSnag client keeps a ring buffer of <a target="_blank" href="https://docs.bugsnag.com/platforms/android/automatically-captured-data/#breadcrumbs">breadcrumbs</a>. Those are meant to stay in memory, so they're not leaking. Let's update again:</p>
<pre><code>├─ com.bugsnag.android.BreadcrumbInternal instance
│    Leaking: NO
</code></pre><p>Once again, we update the leak trace to focus only on the section from the last <code>Leaking: NO</code> to the first <code>Leaking: YES</code>:</p>
<pre><code>…
├─ com.bugsnag.android.BreadcrumbInternal instance
│    Leaking: NO
│    ↓ BreadcrumbInternal.metadata
│                         ~~~~~~~~
├─ com.example.MainActivity$<span class="hljs-number">1</span> instance
│    Leaking: UNKNOWN
│    Anonymous subclass <span class="hljs-keyword">of</span> java.util.HashMap
│    ↓ MainActivity$<span class="hljs-number">1.</span>this$<span class="hljs-number">0</span>
│                     ~~~~~~
╰→ com.example.MainActivity instance
​     Leaking: YES (​Activity#mDestroyed is <span class="hljs-literal">true</span>)
</code></pre><ul>
<li><code>BreadcrumbInternal.metadata</code>: the leak trace goes through the metadata field of the breadcrumb implementation.</li>
<li><code>MainActivity$1 instance Anonymous subclass of java.util.HashMap</code>: <code>MainActivity$1</code> is an anonymous subclass of <code>HashMap</code>, defined in <code>MainActivity</code>. It's the first anonymous class defined from the top of <code>MainActivity.java</code> (hence the <code>$1</code> suffix).</li>
<li><code>this$0</code>: every anonymous class has an implicit field reference to the outer class in which it is defined, and that field is always named <code>this$0</code>.</li>
</ul>
<p>Translated to English: one of the breadcrumbs logged to BugSnag has a metadata map that is an anonymous subclass of HashMap that holds a reference to an outer class, the destroyed activity.</p>
<p>Let's look at where we log breadcrumbs in MainActivity:</p>
<pre><code class="lang-java"><span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">logSavingTicket</span><span class="hljs-params">(String ticketId)</span> </span>{
  Map&lt;String, Object&gt; metadata = <span class="hljs-keyword">new</span> HashMap&lt;String, Object&gt;() {{
    put(<span class="hljs-string">"ticketId"</span>, ticketId);
  }};
  bugsnagClient.leaveBreadcrumb(<span class="hljs-string">"Saving Ticket"</span>, metadata, LOG);
}
</code></pre>
<p>This code is leveraging a fun Java pattern known as <strong>double brace initialization</strong>. It allows you to create a HashMap and initialize it at the same time, by adding an instance initializer block (the inner braces) to an anonymous subclass of HashMap.</p>
<pre><code class="lang-java"><span class="hljs-keyword">new</span> HashMap&lt;String, Object&gt;() {{
  put(<span class="hljs-string">"ticketId"</span>, ticketId);
}};
</code></pre>
<p>Java anonymous classes always have <strong>implicit references to their outer class</strong>. So this code:</p>
<pre><code class="lang-java"><span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">logSavingTicket</span><span class="hljs-params">(String ticketId)</span> </span>{
  Map&lt;String, Object&gt; metadata = <span class="hljs-keyword">new</span> HashMap&lt;String, Object&gt;() {{
    put(<span class="hljs-string">"ticketId"</span>, ticketId);
  }};
  bugsnagClient.leaveBreadcrumb(<span class="hljs-string">"Saving Ticket"</span>, metadata, LOG);
}
</code></pre>
<p>is actually compiled as:</p>
<pre><code class="lang-java"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MainActivity</span>$1 <span class="hljs-keyword">extends</span> <span class="hljs-title">HashMap</span>&lt;<span class="hljs-title">String</span>, <span class="hljs-title">Object</span>&gt; </span>{
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">final</span> MainActivity <span class="hljs-keyword">this</span>$<span class="hljs-number">0</span>;

  MainActivity$<span class="hljs-number">1</span>(MainActivity <span class="hljs-keyword">this</span>$<span class="hljs-number">0</span>, String ticketId) {
     <span class="hljs-keyword">this</span>.<span class="hljs-keyword">this</span>$<span class="hljs-number">0</span> = <span class="hljs-keyword">this</span>$<span class="hljs-number">0</span>;
     put(<span class="hljs-string">"ticketId"</span>, ticketId);
  }
}

<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">logSavingTicket</span><span class="hljs-params">(String ticketId)</span> </span>{
  Map&lt;String, Object&gt; metadata = <span class="hljs-keyword">new</span> MainActivity$<span class="hljs-number">1</span>(<span class="hljs-keyword">this</span>, ticketId);
  bugsnagClient.leaveBreadcrumb(<span class="hljs-string">"Saving Ticket"</span>, metadata, LOG);
}
</code></pre>
<p>As a result, the breadcrumb is holding on to the destroyed activity instance.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Avoid using Java double brace initialization, it's cute but creates additional classes for no good reason and risks introducing leaks. Instead, you can do things the boring and safe way:</p>
<pre><code class="lang-java">Map&lt;String, Object&gt; metadata = <span class="hljs-keyword">new</span> HashMap&lt;&gt;();
metadata.put(<span class="hljs-string">"ticketId"</span>, ticketId);
bugsnagClient.leaveBreadcrumb(<span class="hljs-string">"Saving Ticket"</span>, metadata, LOG);
</code></pre>
<p>Or leverage <code>Collections.singletonMap()</code>, which makes this nicer:</p>
<pre><code><span class="hljs-built_in">Map</span>&lt;<span class="hljs-built_in">String</span>, <span class="hljs-built_in">Object</span>&gt; metadata = singletonMap(<span class="hljs-string">"ticketId"</span>, ticketId);
bugsnagClient.leaveBreadcrumb(<span class="hljs-string">"Saving Ticket"</span>, metadata, LOG);
</code></pre><p>Or just convert the file to Kotlin.</p>
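<p>In Kotlin, the same metadata map is a one-liner that creates no anonymous subclass, so nothing can capture the enclosing activity (a sketch; <code>ticketMetadata</code> is a hypothetical helper):</p>

```kotlin
// mapOf() returns a plain map that holds no reference to any outer class.
fun ticketMetadata(ticketId: String): Map<String, Any> =
  mapOf("ticketId" to ticketId)
```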
<blockquote>
<p>Header image generated by DALL-E, prompt: "A coffee mug wearing orthodontic braces, digital art".</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Statistically Rigorous Android Macrobenchmarks]]></title><description><![CDATA[👋 Hi, this is P.Y., I work as an Android Engineer at Block. As we started evaluating the performance impact of software changes, I brushed up on my statistics classes, talked to industry peers, read several papers, wrote an internal doc to help us i...]]></description><link>https://blog.p-y.wtf/statistically-rigorous-android-macrobenchmarks</link><guid isPermaLink="true">https://blog.p-y.wtf/statistically-rigorous-android-macrobenchmarks</guid><category><![CDATA[Android]]></category><category><![CDATA[performance]]></category><category><![CDATA[Benchmark]]></category><category><![CDATA[statistics]]></category><category><![CDATA[Macrobenchmark]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Mon, 01 May 2023 20:57:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1682973661722/4b5537a0-24b9-46a9-8cb6-9159c5378155.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>👋 Hi, this is P.Y., I work as an Android Engineer at <a target="_blank" href="https://block.xyz/">Block</a>. As we started evaluating the performance impact of software changes, I brushed up on my statistics classes, talked to industry peers, read several papers, wrote an internal doc to help us interpret benchmark results correctly and turned it into this article.</p>
</blockquote>
<p>At Square, we leverage <a target="_blank" href="https://developer.android.com/topic/performance/benchmarking/macrobenchmark-overview">Jetpack Macrobenchmark</a> to benchmark the UI latency of critical user interactions. We trace user interactions with <a target="_blank" href="https://developer.android.com/reference/android/os/Trace#beginAsyncSection(java.lang.String,%20int)">Trace.beginAsyncSection</a> and capture their durations with <a target="_blank" href="https://developer.android.com/reference/kotlin/androidx/benchmark/macro/TraceSectionMetric">TraceSectionMetric</a>.</p>
<p>The official documentation shows how to <a target="_blank" href="https://developer.android.com/topic/performance/benchmarking/macrobenchmark-overview#benchmark-results">see</a> and <a target="_blank" href="https://developer.android.com/topic/performance/benchmarking/benchmarking-in-ci#collect-results">collect</a> the results, but <strong>does not provide any guidance on interpreting these results</strong> in the context of a software change. That's unfortunate: the primary goal of benchmarks is typically to compare two situations and draw conclusions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682693989441/381053c0-79be-481b-b052-715e39a28c9c.png" alt class="image--center mx-auto" /></p>
<p>Should I compare the min, the median, or the max? If I get a difference of 1 ms, is that an improvement? What if I run the same 2 benchmarks again and get different results? How do I know I can trust my benchmark results?</p>
<p>The <a target="_blank" href="https://dri.es/files/oopsla07-georges.pdf">Statistically rigorous java performance evaluation</a> paper demonstrates how <strong>comparing two aggregate values</strong> (e.g. min, max, mean, median) from a benchmark can <strong>be misleading and lead to incorrect conclusions</strong>.</p>
<p>This article leverages statistics fundamentals to suggest a scientifically sound approach to analyzing Jetpack Macrobenchmark results. If you're planning on setting up infrastructure to measure the impact of changes on performance, you should probably follow a similar approach. This is a lot of work, so you might be better off buying a solution from a vendor. I talked extensively with <a target="_blank" href="https://www.ryanjeffreybrooks.com">Ryan Brooks</a> from <a target="_blank" href="https://www.emergetools.com">Emerge Tools</a> and was pleased to learn that they follow a similar approach to what I'm describing in this article.</p>
<p>Towards the end, I'll also provide an interpretation of the statistics behind the <a target="_blank" href="https://medium.com/androiddevelopers/fighting-regressions-with-benchmarks-in-ci-6ea9a14b5c71">Fighting regressions with Benchmarks in CI</a> article that the Jetpack documentation recommends as a resource.</p>
<blockquote>
<p>I wasn't particularly strong at statistics in college. If you notice a mistake please let me know!</p>
</blockquote>
<h1 id="heading-deterministic-benchmarks">Deterministic benchmarks</h1>
<p>In an <strong>ideal world</strong>, running a benchmark in identical conditions (e.g. a specific version of our software on specific hardware with a specific account) would always result in the same performance and the same benchmark results. Changing some of these conditions, e.g. the hardware or software, would provide a different result.</p>
<p>For example, let's say we have an ideal-world benchmark where we run 100 iterations, and each iteration always measures 242 ms, so the mean is 242 ms. We optimize the code, run the 100 iterations again, and each iteration now measures 212 ms. We can conclude that we improved performance by 30 ms, or 12.4%. But if every iteration yields a slightly different result, we can't make that claim.</p>
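<p>The arithmetic behind that claim is simple (a sketch; <code>improvement</code> is a hypothetical helper):</p>

```kotlin
// For a deterministic benchmark the mean fully summarizes a run, so the
// improvement is just the difference between the two means.
fun improvement(beforeMillis: Double, afterMillis: Double): Pair<Double, Double> {
  val absoluteMillis = beforeMillis - afterMillis
  val relativePercent = absoluteMillis / beforeMillis * 100
  return absoluteMillis to relativePercent
}
```

<p>With the values above, <code>improvement(242.0, 212.0)</code> yields 30 ms and roughly 12.4%.</p>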
<blockquote>
<p>💡 Deterministic measurements can be summarized with a mean or sum. Unfortunately, Jetpack Macrobenchmark runs are not deterministic. It's possible to design alternate benchmarking methods that don't rely on time but instead track deterministic measurements (e.g. number of CPU instructions, number of IO reads, number of allocations), which provide clear-cut results. However, when evaluating <strong>performance as perceived by humans, time is the only measurement that matters</strong>.</p>
</blockquote>
<h1 id="heading-fixed-environment">Fixed environment</h1>
<p>If we change more than one condition at a time in between two benchmark runs (e.g. both the software and hardware), we can't tell if a benchmark result change is caused by a software change or hardware change.</p>
<p>So, as a rule of thumb, we should <strong>only have one changing variable between two benchmark changes</strong>. Since we're interested in tracking the impact of code changes (i.e. the app software stack), we need everything else to be constant (hardware, OS version, account, etc). For example, we should not compare one benchmark on a phone with version 6 of the app with another one on a tablet with version 7 of the app.</p>
<p>It's still worth running a specific benchmark with several varying conditions (phone, tablet, types of users, etc), as long as we only do paired before &amp; after comparisons for each change.</p>
<h1 id="heading-nondeterminism-amp-probability-distribution">Nondeterminism &amp; probability distribution</h1>
<p>With Jetpack Macrobenchmark, even though we're in a fixed environment (identical device, OS version, etc), we're still benchmarking incredibly complex systems with many variables at play, such as dynamic CPU frequencies, RAM bandwidth, GPU frequency, OS processes running concurrently, etc. This introduces variation in the measured performance, so <strong>every iteration will give us a different result</strong>.</p>
<blockquote>
<p>💡 One core idea is that the performance our benchmark is trying to measure follows a <strong>probability distribution</strong>: a mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment (<a target="_blank" href="https://en.wikipedia.org/wiki/Probability_distribution">wiki</a>). We don't know the actual function for that probability distribution.</p>
</blockquote>
<p>By running benchmarks before &amp; after a change, we sample two probability distributions. We can draw <a target="_blank" href="https://en.wikipedia.org/wiki/Histogram">histograms</a> to get an approximate representation of the sample distributions. Let's look at an example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682966450818/28f5d697-dd5a-46c7-9a78-a568362e47f3.png" alt class="image--center mx-auto" /></p>
<p>The blue benchmark has a median of 295.8 ms and the red benchmark has a median of 292.9 ms. We could conclude that there's a 2.9 ms improvement, but the histograms show that these two sample distributions look very different. Is the red benchmark better than the blue one?</p>
<p>More importantly, if we run the before or the after benchmark several times, we will get a different median. Here, the difference in the medians is small enough that we might sometimes get a better median before the code change (blue), even though this time red was better.</p>
<p>We have to use <strong>statistical techniques</strong> to derive valid conclusions or surface inconclusive benchmarks.</p>
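<p>Histograms like the ones above can be computed by bucketing the measured durations into fixed-width bins (a sketch; the helper name and bin width are arbitrary choices):</p>

```kotlin
// Buckets measured durations into fixed-width bins, mapping each bin's
// start value to the number of samples falling in that bin.
fun histogram(samplesMillis: List<Double>, binWidthMillis: Double): Map<Double, Int> =
  samplesMillis
    .groupingBy { (it / binWidthMillis).toInt() * binWidthMillis }
    .eachCount()
```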
<h1 id="heading-identifying-sources-of-variation">Identifying sources of variation</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682966450818/28f5d697-dd5a-46c7-9a78-a568362e47f3.png" alt class="image--center mx-auto" /></p>
<p>In the previous chart, the first benchmark (blue) seems to have two modes (peaks). Instead of a histogram, we can graph the measured values in iteration order:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682966621059/dfe2eec2-38c1-41c6-9986-ed716eb37132.png" alt class="image--center mx-auto" /></p>
<p>This tells a much different story! After investigating, we discover that this benchmark triggered <a target="_blank" href="https://en.wikipedia.org/wiki/Dynamic_frequency_scaling">thermal throttling</a>: when a device temperature goes over a target, the CPU frequencies are scaled down until the temperature goes down. As a result, everything on the device runs slower.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682967585626/d8d0c5d5-6090-46a9-8088-7170a069fca2.jpeg" alt class="image--center mx-auto" /></p>
<p>We can work around this variance by avoiding thermal throttling: using a device with better thermal dissipation, running the benchmark in a <strong>colder room</strong>, or running the entire benchmark at a lower CPU frequency.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682966450818/28f5d697-dd5a-46c7-9a78-a568362e47f3.png" alt class="image--center mx-auto" /></p>
<p>Similarly, notice how the red histogram above has a long right tail with a few high measures. Graphing the benchmark in iteration order, we see the following:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682967700158/4c11c024-4a93-4e77-aa0c-93f9db092b18.png" alt class="image--center mx-auto" /></p>
<p>This looks like a warm-up effect, where the first iterations are slower, due to e.g. cold caches or JIT compilation. To work around this, we could install the app with full AOT compilation and perform warm-up iterations.</p>
<p>🤨 <strong>Should we leave these sources of variation in to keep the benchmarks realistic?</strong></p>
<p>No. We evaluate the impact of the change as precisely as we can by removing as many variables as we can. If we want to measure performance with &amp; without thermal throttling, or cold start performance, we can also test for those as separate benchmarks (e.g. keep the code identical and run the benchmark with and without thermal throttling to evaluate the impact that thermal throttling has on runtime performance).</p>
<h1 id="heading-normal-distribution">Normal distribution</h1>
<p>As we remove all identified sources of variation, the benchmark histograms start to look more and more like a bell curve, a graph depicting the <a target="_blank" href="https://en.wikipedia.org/wiki/Normal_distribution">normal distribution</a>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682967781674/dcba8494-659c-4e23-b69f-955c400f200a.png" alt class="image--center mx-auto" /></p>
<p>A normal distribution is a parametric distribution, i.e. a distribution that is based on a mathematical function whose shape and range are determined by distribution parameters. A normal distribution is symmetric and centered around its mean (which is therefore equal to its median), and the spread of its tails is defined by a second parameter, its standard deviation.</p>
<p>Comparing two normal distributions with a <strong>roughly equal standard deviation</strong> is easy: one distribution is simply shifted. Since they're symmetric around their mean we can just <strong>compute the difference between the two means</strong>.</p>
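<p>Under that assumption, each benchmark is summarized by its sample mean and sample standard deviation (a sketch, using the unbiased estimator with Bessel's correction):</p>

```kotlin
import kotlin.math.sqrt

// Sample mean of the measured durations.
fun mean(xs: List<Double>): Double = xs.sum() / xs.size

// Unbiased sample standard deviation (divides by n - 1, Bessel's correction).
fun standardDeviation(xs: List<Double>): Double {
  val m = mean(xs)
  return sqrt(xs.sumOf { (it - m) * (it - m) } / (xs.size - 1))
}
```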
<p>However, we're not certain that the underlying distributions are normal; we can only estimate the <strong>likelihood that they're normal</strong> based on the sample data we got from the benchmark.</p>
<h1 id="heading-normality-fitness-test">Normality fitness test</h1>
<p>There are many popular normality tests to assess the probability that a set of sample measures follows a normal distribution. Unfortunately, the gold standard to this day seems to be the eyeball test, i.e. generate a histogram or a <a target="_blank" href="https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot">Q-Q plot</a> and see if it looks normal. The eyeball test is hard to automate 😏.</p>
<p>The <a target="_blank" href="https://www.tandfonline.com/doi/pdf/10.1080/00949655.2010.520163">Comparisons of various types of normality tests</a> study concludes that selecting the best fitness tests depends on the distribution shape:</p>
<p><em>For symmetric short-tailed distributions, D’Agostino and Shapiro–Wilk tests have better power. For symmetric long-tailed distributions, the power of Jarque–Bera and D’Agostino tests is quite comparable with the Shapiro–Wilk test. As for asymmetric distributions, the Shapiro–Wilk test is the most powerful test followed by the Anderson–Darling test.</em></p>
<p>The <a target="_blank" href="https://github.com/datumbox/datumbox-framework">DatumBox Framework</a> provides a <code>ShapiroWilk.test()</code> function that we can use to assess if the measures are not normally distributed, in which case we shouldn't use the benchmark results as is and need to perform additional work to fix error sources.</p>
<pre><code class="lang-kotlin"><span class="hljs-comment">// alpha level (5%): probability of wrongly rejecting the hypothesis that the distribution is normal (null hypothesis).</span>
<span class="hljs-keyword">val</span> alphaLevel = <span class="hljs-number">0.05</span>
<span class="hljs-keyword">val</span> rejectNullHypothesis = ShapiroWilk.test(FlatDataCollection(distribution), alphaLevel)
<span class="hljs-keyword">if</span> (rejectNullHypothesis) {
  error(<span class="hljs-string">"Distribution failed normality test"</span>)
}
</code></pre>
<h1 id="heading-sample-mean-difference">Sample mean difference</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682967781674/dcba8494-659c-4e23-b69f-955c400f200a.png" alt class="image--center mx-auto" /></p>
<p>In the example above, the data for the first benchmark was generated by sampling two normal distributions, one with a mean of 242 ms and the other with a mean of 212 ms, so the <strong>real mean difference was 30 ms</strong>. However, the first sample distribution has a mean of 244 ms and the second has a mean of 209 ms, so the <strong>sample mean difference is 35 ms, 17% higher</strong>.</p>
<p>As you can see, even if the likelihood of normality is high enough, we don't know the actual real means of the underlying distributions, so we can't compute the exact difference between the two real means, and the difference between the two sample means can be significantly off. However, <strong>we can compute a likely interval for where the mean difference falls</strong>.</p>
<h1 id="heading-confidence-interval-for-a-difference-between-two-means">Confidence interval for a difference between two means</h1>
<p>It's formula time (<a target="_blank" href="https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_confidence_intervals/bs704_confidence_intervals5.html">source</a>)! If the following is true:</p>
<ul>
<li><p>Both benchmarks yield sample data that follows a normal distribution.</p>
</li>
<li><p>Each benchmark has more than 30 iterations.</p>
</li>
<li><p>The variance of one benchmark is no more than double the variance of the other benchmark.</p>
</li>
</ul>
<p>Then the <strong>confidence interval for the difference of means</strong> between the two benchmarks can be computed as:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682969792691/59f16ec8-64ef-48c0-8a80-670ecf3926b1.gif" alt class="image--center mx-auto" /></p>
<ul>
<li><p>The confidence interval is the difference between the means, plus or minus the margin of error.</p>
</li>
<li><p><code>z</code> is the <a target="_blank" href="https://en.wikipedia.org/wiki/Standard_score">Z score</a>.</p>
</li>
<li><p><code>Sp</code> is the pooled estimate of the common standard deviation, computed as:</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682969803854/db3a9aec-51e5-4f8a-a315-52bc5a61955e.gif" alt class="image--center mx-auto" /></p>
<p>Note: if the confidence interval crosses 0 (i.e. the interval goes from a positive value to a negative value) then the benchmarks do not demonstrate any impact for the change.</p>
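<p>To make the formula concrete, here's a minimal Kotlin sketch of that computation. The function and parameter names are mine, and <code>z</code> defaults to 1.96, the Z score for a 95% confidence level:</p>
<pre><code class="lang-kotlin">import kotlin.math.sqrt

// before and after hold the iteration durations (in ms) of the two
// benchmark runs. Returns (lower bound, upper bound) of the confidence
// interval for the difference of means.
fun confidenceInterval(
  before: DoubleArray,
  after: DoubleArray,
  z: Double = 1.96
): Pair&lt;Double, Double&gt; {
  fun mean(x: DoubleArray) = x.sum() / x.size
  fun variance(x: DoubleArray): Double {
    val m = mean(x)
    return x.sumOf { (it - m) * (it - m) } / (x.size - 1)
  }
  val n1 = before.size
  val n2 = after.size
  // Sp: pooled estimate of the common standard deviation.
  val sp = sqrt(
    ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
  )
  val meanDifference = mean(before) - mean(after)
  val marginOfError = z * sp * sqrt(1.0 / n1 + 1.0 / n2)
  return (meanDifference - marginOfError) to (meanDifference + marginOfError)
}
</code></pre>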
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682967781674/dcba8494-659c-4e23-b69f-955c400f200a.png" alt class="image--center mx-auto" /></p>
<p>In the example above, the real mean difference from the normal distributions used to generate the sample data was <strong>30 ms</strong>. The sample mean difference was <strong>35 ms</strong>, but if we sample again we'll get another mean difference. The formula shared above allows us to say that the <strong>95% confidence interval</strong> of the mean difference is an improvement of somewhere between <strong>24.92 ms</strong> and <strong>45.10 ms</strong>. If we repeated the two benchmark runs many times, about 95% of the intervals computed this way would contain the real mean difference.</p>
<p>In other words, we're fairly confident that the real mean difference is somewhere between 24.92 ms and 45.10 ms. But we can't get any more precise than that unless we remove the sources of noise or increase the number of iterations by the square of the target increase. E.g. if we wanted to <strong>divide the confidence interval range by 3</strong>, we'd need to <strong>multiply the number of iterations by 9</strong>, from 100 to 900 iterations 🤯.</p>
<h1 id="heading-coefficient-of-variation-check"><strong>Coefficient of variation check</strong></h1>
<p>The confidence interval grows linearly with the standard deviation, so if we want our benchmarks to be conclusive we need to keep the standard deviation in check. One way to do this is to compute the <a target="_blank" href="https://en.wikipedia.org/wiki/Coefficient_of_variation">Coefficient of Variation</a> (standard deviation divided by mean) for each benchmark and ensure that it's below a threshold.</p>
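<p>A minimal version of that check could look like this (the function name and the 5% threshold are illustrative choices, not a standard):</p>
<pre><code class="lang-kotlin">import kotlin.math.sqrt

// Fails fast if a benchmark run is too noisy. The 5% threshold is an
// arbitrary example: pick one that fits your own benchmarks.
fun checkCoefficientOfVariation(measures: DoubleArray, maxCv: Double = 0.05) {
  val mean = measures.average()
  val variance = measures.sumOf { (it - mean) * (it - mean) } / (measures.size - 1)
  val coefficientOfVariation = sqrt(variance) / mean
  check(coefficientOfVariation &lt;= maxCv) {
    "Benchmark too noisy: coefficient of variation $coefficientOfVariation exceeds $maxCv"
  }
}
</code></pre>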
<h1 id="heading-putting-it-all-together">Putting it all together</h1>
<p>I created a <a target="_blank" href="https://docs.google.com/spreadsheets/d/1fqR99ROoSYtMrkoL5PWkUWDsW0E3JEXSTcXQqZJJCow/template/preview">Google Spreadsheet template</a> that does most of the stats math for you (except the Shapiro-Wilk normality test) so that you can easily compute the impact of a change and the validity of your benchmarks. Click <strong>USE TEMPLATE</strong> and paste the iteration results of each benchmark run.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682971701809/e64e069d-e76d-4b3e-bf21-d09aa208192d.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-what-if-the-benchmarks-arent-normally-distributed"><strong>What if the benchmarks aren't normally distributed?</strong></h1>
<h2 id="heading-remove-sources-of-variation">Remove sources of variation</h2>
<p>You can fix a benchmark distribution that doesn't look normal by fixing sources of variation. The simplest way to do this is to open the perfetto trace for the slowest iteration and the fastest iteration, figure out what's causing the difference, and fix it.</p>
<h2 id="heading-remove-outliers">Remove outliers</h2>
<p>If we cannot isolate &amp; remove error sources, we can try to mitigate them by cleaning up our data set in two ways:</p>
<ul>
<li><p>Implement a sliding window so that only the last N iterations of a run are considered. This helps with sources of variation like classloading or JIT compiling. This is effectively the same as performing a benchmark warm-up.</p>
</li>
<li><p>Eliminating outliers</p>
<ul>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Outlier#Tukey's_fences">Tukey's fences</a>: removing values that are below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. Note: Q1 = p25, Q3 = p75, IQR = Q3 - Q1.</p>
</li>
<li><p>Another approach is to discard outliers that are more than ~2 standard deviations from the mean.</p>
</li>
<li><p>This can work if the distribution does otherwise have a normal shape.</p>
</li>
</ul>
</li>
</ul>
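<p>If you do go the outlier-removal route, Tukey's fences fit in a few lines. This is a sketch using nearest-rank percentiles; a stats library would interpolate:</p>
<pre><code class="lang-kotlin">// Drops values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
fun removeOutliers(measures: List&lt;Double&gt;): List&lt;Double&gt; {
  val sorted = measures.sorted()
  // Nearest-rank percentile, kept simple on purpose.
  fun percentile(p: Double) = sorted[((sorted.size - 1) * p).toInt()]
  val q1 = percentile(0.25)
  val q3 = percentile(0.75)
  val iqr = q3 - q1
  return measures.filter { it in (q1 - 1.5 * iqr)..(q3 + 1.5 * iqr) }
}
</code></pre>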
<p>Generally speaking, you're better off investigating and finding the root cause rather than blindly removing outliers.</p>
<h2 id="heading-try-a-log-normal-distribution-fit">Try a log-normal distribution fit</h2>
<p>From <a target="_blank" href="https://htor.inf.ethz.ch/publications/img/hoefler-scientific-benchmarking.pdf">Scientific Benchmarking of Parallel Computing Systems</a>:</p>
<p><em>Many nondeterministic measurements that are always positive are skewed to the right and have a long tail following a so-called log-normal distribution. Such measurements can be normalized by transforming each observation logarithmically. Such distributions are typically summarized with the log-average, which is identical to the geometric mean.</em></p>
<p>I haven't spent too much time looking into this as our distributions pass normality tests.</p>
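<p>For reference, the log-average mentioned in the quote is cheap to compute (a sketch, the function name is mine):</p>
<pre><code class="lang-kotlin">import kotlin.math.exp
import kotlin.math.ln

// Log-average: average the log of each observation, then exponentiate.
// This is identical to the geometric mean.
fun logAverage(measures: List&lt;Double&gt;): Double =
  exp(measures.sumOf { ln(it) } / measures.size)
</code></pre>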
<h2 id="heading-kolmogorovsmirnov-test">Kolmogorov–Smirnov test</h2>
<p>From <a target="_blank" href="https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test">Wikipedia</a>:</p>
<p><em>The Kolmogorov–Smirnov test is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).</em></p>
<p>We can leverage this test to compute the probability that our two benchmarks are from the same probability distribution, to establish whether a change has a statistically significant impact on performance. However, this won't tell us anything about the size of that impact.</p>
<h2 id="heading-central-limit-theorem">Central Limit Theorem</h2>
<p>From <a target="_blank" href="https://en.wikipedia.org/wiki/Central_limit_theorem">Wikipedia</a>:</p>
<p><em>The <strong>central limit theorem (CLT)</strong> establishes that, in many situations, <strong>for identically distributed independent samples</strong>, the standardized sample mean tends towards the standard normal distribution even if the original variables themselves are not normally distributed.</em></p>
<p>When we run a benchmark several times and keep the mean for each run, that mean will be normally distributed around the real mean of the underlying benchmark distribution. If we do this before and after the change, we now have 2 normal distributions of means.</p>
<p>We can leverage this to detect regressions by applying a <a target="_blank" href="https://en.wikipedia.org/wiki/Student%27s_t-test#Unpaired_and_paired_two-sample_t-tests">two-sample t-test</a> which will tell us the probability that the two normal distributions of means have the same mean. If they don't, then we can conclude that the underlying benchmark distributions have a different mean, i.e. that there is a change, but we can't conclude much in terms of the impact of the change.</p>
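<p>As a sketch, the t statistic for Welch's variant of the unpaired two-sample t-test looks like this (names are mine; computing the degrees of freedom and the critical value for your alpha level is best left to a stats library):</p>
<pre><code class="lang-kotlin">import kotlin.math.sqrt

// meansBefore and meansAfter hold the per-run means collected before
// and after the change. A larger |t| means stronger evidence that the
// two distributions of means have different means.
fun welchT(meansBefore: DoubleArray, meansAfter: DoubleArray): Double {
  fun mean(x: DoubleArray) = x.sum() / x.size
  fun variance(x: DoubleArray): Double {
    val m = mean(x)
    return x.sumOf { (it - m) * (it - m) } / (x.size - 1)
  }
  return (mean(meansBefore) - mean(meansAfter)) /
    sqrt(variance(meansBefore) / meansBefore.size +
      variance(meansAfter) / meansAfter.size)
}
</code></pre>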
<p>This approach is what Google has been recommending in an article frequently referenced by Android developers: <a target="_blank" href="https://medium.com/androiddevelopers/fighting-regressions-with-benchmarks-in-ci-6ea9a14b5c71">Fighting regressions with Benchmarks in CI</a>. The article calls it "step-fitting of results" and doesn't mention the CLT or two-sample t-tests, probably because the technique was copied over from <a target="_blank" href="https://skia.org/docs/dev/testing/skiaperf/">Skia Perf</a>.</p>
<p>If you assume that sources of variation are a fact of life and that you won't be able to fix them and consistently get normal distributions with stable variance, then this approach is probably your best option.</p>
<p>Running N benchmarks before and after a change is a lot more expensive, so the suggested approach is to run a single benchmark per change, wait until N/2 changes have landed, then split the N benchmark results around the change of interest into a before group and an after group. Alternatively, you could leverage <a target="_blank" href="https://en.wikipedia.org/wiki/Bootstrapping_(statistics)">bootstrapping</a>.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<ul>
<li><p>Jetpack Macrobenchmark runs are non-deterministic. Don't look at aggregate values such as min, max, mean or median.</p>
</li>
<li><p>Check that the benchmark measures follow a normal distribution.</p>
<ul>
<li><p>If not, find the root cause. Common fixes:</p>
<ul>
<li><p>Killing / uninstalling unrelated apps.</p>
</li>
<li><p>Setting the device to Airplane Mode during the measured portion.</p>
</li>
<li><p>Full AOT compiling of the APK.</p>
</li>
<li><p>Locking CPU frequencies.</p>
</li>
<li><p>Stable room temperature.</p>
</li>
<li><p>Plugging in the device to power.</p>
</li>
<li><p>Killing Android Studio and anything that might run random ADB commands while profiling.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Compute the <strong>confidence interval for a difference between two means</strong> (see this <a target="_blank" href="https://docs.google.com/spreadsheets/d/1fqR99ROoSYtMrkoL5PWkUWDsW0E3JEXSTcXQqZJJCow/template/preview">Google Spreadsheet template</a>), that's the result you want to share. Remember to share just the interval and hide the specific sample mean difference from the 2 benchmarks you ran, as it has no meaning.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Callback leaks: cancel your Picasso requests!]]></title><description><![CDATA[👋 Hi, this is P.Y., I work as an Android Engineer at Block. Every month I organize an internal Ensemble Leak Hunting session where we look at the top leaks reported by LeakCanary to learn how to read]]></description><link>https://blog.p-y.wtf/callback-leaks-cancel-your-picasso-requests</link><guid isPermaLink="true">https://blog.p-y.wtf/callback-leaks-cancel-your-picasso-requests</guid><category><![CDATA[Android]]></category><category><![CDATA[android app development]]></category><category><![CDATA[Memory Leak]]></category><category><![CDATA[performance]]></category><category><![CDATA[Picasso]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Tue, 17 Jan 2023 21:06:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1673986764034/e75de70f-a135-4493-96f0-193c08daf904.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>👋 Hi, this is P.Y., I work as an Android Engineer at <a href="https://block.xyz/">Block</a>. Every month I organize an internal Ensemble Leak Hunting session where we look at the top leaks reported by LeakCanary to learn how to read and investigate leak traces. This is a write-up of our latest investigation which surfaced a common mistake when using <a href="https://github.com/square/picasso">Picasso</a>.</p>
</blockquote>
<p>Before we start, you might want to get a quick refresher on <a href="https://square.github.io/leakcanary/fundamentals-fixing-a-memory-leak/">how to read leak traces</a>. In this article, we'll find the root cause of the leak, discuss how to fix it, and what could change in Picasso to help avoid this common mistake.</p>
<p>Here's the leak trace:</p>
<pre><code class="language-plaintext">┬───
│ GC Root: System class
│
├─ com.example.PicassoHolder class
│    Leaking: NO (a class is never leaking)
│    ↓ static PicassoHolder.singleton
├─ com.squareup.picasso3.Picasso instance
│    ↓ Picasso.targetToAction
│              ~~~~~~~~~~~~~~
├─ java.util.WeakHashMap instance
│    ↓ WeakHashMap.table
│                  ~~~~~
├─ java.util.WeakHashMap$Entry[] array
│    ↓ WeakHashMap$Entry[0]
│                       ~~~
├─ java.util.WeakHashMap$Entry instance
│    ↓ WeakHashMap$Entry.value
│                        ~~~~~
├─ com.squareup.picasso3.ImageViewAction instance
│    ↓ ImageViewAction.callback
│                      ~~~~~~~~
├─ com.example.ProfileLayout$loadImage$1 instance
│    Anonymous class implementing com.squareup.picasso3.Callback
│    ↓ ProfileLayout$loadImage$1.this$0
│                                ~~~~~~
╰→ com.example.ProfileLayout instance
​     Leaking: YES (ObjectWatcher was watching this because ProfileLayout received View#onDetachedFromWindow() callback)
</code></pre>
<p><a href="https://mermaid-js.github.io/mermaid-live-editor/edit#pako:eNpl0VFrwjAQAOC_EkIfHChzr30Q1CoVJsgm20PSh7O5tsE0J2k6Eet_X9aKbOwtd_ddOO6uPCeFPOaFoXNegfNsn0j7RuSFlDudQ9NQSkahe87YZMIabUuDnmwIZmwu7Vzc1U951nlwJfo9zXOvyXZsMfpEOKbQVFs4PUm76PsCOxjs2HIk-a96tLLeXUQmeZDLQUoupiHRsUT8k5m0yaC-wLThv7XY1FDih8bzMEAQ60HkYMwB8mPHUrFzVGiDr3Ch1keGQPVt0Uvg6X3ASjfRtGMb8VdngfAxr9HVoFVY3FVaxiT3FdYoeRyeCgtojZdc2lug0Hp6v9icx961OObtSYHHREPpoOZxAaZ5ZFdKe3KPJPbhdrhQf6jbN1XLlF8"><img src="https://mermaid.ink/img/pako:eNpl0VFrwjAQAOC_EkIfHChzr30Q1CoVJsgm20PSh7O5tsE0J2k6Eet_X9aKbOwtd_ddOO6uPCeFPOaFoXNegfNsn0j7RuSFlDudQ9NQSkahe87YZMIabUuDnmwIZmwu7Vzc1U951nlwJfo9zXOvyXZsMfpEOKbQVFs4PUm76PsCOxjs2HIk-a96tLLeXUQmeZDLQUoupiHRsUT8k5m0yaC-wLThv7XY1FDih8bzMEAQ60HkYMwB8mPHUrFzVGiDr3Ch1keGQPVt0Uvg6X3ASjfRtGMb8VdngfAxr9HVoFVY3FVaxiT3FdYoeRyeCgtojZdc2lug0Hp6v9icx961OObtSYHHREPpoOZxAaZ5ZFdKe3KPJPbhdrhQf6jbN1XLlF8?type=png" alt="" style="display:block;margin:0 auto" /></a></p>
<h1>Picasso singleton</h1>
<pre><code class="language-plaintext">┬───
│ GC Root: System class
│
├─ com.example.PicassoHolder class
│    Leaking: NO (a class is never leaking)
│    ↓ static PicassoHolder.singleton
├─ com.squareup.picasso3.Picasso instance
...
</code></pre>
<p>At the top of the leak trace, a Picasso instance. We know that Picasso instances are long-lived singletons, meant to stay around as UI comes and goes. So we know this is legit and we can move the investigation further down in the leak trace.</p>
<h1><code>Picasso.targetToAction</code></h1>
<pre><code class="language-plaintext">...
├─ com.squareup.picasso3.Picasso instance
│    ↓ Picasso.targetToAction
│              ~~~~~~~~~~~~~~
├─ java.util.WeakHashMap instance
│    ↓ WeakHashMap.table
│                  ~~~~~
├─ java.util.WeakHashMap$Entry[] array
│    ↓ WeakHashMap$Entry[0]
│                       ~~~
├─ java.util.WeakHashMap$Entry instance
│    ↓ WeakHashMap$Entry.value
│                        ~~~~~
├─ com.squareup.picasso3.ImageViewAction instance
...
</code></pre>
<p><code>Picasso.targetToAction</code> is a <code>WeakHashMap</code> of target keys (i.e. views) to action values (i.e. what to load on those views). <code>WeakHashMap</code> has weak keys and strong values, i.e. keys are held by weak references, and when a weak reference to a key clears, its entry is removed. So <code>Picasso.targetToAction</code> holds weak references to the target views &amp; strong references to the corresponding actions.</p>
<h1><code>ImageViewAction.callback</code></h1>
<pre><code class="language-plaintext">...
├─ com.squareup.picasso3.ImageViewAction instance
│    ↓ ImageViewAction.callback
│                      ~~~~~~~~
├─ com.example.ProfileLayout$loadImage$1 instance
│    Anonymous class implementing com.squareup.picasso3.Callback
│    ↓ ProfileLayout$loadImage$1.this$0
│                                ~~~~~~
╰→ com.example.ProfileLayout instance
​     Leaking: YES (ObjectWatcher was watching this because ProfileLayout received View#onDetachedFromWindow() callback)
</code></pre>
<p><code>ImageViewAction.callback</code> is a Picasso action that holds a custom callback created in <code>ProfileLayout.loadImage()</code>. We can see that this custom Picasso callback is an anonymous class that has a reference to its outer class, <code>ProfileLayout</code>. We know <code>ProfileLayout</code> has been detached for at least 5 seconds as that's LeakCanary's minimum trigger duration. The <code>ProfileLayout.loadImage()</code> code shows that the callback sets a custom background color when the image fails to load:</p>
<pre><code class="language-kotlin">fun loadImage() {
  picasso.load(imageUri)
    .into(
      imageView,
      object : Callback {
        override fun onSuccess() = Unit
    
        override fun onError(t: Throwable) {
          // Set dark background if image fails to load
          imageView.setBackgroundColor(backgroundColor)
        }
      }
    )
}
</code></pre>
<p>Looking at the Picasso source code we can see that <code>Picasso.targetToAction</code> entries are removed when a request completes, so we can conclude that the image loading request was still in flight, and that this in-flight request prevented the detached <code>ProfileLayout</code> from being garbage collected.</p>
<h1>Fix: cancelation</h1>
<p>It's a common mistake when using Picasso: image-loading requests should be canceled if the UI goes away before the request completes, otherwise, they'll keep going and consume resources until success or failure, which could take a long time on a slow network.</p>
<p>We forgot to cancel the request! Let's fix that:</p>
<pre><code class="language-kotlin">override fun onDetachedFromWindow() {
  super.onDetachedFromWindow()
  picasso.cancelRequest(imageView)
}
</code></pre>
<p>This fixes the leak, yay! Alternatively, we could also <a href="https://square.github.io/picasso/2.x/picasso/com/squareup/picasso/RequestCreator.html#tag-java.lang.Object-">use a tag</a>.</p>
<h1>Leaks everywhere?</h1>
<p>Forgetting to cancel Picasso requests when the UI goes away is a fairly common mistake. How come leaks don't show up more often?</p>
<p>First, Picasso automatically cancels any previous request when loading a new image on the same image view. This helps with adapter views: when a list item view gets recycled and bound to a new row, a new request is started for that view and the previous one is automatically canceled.</p>
<p>Second, even if we forget to cancel a Picasso request, that will only trigger a leak when using a custom callback with a strong reference to the detached <code>ImageView</code>. How come?</p>
<h1><code>WeakHashMap</code></h1>
<p><code>Picasso.targetToAction</code> is a <code>WeakHashMap</code> that holds weak references to the target views &amp; strong references to the corresponding actions.</p>
<p><a href="https://mermaid-js.github.io/mermaid-live-editor/edit#pako:eNplkU9rwkAQxb_KMlSxoLbnCAU1_ulBKFbaQ8bDNJnE0M2ubCYVUb9716RNC9523vvt28fOCWKbMASQanuId-REbUI0a2slQnzJYypLu7Q6YfewVYOBKnOTaRZr_PCkxmjG0Q91tZ_OQi5j2dhxLLk1ZzXpvTN9LqncrWh_j2ZS3_PYh-azmvYQ_vl3MyPuGG0RPDltSITo0QtnFUY35BZNqAZD5Thlx0bU0N-YRc8FZfyW86HxrylfpCv_3vzPawp6otOZ_3a6Vl83WbGnF1HTrpW6WkZtQDeTEUIdsLgtAX0o2BWUJ_5zT2iUQpAdF4wQ-GPCKVVaENBcPEqV2NejiSEQV3Efqn1CwmFOmaMCgpR02aqzJBfrWpHrcdVssV7m5RtTqp47"><img src="https://mermaid.ink/img/pako:eNplkU9rwkAQxb_KMlSxoLbnCAU1_ulBKFbaQ8bDNJnE0M2ubCYVUb9716RNC9523vvt28fOCWKbMASQanuId-REbUI0a2slQnzJYypLu7Q6YfewVYOBKnOTaRZr_PCkxmjG0Q91tZ_OQi5j2dhxLLk1ZzXpvTN9LqncrWh_j2ZS3_PYh-azmvYQ_vl3MyPuGG0RPDltSITo0QtnFUY35BZNqAZD5Thlx0bU0N-YRc8FZfyW86HxrylfpCv_3vzPawp6otOZ_3a6Vl83WbGnF1HTrpW6WkZtQDeTEUIdsLgtAX0o2BWUJ_5zT2iUQpAdF4wQ-GPCKVVaENBcPEqV2NejiSEQV3Efqn1CwmFOmaMCgpR02aqzJBfrWpHrcdVssV7m5RtTqp47?type=png" alt="" style="display:block;margin:0 auto" /></a></p>
<p><code>ImageViewAction</code> itself (<a href="https://github.com/square/picasso/blob/master/picasso/src/main/java/com/squareup/picasso3/ImageViewAction.kt#L34">source</a>) holds a weak reference to its target <code>ImageView</code>, and a strong reference to its optional <code>Callback</code>. So when the custom callback isn't set, <code>ImageViewAction</code> doesn't hold any strong reference to the view:</p>
<pre><code class="language-kotlin">internal class ImageViewAction(
  picasso: Picasso,
  target: ImageView,
  data: Request,
  var callback: Callback?
) : Action(picasso, data) {
  private val targetReference = WeakReference(target)
</code></pre>
<p><a href="https://mermaid-js.github.io/mermaid-live-editor/edit#pako:eNplkU9rwkAQxb_KMhSxoLbnCIIa__QgFCvtIeNhmkxi6GZXNpOKGL97V1PTgred935v97FzgtgmDAGk2h7iHTlRmxDN2lqJEF_zmMrSLq1O2D1tVb-vytxkmsUaP4zUGM04-qUu9qgWchnLxo5jya2p1aT7wfS1pHK3ov0jmsk157FPzbWadhH--Q8zI-4YbRE8OW1IhOjZC7UKoztyiyZU_YFynLJjI2rgE7PopaCM33M-NP7llm_SlX9v_uc1BT0xvzW6FF83N8WeXURNt1bqaBm28U4mQwQfX9wXgB4U7ArKE_-xJzRKIciOC0YI_DHhlCotCGjOHqVK7NvRxBCIq7gH1T4h4TCnzFEBQUq6bNVZkot1rcjXcdVs8LrI8w9h2p2n"><img src="https://mermaid.ink/img/pako:eNplkU9rwkAQxb_KMhSxoLbnCIIa__QgFCvtIeNhmkxi6GZXNpOKGL97V1PTgred935v97FzgtgmDAGk2h7iHTlRmxDN2lqJEF_zmMrSLq1O2D1tVb-vytxkmsUaP4zUGM04-qUu9qgWchnLxo5jya2p1aT7wfS1pHK3ov0jmsk157FPzbWadhH--Q8zI-4YbRE8OW1IhOjZC7UKoztyiyZU_YFynLJjI2rgE7PopaCM33M-NP7llm_SlX9v_uc1BT0xvzW6FF83N8WeXURNt1bqaBm28U4mQwQfX9wXgB4U7ArKE_-xJzRKIciOC0YI_DHhlCotCGjOHqVK7NvRxBCIq7gH1T4h4TCnzFEBQUq6bNVZkot1rcjXcdVs8LrI8w9h2p2n?type=png" alt="" style="display:block;margin:0 auto" /></a></p>
<p>As you can see, so far there are no strong references to any view. This means that the default usage of Picasso will not introduce leaks, even if we forget to cancel requests when UI goes away.</p>
<p>Now let's add a custom callback with a strong reference to a view:</p>
<p><a href="https://mermaid-js.github.io/mermaid-live-editor/edit#pako:eNplkV9rwjAUxb9KCCIO1LlXhYH_K2wgTraHxIdre9sG00TS24lYv_tSq27Dt-Se3z3nwD3x0EbI-zzW9hCm4IitJ9KsrCUh5VKFkOc2sDpC97xhnQ7LlUk0kjX-88qG0gzFlark15LAJUhrOwxJWVOyUesLYRdAnr7D_kma0WXPY1uNJRu3JP-jN6aG3FFsJPfkuCYlFz0_KNlEPJAbaSas02UOY3RoiHX9xlQsMkjwU-Gh1iuXb9CFz5v9anVBT8xujariq9op9Oxc1N3uo6amwX29mdBAcr8-fyxw9wxB6y2Eu5IFYulsrDS-wdEW1NAWootX48V7BNcKqcobvZItxH9645FFjahbflnFTKuLZEu4RFf6grd5hi4DFfmbnqRhTHJKMUPJ-_4ZYQyFJsmlOXsUCrIfRxPyPrkC27zYR0A4UZA4yHg_Bp3fp9NIkXXX4fkHlALAJw"><img src="https://mermaid.ink/img/pako:eNplkV9rwjAUxb9KCCIO1LlXhYH_K2wgTraHxIdre9sG00TS24lYv_tSq27Dt-Se3z3nwD3x0EbI-zzW9hCm4IitJ9KsrCUh5VKFkOc2sDpC97xhnQ7LlUk0kjX-88qG0gzFlark15LAJUhrOwxJWVOyUesLYRdAnr7D_kma0WXPY1uNJRu3JP-jN6aG3FFsJPfkuCYlFz0_KNlEPJAbaSas02UOY3RoiHX9xlQsMkjwU-Gh1iuXb9CFz5v9anVBT8xujariq9op9Oxc1N3uo6amwX29mdBAcr8-fyxw9wxB6y2Eu5IFYulsrDS-wdEW1NAWootX48V7BNcKqcobvZItxH9645FFjahbflnFTKuLZEu4RFf6grd5hi4DFfmbnqRhTHJKMUPJ-_4ZYQyFJsmlOXsUCrIfRxPyPrkC27zYR0A4UZA4yHg_Bp3fp9NIkXXX4fkHlALAJw?type=png" alt="" style="display:block;margin:0 auto" /></a></p>
<p>Note that if we replaced the <code>callback</code> strong reference with a weak reference, there would be no strong reference holding the callback in memory and it would be garbage collected before the request completes, even when the view is still attached. That's not what we want. So we do need the <code>callback</code> strong reference, which unfortunately means we have a strong reference path to <code>ProfileLayout</code> and its associated view hierarchy, which causes a leak.</p>
<p>This is exactly what the <code>WeakHashMap</code> <a href="https://docs.oracle.com/javase/7/docs/api/java/util/WeakHashMap.html">documentation</a> warns us about:</p>
<blockquote>
<p>The value objects in a WeakHashMap are held by ordinary strong references. Thus care should be taken to ensure that value objects do not strongly refer to their own keys, either directly or indirectly, since that will prevent the keys from being discarded.</p>
</blockquote>
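<p>We can reproduce that exact behavior with a small JVM-only snippet (the <code>Target</code> and <code>Action</code> classes are made up for the demo):</p>
<pre><code class="language-kotlin">import java.util.WeakHashMap

class Target
// Like a Picasso action holding a callback: an optional strong back reference.
class Action(val target: Target?)

fun main() {
  val map = WeakHashMap&lt;Target, Action&gt;()

  val leakyKey = Target()
  // The value strongly references its own key: this entry can never be cleared.
  map[leakyKey] = Action(leakyKey)

  var collectableKey: Target? = Target()
  map[collectableKey!!] = Action(null)
  collectableKey = null // Drop the only strong reference to the key.

  repeat(5) {
    System.gc()
    Thread.sleep(100)
  }
  // Guaranteed: the self-referencing entry is still there.
  check(map.containsKey(leakyKey))
  // Typically (GC permitting), it's now the only entry left.
  println("Entries left: ${map.size}")
}
</code></pre>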
<h1><code>View.setTag(int, Object)</code></h1>
<p>Ideally Picasso would have a consistent memory behavior whether or not <code>callback</code> is set. One way to do that is to store the <code>ImageViewAction</code> directly in the <code>ImageView</code>, as a <a href="https://developer.android.com/reference/android/view/View#setTag(int,%20java.lang.Object)">view tag</a>. That way, <code>Picasso.targetToAction</code> would only hold a weak reference to the <code>ImageView</code> and as soon as the view is detached it becomes unreachable:</p>
<p><a href="https://mermaid-js.github.io/mermaid-live-editor/edit#pako:eNplkctOwzAQRX_FsrooEoWyzaJSS1P1AaKCChZ2FtNkkpo6dmU7oIr037Gd8hI7-87xnTvjD5rrAmlCS6nf8x0YRzZTrh61dozztcjBWj3XskBznZHBgFihKolOK38ZkTFXY3amQnnUOjAVuo0e505o1ZJJ_wVhPwe7u4fDBVeT-M5jW4ktue1z-qveS5UzR5Zx6snbjuSUDb3Qkin7R2ZcTcngihgs0aBy5Mq_SNmihgqfBb77ehpC1ys8YrGBykbPJXs6gLE4NgaCxzIwnL6BbNCG7pFaMU4ftq-YuyB5bNVhMU4kZj-dunG7PCF19GqJaqTkavY1c1hNS9JvJQcpt5DvWzJna6NLIfEOjrpxPamhiOa9G286PxvshO0NW7Jgf-nMI4sOEV-BYptu-DXE3YT6gl7SGk0NovB__sEV8QO5HdbIaeKPBZbQSMcpVyePQuP001HlNHGmwUvaHApwOBVQGahpUoK032paCKfNWTx9As1lx4c"><img src="https://mermaid.ink/img/pako:eNplkctOwzAQRX_FsrooEoWyzaJSS1P1AaKCChZ2FtNkkpo6dmU7oIr037Gd8hI7-87xnTvjD5rrAmlCS6nf8x0YRzZTrh61dozztcjBWj3XskBznZHBgFihKolOK38ZkTFXY3amQnnUOjAVuo0e505o1ZJJ_wVhPwe7u4fDBVeT-M5jW4ktue1z-qveS5UzR5Zx6snbjuSUDb3Qkin7R2ZcTcngihgs0aBy5Mq_SNmihgqfBb77ehpC1ys8YrGBykbPJXs6gLE4NgaCxzIwnL6BbNCG7pFaMU4ftq-YuyB5bNVhMU4kZj-dunG7PCF19GqJaqTkavY1c1hNS9JvJQcpt5DvWzJna6NLIfEOjrpxPamhiOa9G286PxvshO0NW7Jgf-nMI4sOEV-BYptu-DXE3YT6gl7SGk0NovB__sEV8QO5HdbIaeKPBZbQSMcpVyePQuP001HlNHGmwUvaHApwOBVQGahpUoK032paCKfNWTx9As1lx4c?type=png" alt="" style="display:block;margin:0 auto" /></a></p>
<h1>Auto cancel on detach</h1>
<p>Alternatively, Picasso could set a <a href="https://developer.android.com/reference/android/view/View#addOnAttachStateChangeListener(android.view.View.OnAttachStateChangeListener)">detach listener</a> on the <code>ImageView</code> and auto cancel the requests on detach. This would be best for saving resources but might create surprises if e.g. the app is preloading an image into a detached view.</p>
<p>That's all for now, hope you enjoyed reading this!</p>
<blockquote>
<p>Header image generated by DALL-E, prompt: "image inspired by the style of Pablo Picasso, depicting a leaky faucet with bold, abstract shapes and bold colors".</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Let's investigate a Gradle IntelliJ memory leak!]]></title><description><![CDATA[👋 Hi, this is P.Y., I work as an Android Engineer at Block. This article shares a team investigation by Tony Robalik, Pablo Baxter, Roger Hu and myself into a recent Gradle / IntelliJ memory leak.

O]]></description><link>https://blog.p-y.wtf/gradle-intellij-memory-leak</link><guid isPermaLink="true">https://blog.p-y.wtf/gradle-intellij-memory-leak</guid><category><![CDATA[gradle]]></category><category><![CDATA[intellij]]></category><category><![CDATA[idea]]></category><category><![CDATA[Memory Leak]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Wed, 12 Oct 2022 17:26:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1665572235433/JGXvjNeRQ.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>👋 Hi, this is P.Y., I work as an Android Engineer at <a href="https://block.xyz/">Block</a>. This article shares a team investigation by <a href="https://twitter.com/AutonomousApps">Tony Robalik</a>, <a href="https://github.com/pablobaxter">Pablo Baxter</a>, <a href="https://twitter.com/rogerjhu">Roger Hu</a> and <a href="https://twitter.com/Piwai">myself</a> into a recent <strong>Gradle / IntelliJ memory leak</strong>.</p>
</blockquote>
<p>On September 29th, <a href="https://twitter.com/AutonomousApps">Tony Robalik</a> reaches out to our friends at Gradle to report memory issues with the Gradle process when importing a project in IntelliJ IDEA. The heap size keeps climbing to new heights, reaching 60+ GB! Tony writes:</p>
<blockquote>
<p>Normally, after I start another build, the daemon gives up most of the memory it had used in the first build, i.e. it takes until that moment for the GC to run. In the past, I've been able to force the gc to run with <code>jcmd &lt;pid&gt; GC.run</code> and get my memory back or just run a simple build like <code>help</code>. However, right now, that's not happening.</p>
</blockquote>
<h1>Dominators</h1>
<p>The Java heap is an object graph. One useful tool we can leverage from graph theory is something called the <a href="https://en.wikipedia.org/wiki/Dominator_(graph_theory)">dominator</a> tree:</p>
<blockquote>
<p>A node <code>d</code> dominates a node <code>n</code> if every path in the object graph from GC roots to <code>n</code> must go through <code>d</code>.</p>
</blockquote>
<p>In practice, the dominator tree provides us with the list of biggest objects sorted by retained size. The retained size is the sum of the size of all the objects that would become unreachable if the dominator object was unreachable.</p>
<p>Tony takes a heap dump of the Gradle process and shares a screenshot from the <strong>Biggest Objects - Dominators</strong> tab in YourKit:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665566645848/9ZzZB-QqK.png" alt="Yourkit Biggest Objects - Dominators" />

<p>We immediately notice that 95% of the 44 GB heap is retained by <code>java.lang.ref.Finalizer</code>, which means, as YourKit gently points out, that the memory is retained by an object that is pending finalization.</p>
<h1>Pending Finalization</h1>
<p>Once an object is unreachable, it can be garbage collected and its memory reclaimed. If that object implements the <code>finalize()</code> method, then that method must be called before garbage collection. Once objects with a <code>finalize()</code> method are detected as unreachable, they're put in a finalizer queue and remain in a "pending finalization" state until <code>finalize()</code> is called.</p>
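<p>Here's a toy illustration of that mechanism (not Gradle code; <code>HoldsResources</code> is made up, and finalization has been deprecated since Java 9):</p>
<pre><code class="language-kotlin">import java.util.concurrent.CountDownLatch
import java.util.concurrent.TimeUnit

class HoldsResources {
  // In Kotlin, declaring finalize() (no override keyword) is enough for
  // the JVM to treat the class as finalizable.
  protected fun finalize() {
    finalized.countDown()
  }

  companion object {
    val finalized = CountDownLatch(1)
  }
}

fun main() {
  // Immediately unreachable: eligible for the finalizer queue.
  HoldsResources()
  System.gc()
  // The finalizer thread usually runs finalize() shortly after a GC,
  // but the JVM gives no timing guarantee.
  val ran = HoldsResources.finalized.await(5, TimeUnit.SECONDS)
  println("finalize() ran: $ran")
}
</code></pre>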
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665566663404/IEaFt0zW6.png" alt="ProjectImportActionWithCustomSerializer dominator" />

<p>Here we can see that the lowest dominator that retains most of the memory is <code>ProjectImportActionWithCustomSerializer</code>. It is <strong>unreachable</strong> &amp; <strong>transitively pending finalization</strong>: even though it has no <code>finalize()</code> method, it is dominated by an object that is pending finalization, which means it is still indirectly reachable by that object which itself can still run code in its <code>finalize()</code> method. This means <code>ProjectImportActionWithCustomSerializer</code> cannot be garbage collected until its dominator is finalized.</p>
<h1>I Am GCroot 🌳</h1>
<p>To understand which references exactly are keeping <code>ProjectImportActionWithCustomSerializer</code> in memory, I ask Tony to compute the shortest paths from GC Roots in YourKit:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665567126397/RJ6VUX2WW.png" alt="shortest paths from GC Roots" />

<p>Here's how to read this trace:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665568584070/v18zVeoNA.png" alt="1" />

<ul>
<li>At the top is <code>ProjectImportActionWithCustomSerializer</code>. We want to understand why it's retained in memory.</li>
</ul>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665568608139/h-0UmNBWX.png" alt="2" />

<p>At the bottom is a GC root, here a <code>JNIGlobal</code> that keeps a reference to <code>CleanerImpl$PhantomCleanableRef</code>.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665568639643/PkZ_KeWIM.png" alt="3" />

<p>From the bottom to the top we see the chain of references that is retaining <code>ProjectImportActionWithCustomSerializer</code>.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665568681621/kFVDGTgTB.png" alt="4" />

<p>The bottom part of the trace is the finalizer queue. The finalizer queue is implemented as a <a href="https://en.wikipedia.org/wiki/Doubly_linked_list">doubly linked list</a>, where each <code>Finalizer</code> instance has a reference to the previous entry (<code>prev</code>) and next entry (<code>next</code>) in the finalizer queue, as well as a reference to the object that is pending finalization (<code>referent</code>).</p>
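<p>As a rough sketch (a toy model, not the JDK's actual <code>java.lang.ref.Finalizer</code>), each queue entry carries <code>prev</code>, <code>next</code> and <code>referent</code> references, which is why any reachable entry transitively retains the referents of its neighbors:</p>
<pre><code class="language-java">// Toy model of the finalizer queue's doubly linked structure.
class QueueEntry {
  final Object referent; // the object pending finalization
  QueueEntry prev;
  QueueEntry next;

  QueueEntry(Object referent) {
    this.referent = referent;
  }

  public static void main(String[] args) {
    QueueEntry first = new QueueEntry("executor pending finalization");
    QueueEntry second = new QueueEntry("zip entry pending finalization");
    first.next = second;
    second.prev = first;
    // Holding a reference to either entry keeps both referents reachable.
    System.out.println(first.next.referent); // prints "zip entry pending finalization"
  }
}
</code></pre>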
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665568710120/iuz32ggqK.png" alt="5" />

<p>As we move towards the top of the trace, we see that a <code>Finalizer</code> has a <code>referent</code> field referencing <code>Executors$FinalizableDelegatedExecutorService</code>. This is the object that implements <code>finalize()</code> and is pending finalization.</p>
<pre><code class="language-java">    private static class FinalizableDelegatedExecutorService
            extends DelegatedExecutorService {
        FinalizableDelegatedExecutorService(ExecutorService executor) {
            super(executor);
        }
        @SuppressWarnings("deprecation")
        protected void finalize() {
            super.shutdown();
        }
    }
</code></pre>
<p>As you can see, <code>FinalizableDelegatedExecutorService</code> is an <code>ExecutorService</code> that automatically shuts down the thread pool when it becomes unreachable. Developers are expected to shut down thread pools manually when they stop being in use, but sometimes mistakes happen and this is a safety net.</p>
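<p>In other words, the safety net only kicks in at finalization time; the reliable approach is still to shut the pool down explicitly. A minimal sketch:</p>
<pre><code class="language-java">import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownDemo {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Block until the submitted task completes.
      executor.submit(new Runnable() {
        @Override public void run() {
          System.out.println("work done");
        }
      }).get();
    } finally {
      // Shut down explicitly instead of relying on the finalize() safety net.
      executor.shutdown();
      executor.awaitTermination(1, TimeUnit.SECONDS);
    }
  }
}
</code></pre>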
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665568907968/V3hcppQOn.png" alt="6" />

<p>The <code>Executors$FinalizableDelegatedExecutorService.e</code> field references a <code>ThreadPoolExecutor</code> instance.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665568930629/SYEKVIsYq.png" alt="7" />

<p>The <code>ThreadPoolExecutor.threadFactory</code> field references a <code>ProjectImportAction$1</code> instance. So we can assume <code>ProjectImportAction$1</code> is an anonymous class (because its name is <code>$1</code>) that implements <code>ThreadFactory</code>.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1665568954635/sQCWkprCy.png" alt="8" />

<p>The <code>ProjectImportAction$1.this$0</code> field references the <code>ProjectImportActionWithCustomSerializer</code> instance. In Java, anonymous classes have a hidden reference to their outer class, compiled as a field named <code>this$0</code>.</p>
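<p>You can observe this hidden reference with a bit of reflection (a sketch of mine, using an anonymous class that actually touches its outer instance so the capture is guaranteed):</p>
<pre><code class="language-java">import java.lang.reflect.Field;

class Outer {
  // Anonymous class referencing Outer.this: javac stores the captured
  // outer instance in a synthetic field named this$0.
  final Runnable anonymous = new Runnable() {
    @Override public void run() {
      System.out.println(Outer.this);
    }
  };

  public static void main(String[] args) {
    for (Field field : new Outer().anonymous.getClass().getDeclaredFields()) {
      System.out.println(field.getName()); // prints "this$0"
    }
  }
}
</code></pre>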
<h1>Reveal</h1>
<p>At this point we can conclude that <code>ProjectImportActionWithCustomSerializer</code> is a class that extends <code>ProjectImportAction</code>, and that <code>ProjectImportAction</code> defines an anonymous class that implements <code>ThreadFactory</code> which is then passed to a <code>ThreadPoolExecutor</code>.</p>
<p>Let's look at the <a href="https://github.com/JetBrains/intellij-community/blob/ff07590cb24b25c055ce00dcd5c6f0db109e2bfa/plugins/gradle/tooling-extension-api/src/org/jetbrains/plugins/gradle/model/ProjectImportAction.java#L114-L120">ProjectImportAction</a> sources:</p>
<pre><code class="language-java">myConverterExecutor = Executors.newSingleThreadExecutor(
  new ThreadFactory() {
    @Override
    public Thread newThread(@NotNull Runnable runnable) {
      return new Thread(runnable, "idea-tooling-model-converter");
    }
  }
);
</code></pre>
<p><code>ProjectImportAction</code> creates a single threaded executor and passes in a <code>ThreadFactory</code> in order to set the thread name. That anonymous <code>ThreadFactory</code> doesn't actually use the hidden <code>this$0</code> reference to its <code>ProjectImportAction</code> outer class; unfortunately, the Java compiler (unlike Kotlin's) still adds that reference.</p>
<p>If we extract that anonymous class into a static class, the <code>this$0</code> reference will disappear, and the <code>ProjectImportAction</code> implementation will no longer be retained while the thread pool executor is pending finalization.</p>
<pre><code class="language-java">private static final class SimpleThreadFactory implements ThreadFactory {
  @Override
  public Thread newThread(@NotNull Runnable runnable) {
    return new Thread(runnable, "idea-tooling-model-converter");
  }
}
</code></pre>
<p><a href="https://github.com/pablobaxter">Pablo Baxter</a> files a <a href="https://youtrack.jetbrains.com/issue/IDEA-303282/Memory-leak-in-ProjectImportAction">bug</a> and opens a <a href="https://github.com/JetBrains/intellij-community/pull/2186/files">pull request</a> which is swiftly merged into the IntelliJ master branch.</p>
<p><a href="https://twitter.com/rogerjhu">Roger Hu</a> &amp; <a href="https://twitter.com/AutonomousApps">Tony Robalik</a> apply this fix locally by patching the <code>gradle-tooling-extension-api.jar</code> jar with <a href="https://www.coley.software/Recaf/">Recaf</a> and confirm that the memory is now properly reclaimed 🎉 !</p>
<p>The git history shows that this bug was <a href="https://github.com/pablobaxter/intellij-community/commit/8daa06d04f6e9ae2a2d32f2f09ea30e97c05bd24">introduced</a> in IntelliJ IDEA <strong>2022.1</strong> 221.4165.146 (that version is the base for Android Studio Electric Eel Canary 5). Last week, folks from JetBrains said they would "apply the changes and include it in next EAP of <strong>2022.3</strong> and next bugfix release of <strong>2022.2</strong> branch" while folks from Google said "we will cherry pick in EE". I love this quick turnaround!</p>
<h1>Are we done though?</h1>
<p>Wait a minute: we fixed the leak, but why was the thread pool executor pending finalization for such a long time? Tony reproduces the bug a few more times and takes a peek at the finalization queue. It turns out there's a <code>ZipEntry</code> for a jar that is systematically hanging out near the head of the finalization queue. <code>ZipEntry</code> calls <code>close()</code> when finalized. We haven't quite figured out why <code>close()</code> takes so long, so we're leaving that as an exercise for you, dear reader 😘.</p>
<blockquote>
<p>Header image generated by DALL-E, prompt: "a photo of canary flying holding an elephant in the air".</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Using an Activity from a Hilt ViewModel]]></title><description><![CDATA[👋 Hi, this is P.Y., I work as an Android Engineer at Block. This blog shares a bit of hackery to be able to access an activity instance within a Hilt ViewModel. If you come up with other interesting ]]></description><link>https://blog.p-y.wtf/using-an-activity-from-a-hilt-viewmodel</link><guid isPermaLink="true">https://blog.p-y.wtf/using-an-activity-from-a-hilt-viewmodel</guid><category><![CDATA[Android]]></category><category><![CDATA[dagger-hilt]]></category><category><![CDATA[dependency injection]]></category><category><![CDATA[ViewModel]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Wed, 07 Sep 2022 21:59:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1662587912484/iH6sIf7sT.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>👋 Hi, this is P.Y., I work as an Android Engineer at</em> <a href="https://block.xyz/"><em>Block</em></a><em>. This blog shares a bit of hackery to be able to access an activity instance within a Hilt ViewModel. If you come up with other interesting ways to do this, let me know</em> <a href="https://twitter.com/Piwai"><em>on Twitter</em></a><em>! If you're mad because you think I'm encouraging bad practices, try yoga.</em></p>
<h1>I need that god object</h1>
<p>I've been playing with Hilt's support for view models in a small app, and needed my view model to start a sharing activity:</p>
<pre><code class="language-kotlin">@HiltViewModel
class MyCuteLittleViewModel @Inject constructor(
) : ViewModel() {

  // ... some code that invokes share()

  private fun share(content: String) {
    val intent = Intent(Intent.ACTION_SEND).apply {
      type = "text/plain"
      putExtra(Intent.EXTRA_TEXT, content)
    }
    val chooserIntent = Intent.createChooser(intent, "Share with…")

    val activity = TODO("Need an activity here!")
    activity.startActivity(chooserIntent)
  }
}
</code></pre>
<p>View models are retained across activity config changes, so the activity isn't injectable, which makes total sense: injecting the activity in a view model would lead to leaks on config changes.</p>
<p>Unfortunately, the <code>Activity</code> class provides a lot of utility so it's fairly common to need access to it (see <a href="https://en.wikipedia.org/wiki/God_object">God object</a>).</p>
<p>Most online resources recommend moving the code to the activity (or a collaborator that has access to it), having it listen to events that indicate the action to perform, and sending those events from the ViewModel.</p>
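<p>A framework-free sketch of that recommendation (the names here are mine, not a real Android API): the view model only emits an event, and the activity-side code performs the actual work:</p>
<pre><code class="language-java">// Hypothetical sketch: the view model emits a share event instead of
// touching the Activity directly.
interface ShareListener {
  void onShare(String content);
}

class ShareViewModel {
  private ShareListener listener = new ShareListener() {
    @Override public void onShare(String content) {}
  };

  void setShareListener(ShareListener listener) {
    this.listener = listener;
  }

  void share(String content) {
    listener.onShare(content);
  }
}

class EventDemo {
  public static void main(String[] args) {
    ShareViewModel viewModel = new ShareViewModel();
    // The activity owns the Context, so it reacts to the event,
    // e.g. by building the chooser Intent and starting it.
    viewModel.setShareListener(new ShareListener() {
      @Override public void onShare(String content) {
        System.out.println("sharing: " + content);
      }
    });
    viewModel.share("hello"); // prints "sharing: hello"
  }
}
</code></pre>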
<h1>You're not the boss of me.</h1>
<p>I don't care for these "best" practices. I want that code right there where it's used, and I don't want unnecessary decoupling (also I pinky swear I'll write unit tests tomorrow).</p>
<p>Anyway, here's a little bit of Hilt hackery to support this without changing any <code>Activity</code> code.</p>
<p>First, let's create a <code>CurrentActivityProvider</code> scoped to <code>@ActivityRetainedScoped</code>, which will be in charge of holding the current activity instance:</p>
<pre><code class="language-kotlin">@ActivityRetainedScoped
class CurrentActivityProvider @Inject constructor() {

  // TODO Set and clear currentActivity
  private var currentActivity: Activity? = null

  fun &lt;T&gt; withActivity(block: Activity.() -&gt; T) : T {
    checkMainThread()
    val activity = currentActivity
    check(activity != null) {
      "Don't call this after the activity is finished!"
    }
    return activity.block()
  }
}
</code></pre>
<p>Then we can use it as needed. Notice that <code>withActivity()</code> makes it slightly harder to store the activity instance in the wrong place accidentally:</p>
<pre><code class="language-kotlin">@HiltViewModel
class MyCuteLittleViewModel @Inject constructor(
  private val activityProvider: CurrentActivityProvider
) : ViewModel() {

  private fun share(content: String) {
    // ...
    activityProvider.withActivity {
      startActivity(chooserIntent)
    }
  }
}
</code></pre>
<p>Now we need to set up <code>CurrentActivityProvider.currentActivity</code> for each <code>ActivityRetainedComponent</code> scope. For that, we create an entry point scoped to the activity (<code>ActivityComponent</code>) which will provide access to the <code>CurrentActivityProvider</code> (which lives in a parent <code>ActivityRetainedComponent</code> scope). The entry point:</p>
<pre><code class="language-kotlin">@EntryPoint
@InstallIn(ActivityComponent::class)
interface ActivityProviderEntryPoint {
  val activityProvider: CurrentActivityProvider
}
</code></pre>
<p>Now we can retrieve the scoped activity provider from an activity instance with:</p>
<pre><code class="language-kotlin">val entryPoint: ActivityProviderEntryPoint =
  EntryPointAccessors.fromActivity(this)
val activityProvider = entryPoint.activityProvider
</code></pre>
<p>This only works if the activity is Hilt-aware, so let's check that it implements <code>GeneratedComponentManagerHolder</code> (🤫 it's in Hilt's internal package but it's also public so 🤷‍♂️) and let's make a small <code>Activity.withProvider()</code> utility for that:</p>
<pre><code class="language-kotlin">private fun Activity.withProvider(
  block: CurrentActivityProvider.() -&gt; Unit
) {
  if (this is GeneratedComponentManagerHolder) {
    val entryPoint: ActivityProviderEntryPoint =
      EntryPointAccessors.fromActivity(this)
    val provider = entryPoint.activityProvider
    provider.block()
  }
}

// Usage: the block receiver is the CurrentActivityProvider.
activity.withProvider {
  // TODO
}
</code></pre>
<blockquote>
<p>Note: Android apps can have multiple activities in created state at the same time. The code here supports that by relying on the <code>ActivityRetainedComponent</code> scope which will give us a new component for each activity in the stack, but still return the same logical component when an activity is recreated through a config change.</p>
</blockquote>
<p>Now let's add methods to update the activity reference on lifecycle changes:</p>
<pre><code class="language-kotlin">@ActivityRetainedScoped
class CurrentActivityProvider @Inject constructor() {

  private var currentActivity: Activity? = null

  fun &lt;T&gt; withActivity(block: Activity.() -&gt; T) : T { /* ... */  }

  companion object {
    private fun Activity.withProvider(
      block: CurrentActivityProvider.() -&gt; Unit
    ) { /* ... */ }

    fun onActivityCreated(activity: Activity) {
      activity.withProvider {
        currentActivity = activity
      }
    }

    fun onActivityDestroyed(activity: Activity) {
      activity.withProvider {
        if (currentActivity === activity) {
          currentActivity = null
        }
      }
    }
  }
}
</code></pre>
<p>And finally let's hook the lifecycle callbacks from my <code>Application</code> class:</p>
<pre><code class="language-kotlin">@HiltAndroidApp
class MyCuteLittleApp : Application() {

  override fun onCreate() {
    super.onCreate()
    
    registerActivityLifecycleCallbacks(object : ActivityLifecycleCallbacks {
      override fun onActivityCreated(
        activity: Activity,
        savedInstanceState: Bundle?
      ) {
        CurrentActivityProvider.onActivityCreated(activity)
      }

      override fun onActivityDestroyed(activity: Activity) {
        CurrentActivityProvider.onActivityDestroyed(activity)
      }
    })
  }
}
</code></pre>
<p>With this, we can now inject <code>CurrentActivityProvider</code> in any <code>ActivityRetainedComponent</code> scope (as well as lower scopes) and easily access the activity with <code>activityProvider.withActivity()</code>.</p>
<h1>Testing testing 1 2 3 🎤</h1>
<p>To make <code>MyCuteLittleViewModel</code> easier to test we can move the sharing responsibility to an injected collaborator, e.g. <code>Sharer</code>:</p>
<pre><code class="language-kotlin">interface Sharer {
  fun share(content: String)
}

class ActivitySharer @Inject constructor(
  private val activityProvider: CurrentActivityProvider
) : Sharer {
  override fun share(content: String) {
    // ...
    activityProvider.withActivity {
      startActivity(chooserIntent)
    }
  }
}

@Module
@InstallIn(ActivityRetainedComponent::class)
interface SharerModule {
  @Binds fun bindSharer(sharer: ActivitySharer): Sharer
}
</code></pre>
<blockquote>
<p>Header image generated by DALL-E, prompt: "sword held by an Android with glasses, 3d render"</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[WhileSubscribed(5000)]]></title><description><![CDATA[👋 Hi, this is P.Y., I work as an Android Engineer at Block. I know nothing about Compose so please let me know if I messed up on Twitter!
I've been hacking on a new LeakCanary standalone app to visua]]></description><link>https://blog.p-y.wtf/whilesubscribed5000</link><guid isPermaLink="true">https://blog.p-y.wtf/whilesubscribed5000</guid><category><![CDATA[Android]]></category><category><![CDATA[Jetpack Compose]]></category><category><![CDATA[coroutines]]></category><category><![CDATA[Kotlin]]></category><category><![CDATA[stream]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Tue, 30 Aug 2022 23:01:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1661899406295/iqLtpgO3d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>👋 Hi, this is P.Y., I work as an Android Engineer at</em> <a href="https://block.xyz/"><em>Block</em></a><em>. I know nothing about Compose so please let me know if I messed up</em> <a href="https://twitter.com/Piwai"><em>on Twitter</em></a><em>!</em></p>
<p>I've been hacking on a new LeakCanary standalone app to visualize leaks, which is going to be 100% Compose. As I started following tutorials and looking at sample apps, I noticed a strange pattern in both <a href="https://github.com/android/nowinandroid/search?q=WhileSubscribed">Now In Android</a> and <a href="https://github.com/chrisbanes/tivi/search?q=WhileSubscribed">tivi</a>:</p>
<pre><code class="language-kotlin">class AuthorViewModel : ViewModel() {

    val authorUiState: StateFlow&lt;AuthorUiState&gt; = authorUiStateStream()
        .stateIn(
            scope = viewModelScope,
            // {-_-} 5000?!
            started = SharingStarted.WhileSubscribed(5_000),
            initialValue = AuthorUiState.Loading
        )
}

@Composable
fun AuthorScreen(authorUiState: AuthorUiState) {
  when (authorUiState) {
    AuthorUiState.Loading -&gt; {
      // ...
    }
  }
}
</code></pre>
<p>I wonder what this <code>WhileSubscribed(5_000)</code> is all about! Let's look at the <a href="https://cs.android.com/android/platform/superproject/+/master:external/kotlinx.coroutines/kotlinx-coroutines-core/common/src/flow/SharingStarted.kt;l=105-110;drc=7a8f34fddad84b82e8ccc7b178a03b4a331a2c24">source</a>:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1661895066796/hYxr4lZGh.png" alt="Screenshot of WhileSubscribed sources" />

<p>Using <code>WhileSubscribed()</code> without any timeout would make sense here, i.e. we want to keep the sharing coroutine running as long as there's a UI consuming it. When that UI goes away, why would we want to wait an additional 5 seconds before we stop sharing?</p>
<p>I find more details in a <a href="https://medium.com/androiddevelopers/things-to-know-about-flows-sharein-and-statein-operators-20e6ccb2bc74">post from the Android Developers blog</a>:</p>
<blockquote>
<p><strong>Tip for Android apps!</strong> You can use <code>WhileSubscribed(5000)</code> most of the time to keep the upstream flow active for 5 seconds more after the disappearance of the last collector. That avoids restarting the upstream flow in certain situations such as configuration changes. This tip is especially helpful when upstream flows are expensive to create and when these operators are used in ViewModels.</p>
</blockquote>
<p>Surprise Surprise, it's <strong>config changes</strong> once again, the bane of my Android career...</p>
<p>On Twitter <a href="https://twitter.com/Zhuinden/status/1564586130108026881">Gabor Varadi</a> pointed out that <a href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:lifecycle/lifecycle-livedata-ktx/src/main/java/androidx/lifecycle/CoroutineLiveData.kt;l=36;bpv=0;bpt=0">CoroutineLiveData</a> has the same 5000 ms default timeout.</p>
<p>In my experience, introducing random delays does not properly solve whatever underlying issue I'm running into. I'm also not comfortable with the idea that every ViewModel exposing state as a flow should now have this weird 5000 <a href="https://en.wikipedia.org/wiki/Magic_number_(programming)">magic number</a> baked in.</p>
<p>What happens when we remove the delay?</p>
<pre><code class="language-kotlin">class AuthorViewModel : ViewModel() {

    val authorUiState: StateFlow&lt;AuthorUiState&gt; = authorUiStateStream()
        .stateIn(
            scope = viewModelScope,
            // Timeout is now 0!
            started = SharingStarted.WhileSubscribed(),
            initialValue = AuthorUiState.Loading
        )
}

@Composable
fun AuthorScreen(authorUiState: AuthorUiState)
// ...
</code></pre>
<p>When I rotate the screen, the sharing coroutine is stopped &amp; restarted, whereas previously it stayed in started state and didn't restart the upstream flow.</p>
<p>Here's why: as part of a configuration change, the activity is destroyed. Then the activity is recreated, and resumed. Somewhere during that recreation the scope that <code>AuthorScreen</code> used to collect <code>authorUiState</code> completes, which brings the subscription count to 0, and stops the state sharing. And then on the first frame post resume, sharing restarts, which re-triggers the <code>authorUiStateStream()</code> but also immediately reuses the latest cached value (<code>WhileSubscribed.replayExpiration</code> defaults to never).</p>
<p>My next idea was to use <code>SharingStarted.Lazily</code> instead, which starts sharing on subscribe and never stops:</p>
<pre><code class="language-kotlin">class AuthorViewModel : ViewModel() {

    val authorUiState: StateFlow&lt;AuthorUiState&gt; = authorUiStateStream()
        .stateIn(
            scope = viewModelScope,
            // Start on subscribe and never stop!
            started = SharingStarted.Lazily,
            initialValue = AuthorUiState.Loading
        )
}

@Composable
fun AuthorScreen(authorUiState: AuthorUiState)
// ...
</code></pre>
<p>The state is shared with the <code>viewModelScope</code> scope so the sharing will stop as soon as the view model is cleared.</p>
<p>This works, but there's one limitation: the upstream flow stays active for as long as the ViewModel is around, which is usually tied to an Activity or Fragment lifecycle. If a state flow is tied only to a small part of the UI that then goes away and unsubscribes, we'll be keeping that flow active for no good reason.</p>
<p>The same is true of <code>SharingStarted.WhileSubscribed(5_000)</code> of course. That <code>5_000</code> timeout is meant as "wait long enough to be sure that if we went through a config change we'd have time to resubscribe before we stop sharing". Unfortunately, this also means that whenever that state is unsubscribed we wait an additional 5 seconds before stopping the upstream flow.</p>
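<p>Stripped of coroutines, that timeout boils down to a simple decision (a conceptual model of mine, not the actual <code>kotlinx.coroutines</code> implementation): when the subscriber count drops to zero, the upstream flow is stopped only if nobody resubscribes within the timeout:</p>
<pre><code class="language-java">// Conceptual model of WhileSubscribed(timeoutMillis): after the last
// subscriber leaves, stop upstream only if no one comes back in time.
class WhileSubscribedModel {
  static boolean stopsUpstream(Long resubscribeAfterMillis, long timeoutMillis) {
    // null means the UI never resubscribed: it really went away.
    return resubscribeAfterMillis == null || resubscribeAfterMillis > timeoutMillis;
  }

  public static void main(String[] args) {
    // Config change: the UI resubscribes quickly, upstream keeps running.
    System.out.println(stopsUpstream(300L, 5_000L)); // false
    // Screen really gone: upstream stops, but 5 seconds too late.
    System.out.println(stopsUpstream(null, 5_000L)); // true
  }
}
</code></pre>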
<p>Can we create a <code>SharingStarted</code> that behaves like <code>SharingStarted.WhileSubscribed</code> but will also wait for config changes to settle and for the UI to have a chance to resubscribe before stopping the upstream flow? Let's call it <code>WhileSubscribedOrRetained</code>:</p>
<pre><code class="language-kotlin">class AuthorViewModel : ViewModel() {

    val authorUiState: StateFlow&lt;AuthorUiState&gt; = authorUiStateStream()
        .stateIn(
            scope = viewModelScope,
            started = WhileSubscribedOrRetained,
            initialValue = AuthorUiState.Loading
        )
}

@Composable
fun AuthorScreen(authorUiState: AuthorUiState)
// ...
</code></pre>
<p>Yes we can! I was chatting with <a href="https://twitter.com/romainguy">Romain Guy</a> and he jokingly suggested:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1661897804934/nbhrt1QDb.png" alt="Romain Guy suggesting post(post(…))" />

<p>Little did he know... that's exactly what I did, except with even more posting!</p>
<p>The implementation is inspired from <code>WhileSubscribed</code>. Thanks <a href="https://twitter.com/billjings">Bill Phillips</a> for suggesting <code>CompletableDeferred</code>:</p>
<pre><code class="language-kotlin">object WhileSubscribedOrRetained : SharingStarted {

  private val handler = Handler(Looper.getMainLooper())

  override fun command(subscriptionCount: StateFlow&lt;Int&gt;): Flow&lt;SharingCommand&gt; = subscriptionCount
  .transformLatest { count -&gt;
    if (count &gt; 0) {
      emit(SharingCommand.START)
    } else {
      val posted = CompletableDeferred&lt;Unit&gt;()
      // This code is perfect. Do not change a thing.
      Choreographer.getInstance().postFrameCallback {
        handler.postAtFrontOfQueue {
          handler.post {
            posted.complete(Unit)
          }
        }
      }
      posted.await()
      emit(SharingCommand.STOP)
    }
  }
  .dropWhile { it != SharingCommand.START }
  .distinctUntilChanged()

  override fun toString(): String = "WhileSubscribedOrRetained"
}
</code></pre>
<p>Wait, what?!</p>
<pre><code class="language-kotlin">Choreographer.getInstance().postFrameCallback {
  handler.postAtFrontOfQueue {
    handler.post {
      posted.complete(Unit)
    }
  }
}
</code></pre>
<p>Ok so this is the fun part: the <code>subscriptionCount</code> updates are dispatched async on the main thread. When a config change occurs, the activity is destroyed, recreated and resumed. As part of the teardown, the subscription count decrement event is posted and runs right after <code>Activity.onResume()</code> but also right before the first frame renders. So we can't stop the subscription right there, as the first composition (where we resubscribe) will happen as part of the first frame.</p>
<p>The resubscription happens during the first frame callback, and the subscription count increment is posted and runs after the first frame.</p>
<p>Last but not least, composition runs during the measure part of a traversal, whereas callbacks manually posted with <code>Choreographer.getInstance().postFrameCallback()</code> run before that, during the animation phase.</p>
<p>So, this is what we do:</p>
<ul>
<li><p>We receive the decrement to 0 during a post that runs in between <code>Activity.onResume()</code> and the first frame.</p>
</li>
<li><p>We schedule a frame callback with <code>Choreographer.getInstance().postFrameCallback {}</code> so that we can be called during the animation part of the first frame.</p>
</li>
<li><p>We know that a bit later during that frame callback the resubscription will happen and will trigger a post. We want to run code <em>after</em> that post runs.</p>
</li>
<li><p>So we enqueue a post at the front of the main thread queue with <code>handler.postAtFrontOfQueue {}</code>, which runs immediately after the frame callback.</p>
</li>
<li><p>And from that we enqueue a post at the back of the main thread queue with <code>handler.post{}</code>, which is guaranteed to run after the subscription count increase notification.</p>
</li>
<li><p>When the subscription count increase notification runs, <code>transformLatest</code> ensures that the work to stop (which was suspended with <code>posted.await()</code>) is canceled.</p>
</li>
<li><p>If the subscription count doesn't increase, we proceed with stopping the sharing.</p>
</li>
</ul>
<p>Phew, so much trampolining!</p>
<p>The result is nice though: the flow stays active during config changes, and stops immediately when the subscribed UI goes away.</p>
<h1>The root cause: state lifecycle</h1>
<p>We have state presented on an active screen. We want the lifecycle of that state to be tied to when that screen is visible, and survive config changes.</p>
<p>Our core issue is that we're shoving state in a ViewModel that has a longer lifecycle than what we want the state to have.</p>
<p>The fix is to use fine grained scopes that are tied to the app navigation state ("what am I currently doing?") which does not get torn down on config changes. If you use ViewModels from Jetpack navigation, you already get that, so you can replace <code>WhileSubscribed(5000)</code> with <code>Lazily</code>!</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1661899497900/7UVYXm9IQ.png" alt="DALL·E 2022-08-30 15.41.41 - An Android jumping on a trampoline, vaporware, low angle.png" />

<blockquote>
<p>Generated by DALL-E, prompt: "An Android jumping on a trampoline, vaporware, low angle"</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Of sharks and heaps of sticky marshmallows]]></title><description><![CDATA[👋 Hi, this is P.Y., I work as an Android Engineer at Block, the rockey company formerly known as Square. I spend a lot of time focusing on performance and try to share my experience with deep-dive bl]]></description><link>https://blog.p-y.wtf/of-sharks-and-heaps-of-sticky-marshmallows</link><guid isPermaLink="true">https://blog.p-y.wtf/of-sharks-and-heaps-of-sticky-marshmallows</guid><category><![CDATA[Android]]></category><category><![CDATA[android app development]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 07 Apr 2022 19:57:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1649361266293/I6Ml0Nf6k.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>👋 Hi, this is P.Y., I work as an Android Engineer at</em> <a href="https://block.xyz/"><em>Block</em></a><em>, the rockey company formerly known as Square. I spend a lot of time focusing on performance and try to share my experience with deep-dive blog posts. I hope you like this one, let me know</em> <a href="https://twitter.com/Piwai"><em>on Twitter</em></a><em>!</em></p>
<h1>Introduction</h1>
<p>At Square, we run <a href="https://github.com/square/leakcanary/">LeakCanary</a> in CI after every UI test thanks to the <a href="https://square.github.io/leakcanary/changelog/#leak-detection-in-tests">DetectLeaksAfterTestSuccess</a> test rule:</p>
<pre><code class="language-kotlin">class CartTest {
  @get:Rule
  val rules = RuleChain.outerRule(DetectLeaksAfterTestSuccess())
    .around(ActivityScenarioRule(CartActivity::class.java))

  @Test
  fun addItemToCart() {
    // ...
  }
}
</code></pre>
<p>Last week a colleague noticed that our Android CI heap analysis occasionally took several minutes. This blog is a deep dive based on my notes from the investigation!</p>
<h1>GC Roots</h1>
<p>We realize this is happening only on API 23 emulators (API 23 is Android 6, aka Android Marshmallow), and we can reproduce the issue locally. I add <a href="https://py.hashnode.dev/tracing-main-thread-messages">trace sections</a> and hook up <a href="https://perfetto.dev/">Perfetto</a> while the test is running:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649309739460/gvuRUJKKA.jpeg" alt="perfetto_memory_usage.jpeg" style="display:block;margin:0 auto" />

<p>I immediately notice something weird: the heap size increases more than expected when LeakCanary starts its analysis. Could that be somehow related to the slowdown?</p>
<p>I capture a heap dump while the heap analysis is running and look at instance counts sorted by total shallow size in <a href="https://www.yourkit.com/java/profiler/features/">YourKit Java Profiler</a>:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649310049801/nHQsuW7xz.png" alt="yourkit.png" style="display:block;margin:0 auto" />

<p>The 4th entry is surprising: 1.3M instances of <code>GcRoot.StickyClass</code>. This class is a part of <a href="https://square.github.io/leakcanary/shark/">Shark</a>, LeakCanary's heap dump parser. Here's how the YourKit doc describes GC roots:</p>
<blockquote>
<p>The so-called GC (Garbage Collector) roots are objects special for garbage collector. Garbage collector collects those objects that are not GC roots and are not accessible by references from GC roots.</p>
</blockquote>
<blockquote>
<p>There are several kinds of GC roots. One object can belong to more than one kind of root. The root kinds are:</p>
</blockquote>
<blockquote>
<ul>
<li><p>Class - class loaded by system class loader. Such classes can never be unloaded. They can hold objects via static fields.</p>
</li>
<li><p>...</p>
</li>
</ul>
</blockquote>
<p>Classes loaded by the system class loader are never garbage collected, so they stick around, and are therefore known as <strong>sticky classes</strong>. In a JVM, custom classes can be loaded and then unloaded, but on Android, they're never unloaded. So any loaded class stays in memory forever and acts as a GC root that holds static field references forever.</p>
<p>Shark parses a heap dump and keeps the list of all GC roots in memory, and that's usually a reasonably small list. 1.3M sticky class GC roots is not expected!</p>
<h1>Shark</h1>
<p>I decide to write ad hoc code with Shark to analyze an API 23 heap dump more systematically and compute aggregates. Let's start by printing the counts of GC Root by type:</p>
<pre><code class="language-kotlin">// openHeapGraph() parses the heap dump file content
heapDumpFile.openHeapGraph().use { graph -&gt;
  // Grab all GC roots
  val roots = graph.gcRoots
  // Create a map of GC root type =&gt; count of that type
  val counts = roots.groupingBy { it.javaClass }.eachCount()
  // Turn the map into a list of entries, sorted by descending count.
  val sortedCounts = counts.entries.sortedBy { -it.value }
  println(sortedCounts.joinToString("\n"))
}
</code></pre>
<pre><code class="language-text">class shark.GcRoot$StickyClass=1342062
class shark.GcRoot$JavaFrame=807
class shark.GcRoot$JniGlobal=402
class shark.GcRoot$ThreadObject=56
class shark.GcRoot$JniLocal=54
class shark.GcRoot$NativeStack=53
</code></pre>
<p>As expected, we see 1.3M StickyClass GC roots. Other types of GC roots have reasonable counts. Do we have 1.3M classes, though?</p>
<pre><code class="language-kotlin">heapDumpFile.openHeapGraph().use { graph -&gt;
  println("class count=${graph.classes.count()}")
}
</code></pre>
<pre><code class="language-text">class count=52940
</code></pre>
<p>Okay, how do we go from 53K classes to 1.3M sticky class GC roots?</p>
<pre><code class="language-kotlin">class StickyClass(override val id: Long) : GcRoot()
</code></pre>
<p>A sticky class GC root is solely defined by the id of the root object, so let's see if we have duplicate ids, and what objects these ids correspond to:</p>
<pre><code class="language-kotlin">heapDumpFile.openHeapGraph().use { graph -&gt;
  // Grab all GC roots
  val roots = graph.gcRoots
  // Keep only sticky class GC roots
  val stickyRoots = roots.filterIsInstance(StickyClass::class.java)
  // Create a map of id =&gt; count of that id in the stickyRoots list
  val stickyCounts = stickyRoots.groupingBy { it.id }.eachCount()
  // Turn the map into a list of entries, sorted by the counts.
  val sortedStickyCounts = stickyCounts.entries.sortedBy { it.value }

  // Map the id to the actual object it references join into a string
  val result = sortedStickyCounts.joinToString("\n") {
    "${graph.findObjectById(it.key)}: ${it.value}"
  }
  println(result)
}
</code></pre>
<pre><code class="language-plaintext">...
object array @318259200 of java.lang.Class[]: 27182
primitive array @-1970475008 of int[]: 27235
object array @325763072 of java.lang.Class[]: 27235
primitive array @-1966432256 of int[]: 28152
object array @319406080 of java.lang.Class[]: 28152
primitive array @1879291968 of int[]: 28261
object array @1879261584 of java.lang.Class[]: 28261
primitive array @-1966821376 of int[]: 30331
object array @319721472 of java.lang.Class[]: 30331
</code></pre>
<p>What?! <code>primitive array</code>, <code>object array</code>... these aren't classes! There are 53001 distinct object ids referenced by sticky class GC roots, out of which 52939 point to classes and 62 point to int and object arrays.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649358719293/KmEEgJXVK.png" alt="image.png" style="display:block;margin:0 auto" />

<p>Interestingly, these int arrays have a size very close to 65K. Notice the size of the top one: <em>65536</em>. You might have seen that number before... in the <a href="https://developer.android.com/studio/build/multidex">multidex documentation</a>:</p>
<blockquote>
<p>The Dalvik Executable specification limits the total number of methods that can be referenced within a single DEX file to 65,536.</p>
</blockquote>
<p>These non-class objects that sticky class GC roots are pointing to are objects referenced by <code>DexCache.resolvedMethods</code>, <code>DexCache.resolvedFields</code> and <code>DexCache.resolvedTypes</code>.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649358904034/COxcpT9AL.png" alt="image.png" style="display:block;margin:0 auto" />

<p>That's a little weird but ok. We still don't know why we have all the duplicated GC roots. From what I gather, the class table was maintained by <a href="https://cs.android.com/android/platform/superproject/+/android-7.1.1_r60:art/runtime/class_linker.cc;drc=b47a1cc17f53951b900e56bb68c58c972517cb07">class_linker.cc</a> in Android M and apparently that changed in Android N, which seemingly fixed bugs related to visiting the same class roots over and over again.</p>
<h1>Bugfix</h1>
<p>I can quickly fix the increased memory usage in LeakCanary (<a href="https://github.com/square/leakcanary/pull/2324">PR</a>) by introducing a set to ignore repeated sticky class entries:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649359660528/iOZVUEOiv.png" alt="fix.png" style="display:block;margin:0 auto" />
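<p>The essence of that fix is easy to sketch in isolation. Here is an illustrative Kotlin reimplementation of the dedup idea, not the actual LeakCanary code: the <code>StickyClassRoot</code> class and <code>dedupStickyRoots()</code> function are made up, but the trick is the same, a <code>HashSet</code> of already-seen ids:</p>
<pre><code class="language-kotlin">// Hypothetical sketch of the dedup idea: while reading GC root records,
// remember sticky class ids we already saw and drop the repeats.
data class StickyClassRoot(val id: Long)

fun dedupStickyRoots(roots: List&lt;StickyClassRoot&gt;): List&lt;StickyClassRoot&gt; {
  val seenIds = HashSet&lt;Long&gt;()
  // Set.add() returns false when the id was already present, so filter
  // keeps only the first occurrence of each id.
  return roots.filter { seenIds.add(it.id) }
}
</code></pre>
<p>Applied to the API 23 heap dump above, this collapses the 1.3M sticky class entries back down to the ~53K distinct ids we counted earlier.</p>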

<p>Let's not forget to add a unit test:</p>
<pre><code class="language-kotlin">@Test fun `duplicated StickyClass GC roots are deduplicated`() {
  // `id` stands in for a unique-id helper from the test fixtures
  val className = StringRecord(id, "com.example.VeryStickyClass")
  val loadClassRecord = LoadClassRecord(1, id, 1, className.id)
  val classDump = ClassDumpRecord(
    id = loadClassRecord.id,
    stackTraceSerialNumber = 1,
    superclassId = 0,
    classLoaderId = 0,
    signersId = 0,
    protectionDomainId = 0,
    instanceSize = 0,
    staticFields = emptyList(),
    fields = emptyList()
  )
  val stickyClassRecords = (1..5).map {
    GcRootRecord(StickyClass(loadClassRecord.id))
  }
  val hprofRecords = listOf(className, loadClassRecord, classDump) +
    stickyClassRecords
  val bytes = hprofRecords.asHprofBytes()

  val stickyClassRoots = bytes.openHeapGraph().use { graph -&gt;
    graph.gcRoots.filterIsInstance(StickyClass::class.java)
  }

  assertThat(stickyClassRoots).hasSize(1)
  val stickyClassRoot = stickyClassRoots.first()
  assertThat(stickyClassRoot.id).isEqualTo(loadClassRecord.id)
}
</code></pre>
<h1>Conclusion</h1>
<p>I capture a perfetto trace running the same analysis before and after the change:</p>
<p><strong>Before</strong></p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649309739460/gvuRUJKKA.jpeg" alt="perfetto_memory_usage_before.jpeg" style="display:block;margin:0 auto" />

<p><strong>After</strong></p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1649340530181/t3NOu1lSh.jpeg" alt="perfetto_memory_usage_after.jpeg" style="display:block;margin:0 auto" />

<p>🎉 Peak heap consumption is almost halved, from 260 MB down to 140 MB!</p>
<p>You know what, though? Our heap analysis in CI is still super slow on API 23. Something else is going on! Wouldn't that be a great follow-up article?</p>
<blockquote>
<p>Cover image: <em>Jelly Fish On Blue</em> <a href="https://www.flickr.com/photos/romainguy/509046880/">by Romain Guy</a>.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Tracing main thread messages]]></title><description><![CDATA[👋 Hi, this is P.Y., I work as an Android Engineer at Block, the non-fungible company formerly known as Square. I spend a lot of time focusing on performance and try to share my experience with deep-d]]></description><link>https://blog.p-y.wtf/tracing-main-thread-messages</link><guid isPermaLink="true">https://blog.p-y.wtf/tracing-main-thread-messages</guid><category><![CDATA[Android]]></category><category><![CDATA[android app development]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Thu, 27 Jan 2022 19:53:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1643312524384/F12Nq58L7.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>👋 Hi, this is P.Y., I work as an Android Engineer at</em> <a href="https://block.xyz/"><em>Block</em></a><em>, the non-fungible company formerly known as Square. I spend a lot of time focusing on performance and try to share my experience with deep-dive blog posts. I hope you like this one, let me know</em> <a href="https://twitter.com/Piwai"><em>on Twitter</em></a><em>!</em></p>
<h1>Summary</h1>
<p>This blog shows how to see what the main thread is doing in Perfetto traces by leveraging a seldom-used <code>Looper</code> logger API.</p>
<h1>Profiling with Perfetto</h1>
<p>I use <a href="https://perfetto.dev/">Perfetto</a> tracing to get a good picture of the system-level &amp; high-level app behavior. The <a href="https://perfetto.dev/docs/quickstart/android-tracing">Android Quickstart</a> is a bit overwhelming; here's what I do:</p>
<pre><code class="language-bash"># Prepare the device
adb root
adb shell setprop persist.traced.enable 1

# Download the record_android_trace script
curl -O https://raw.githubusercontent.com/google/perfetto/master/tools/record_android_trace
chmod u+x record_android_trace

# Capture a trace for com.example.myapp
./record_android_trace -o trace_file.perfetto-trace -b 500mb \
-a com.example.myapp sched freq idle am wm gfx view \
binder_driver hal dalvik camera input res

Trace started. Press CTRL+C to stop
</code></pre>
<p>I perform the interaction I'm profiling, then I press CTRL+C and the script automatically opens <a href="https://ui.perfetto.dev/">ui.perfetto.dev</a> with a new trace.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643260599127/D79TLCa82.png" alt="Perfetto screenshot" style="display:block;margin:0 auto" />

<p>Perfetto shows a lot of information! If you're lost, press <code>?</code> to bring up the navigation help, and learn that you can move around the timeline with the <a href="https://en.wikipedia.org/wiki/Arrow_keys#WASD_keys">WASD keys</a>. Then open up the row with your app's package name to see its list of threads.</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643261254383/HD8RKY8eh.png" alt="Perfetto app process" style="display:block;margin:0 auto" />

<p>Here the main thread is shown on the first 2 rows (named after the app package name, <code>com.example.myapp</code>), and the render thread on the next 2 rows. For each thread, the first row shows thread states (running, sleeping, etc.) and the row below shows the system trace sections.</p>
<h1>BALEETED!</h1>
<p>Let's look at an example, a <a href="https://youtu.be/7rrZ-sA4FQc?t=94">delete</a> button that shows a confirm dialog.</p>
<pre><code class="language-kotlin">findViewById&lt;Button&gt;(R.id.delete).setOnClickListener {
  AlertDialog.Builder(this)
    .setTitle("Are you sure?")
    .setPositiveButton("Delete") { _, _ -&gt;
      delete()
      Toast.makeText(this@MainActivity, "BALEETED!", LENGTH_SHORT).show()
    }
    .show()
}
</code></pre>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643261908147/fVDlVHp4w.png" alt="confirm dialog" style="display:block;margin:0 auto" />

<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643262034822/lGHbZhwFh.png" alt="baleeted!" style="display:block;margin:0 auto" />

<p>After recording a trace, this is what we see in Perfetto when the dialog is shown:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643261777707/NtkEoYKgk.png" alt="Clicking delete" style="display:block;margin:0 auto" />

<p>And then when the user confirms deletion, the dialog is dismissed and the toast is shown:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643261819000/-W91R5l9y.png" alt="showing toast" style="display:block;margin:0 auto" />

<p>We can tell that view inflation is happening, followed by a Choreographer frame. But it's hard to get a good grasp of the details, or of what happens during the gaps in between. Is there any way we can add more detail?</p>
<h1>Looper logging</h1>
<p>As I wrote in 2013 in <a href="https://developer.squareup.com/blog/a-journey-on-the-android-main-thread-psvm/">A journey on the Android Main Thread</a>, the main thread has a dedicated <code>Looper</code> which is in charge of looping, i.e. running main thread messages serially forever.</p>
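<p>If you've never looked at how a looper works, here's a minimal JVM-only sketch of the serial dispatch idea (plain Kotlin, no Android APIs, all names are mine):</p>
<pre><code class="language-kotlin">import java.util.concurrent.LinkedBlockingQueue

// Toy model of a looper: drain a queue of Runnables one at a time. The log
// lines mimic the spirit of the real Looper's dispatch logging.
fun drainQueue(queue: LinkedBlockingQueue&lt;Runnable&gt;): List&lt;String&gt; {
  val log = mutableListOf&lt;String&gt;()
  while (queue.isNotEmpty()) {
    // Single-threaded here, so take() never blocks after the isNotEmpty check.
    val message = queue.take()
    log += "&gt;&gt;&gt;&gt;&gt; Dispatching to $message"
    message.run()
    log += "&lt;&lt;&lt;&lt;&lt; Finished $message"
  }
  return log
}

fun main() {
  val queue = LinkedBlockingQueue&lt;Runnable&gt;()
  queue.put(Runnable { println("handling click") })
  queue.put(Runnable { println("drawing frame") })
  drainQueue(queue).forEach(::println)
}
</code></pre>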
<p><code>Looper</code> also has a seldom-used logging API: <a href="https://developer.android.com/reference/android/os/Looper#setMessageLogging(android.util.Printer)">Looper.setMessageLogging()</a>:</p>
<blockquote>
<p>Control logging of messages as they are processed by this Looper. If enabled, a log message will be sent to the logger at the beginning and end of each message dispatch, identifying the target Handler and message contents.</p>
</blockquote>
<p>We can use it to log the messages that run on the main thread:</p>
<pre><code class="language-kotlin">class ExampleApplication : Application() {

  override fun onCreate() {
    super.onCreate()

    Looper.getMainLooper().setMessageLogging { log -&gt;
      Log.d("ExampleApplication", log)
    }
  }
}
</code></pre>
<p>For each message posted to the main thread, we get a log before the message runs and a log after:</p>
<pre><code class="language-plaintext">D ExampleApplication: &gt;&gt;&gt;&gt;&gt; Dispatching to Handler (android.view.View
  RootImpl$ViewRootHandler) {c93a855} android.view.View$PerformClick
  @a6c9a1a: 0

D ExampleApplication: &lt;&lt;&lt;&lt;&lt; Finished to Handler (android.view.ViewRoot
  Impl$ViewRootHandler) {c93a855} android.view.View$PerformClick
  @a6c9a1a
</code></pre>
<h1>Main thread message trace sections</h1>
<p>We can leverage the <code>Looper</code> logging APIs to add a trace section for each main thread message. First, we add the AndroidX tracing library:</p>
<pre><code class="language-groovy">dependencies {
  implementation "androidx.tracing:tracing-ktx:1.0.0"
}
</code></pre>
<p>Then we can use the <code>&gt;&gt;&gt;&gt;&gt;</code> / <code>&lt;&lt;&lt;&lt;&lt;</code> suffix in the message to determine if the log marks the start or the end of a main thread message, and delegate to <a href="https://developer.android.com/reference/androidx/tracing/Trace">androidx.tracing.Trace</a> accordingly:</p>
<pre><code class="language-kotlin">class ExampleApplication : Application() {

  override fun onCreate() {
    super.onCreate()
    Looper.getMainLooper().setMessageLogging { log -&gt;
      if (log.startsWith('&gt;')) {
        Trace.beginSection(log)
      } else if (log.startsWith('&lt;')) {
        Trace.endSection()
      }
    }
  }
}
</code></pre>
<p>Unfortunately, this eventually crashes:</p>
<pre><code class="language-plaintext">Process: com.example.myapp, PID: 15870
java.lang.IllegalArgumentException: sectionName is too long
	at android.os.Trace.beginSection(Trace.java:333)
	at androidx.tracing.TraceApi18Impl.beginSection(TraceApi18Impl.java:49)
	at androidx.tracing.Trace.beginSection(Trace.java:81)
	at com.example.myapp.ExampleApplication$onCreate$1.println(ExampleApplication.kt:29)
	at android.os.Looper.loop(Looper.java:183)
	at android.app.ActivityThread.main(ActivityThread.java:7356)
	at java.lang.reflect.Method.invoke(Native Method)
	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:492)
	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:930)
</code></pre>
<p>The <a href="https://developer.android.com/reference/androidx/tracing/Trace#beginSection(java.lang.String)">AndroidX Trace.beginSection() javadoc</a> is leaving out a crucial detail from the <a href="https://developer.android.com/reference/android/os/Trace#beginSection(java.lang.String)">AOSP Trace.beginSection() javadoc</a>:</p>
<blockquote>
<p>sectionName may be at most 127 Unicode code units long.</p>
</blockquote>
<p>Let's fix the crash:</p>
<pre><code class="language-kotlin">class ExampleApplication : Application() {

  override fun onCreate() {
    super.onCreate()
    Looper.getMainLooper().setMessageLogging { log -&gt;
      if (log.startsWith('&gt;')) {
        // Would be nice if AndroidX automatically truncated to 127
        Trace.beginSection(log.take(127))
      } else if (log.startsWith('&lt;')) {
        Trace.endSection()
      }
    }
  }
}
</code></pre>
<p>Let's look at how the trace changes:</p>
<h2>Before</h2>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643261819000/-W91R5l9y.png" alt="showing toast before" style="display:block;margin:0 auto" />

<h2>After</h2>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643303526056/8bLgxZIxm.png" alt="image.png after" style="display:block;margin:0 auto" />

<p>The result is useful: there's a lot less empty space, and we know what work belongs to the same main thread message. However, every single section starts with <code>&gt;&gt;&gt;&gt;&gt; Dispatching to</code>. The most useful information is the callback class name, but it's appended at the end of the string, so the 127 character limit means it often gets truncated.</p>
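<p>To make the truncation problem concrete, here's a small standalone snippet (the handler and callback names are made up) showing how <code>take(127)</code> chops off exactly the part we care about:</p>
<pre><code class="language-kotlin">// The dispatch prefix and handler come first, so the 127 character budget
// is mostly spent before we ever reach the callback class name.
fun main() {
  val prefix = "&gt;&gt;&gt;&gt;&gt; Dispatching to Handler (android.view.ViewRootImpl\$ViewRootHandler) {c93a855} "
  val label = (prefix + "com.example.myapp.SomeLongCallbackClassName@a6c9a1a: 0").take(127)
  // The truncated label loses the tail of the message, where the most
  // interesting information lives.
  println(label)
}
</code></pre>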
<h1>Better section names</h1>
<p>Let's look at how <a href="https://cs.android.com/android/platform/superproject/+/master:frameworks/base/core/java/android/os/Looper.java;l=159-172;drc=2456c281f3c8b2c105860473c2bbbaac91f6ca2f">Looper assembles the log</a>:</p>
<pre><code class="language-java">    private boolean loopOnce() {
        Message msg = mQueue.next();

        Printer logging = mLogging;
        if (logging != null) {
            logging.println("&gt;&gt;&gt;&gt;&gt; Dispatching to " + msg.target + " "
                    + msg.callback + ": " + msg.what);
        }
</code></pre>
<p>The format for that string <a href="https://cs.android.com/android/_/android/platform/frameworks/base/+/7f9f99ea11051614a7727dfb9f9578b518e76e3c:core/java/android/os/Looper.java;l=119-122;drc=54b6cfa9a9e5b861a9930af873580d6dc20f773c">hasn't changed since the AOSP import commit in 2008</a>. Let's parse the log string back into its components:</p>
<pre><code class="language-kotlin">val logNoPrefix = log.removePrefix("&gt;&gt;&gt;&gt;&gt; Dispatching to ")
val indexOfWhat = logNoPrefix.lastIndexOf(": ")
val indexOfCallback = logNoPrefix.indexOf("} ")

val targetHandler = logNoPrefix.substring(0, indexOfCallback + 1)
val callback = logNoPrefix.substring(indexOfCallback + 2, indexOfWhat)
val what = logNoPrefix.substring(indexOfWhat + 2)
</code></pre>
<p><code>callback</code> is the result of calling <code>toString()</code> on the <code>Runnable</code> passed to <code>Handler.post()</code>, which most of the time returns the default <code>Object.toString()</code> implementation. Coroutine continuations override <code>toString()</code> to provide useful information about the callsite, but they add a lengthy prefix, so let's get rid of that:</p>
<pre><code class="language-kotlin">val callback = logNoPrefix.substring(indexOfCallback + 2, indexOfWhat)
  .removePrefix("DispatchedContinuation[Dispatchers.Main, Continuation at ")
  .removePrefix("DispatchedContinuation[Dispatchers.Main.immediate, Continuation at ")
</code></pre>
<p>We can then reassemble the values into a better section label that puts the callback first:</p>
<pre><code class="language-kotlin">private fun buildSectionLabel(log: String): String {
  val logNoPrefix = log.removePrefix("&gt;&gt;&gt;&gt;&gt; Dispatching to ")
  val indexOfWhat = logNoPrefix.lastIndexOf(": ")
  val indexOfCallback = logNoPrefix.indexOf("} ")

  val targetHandler = logNoPrefix.substring(0, indexOfCallback + 1)
  val callback = logNoPrefix.substring(indexOfCallback + 2, indexOfWhat)
    .removePrefix("DispatchedContinuation[Dispatchers.Main, Continuation at ")
    .removePrefix("DispatchedContinuation[Dispatchers.Main.immediate, Continuation at ")
  val what = logNoPrefix.substring(indexOfWhat + 2)

  return if (callback != "null") {
    "$callback $targetHandler $what"
  } else {
    "$targetHandler $what"
  }
}
</code></pre>
<p>Now we clearly see the class name of callbacks posted to the main thread, e.g. <a href="https://cs.android.com/android/platform/superproject/+/master:frameworks/base/core/java/android/view/View.java;l=28667;drc=629810015f7c2b5b2f9b882dc3a5ccb721cc88d7">View$PerformClick</a>:</p>
<img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1643310491103/oWrILJnAe.png" alt="View.PerformClick" style="display:block;margin:0 auto" />

<h1>Cost of logging</h1>
<p>This hack is really cool, but it also leads to the app doing extra string concatenation for every main thread message. We should only do this when we want to profile the app, e.g. by adding a runtime check.</p>
<p><strong>Edit:</strong> as <strong>Chris Craik</strong> and <strong>John Reck</strong> reminded me (thanks!), we should also skip calling <code>buildSectionLabel()</code> when not actively tracing by checking <a href="https://developer.android.com/reference/android/os/Trace#isEnabled()">Trace.isEnabled()</a>:</p>
<pre><code class="language-kotlin">val debuggable = (applicationInfo.flags and ApplicationInfo.FLAG_DEBUGGABLE) != 0
val profileable = Build.VERSION.SDK_INT &gt;= 29 &amp;&amp; applicationInfo.isProfileableByShell

if (debuggable || profileable) {
  Looper.getMainLooper().setMessageLogging { log -&gt;
    if (!Trace.isEnabled()) {
      return@setMessageLogging
    }
    if (log.startsWith('&gt;')) {
      val label = buildSectionLabel(log)
      Trace.beginSection(label.take(127))
    } else if (log.startsWith('&lt;')) {
      Trace.endSection()
    }
  }
}
</code></pre>
<h1>Pooled lambda crash</h1>
<p>If you run this code on Android 9, you will eventually run into a crash. The AOSP codebase supports lambdas and method references but uses recyclable anonymous functions via <a href="https://cs.android.com/android/platform/superproject/+/master:frameworks/base/core/java/com/android/internal/util/function/pooled/PooledLambda.java;l=57-99;drc=fd49debdc094e3153cfbc2b397e6936dd10b5d5c">PooledLambda</a> to avoid creating an extra instance for every lambda (<a href="https://cs.android.com/android/platform/superproject/+/master:frameworks/base/core/java/android/app/UiAutomation.java;l=1557-1559;drc=master">example</a>). It's a neat optimization!</p>
<p>When <code>Looper</code> concatenates the log string, it calls <code>callback.toString()</code>, and that callback might be a <code>PooledLambda</code>. Unfortunately, on Android 9 calling <code>toString()</code> on a lambda with 0 argument would throw an exception, which was <a href="https://cs.android.com/android/_/android/platform/frameworks/base/+/75632d616dbf14b6c71ea0e3a8a55c6fc963ba10">fixed in Android 10</a>.</p>
<p>There's no workaround, and we can't even catch the exception: the only fix is to disable our message tracing on Android 9 / API 28.</p>
<h1>All together</h1>
<p>This is the final code with everything put together. If you improve upon it, feel free to let me know <a href="https://twitter.com/Piwai">on Twitter</a>. Happy profiling!</p>
<pre><code class="language-kotlin">package com.example.myapp

import android.app.Application
import android.content.pm.ApplicationInfo
import android.os.Build
import android.os.Looper
import androidx.tracing.Trace

class ExampleApplication : Application() {

  override fun onCreate() {
    super.onCreate()

    val debuggable = (applicationInfo.flags and ApplicationInfo.FLAG_DEBUGGABLE) != 0
    val profileable = Build.VERSION.SDK_INT &gt;= 29 &amp;&amp; applicationInfo.isProfileableByShell

    if (Build.VERSION.SDK_INT != 28 &amp;&amp; (debuggable || profileable)) {
      Looper.getMainLooper().setMessageLogging { log -&gt;
        if (!Trace.isEnabled()) {
          return@setMessageLogging
        }
        if (log.startsWith('&gt;')) {
          val label = buildSectionLabel(log)
          Trace.beginSection(label.take(127))
        } else if (log.startsWith('&lt;')) {
          Trace.endSection()
        }
      }
    }
  }

  private fun buildSectionLabel(log: String): String {
    val logNoPrefix = log.removePrefix("&gt;&gt;&gt;&gt;&gt; Dispatching to ")
    val indexOfWhat = logNoPrefix.lastIndexOf(": ")
    val indexOfCallback = logNoPrefix.indexOf("} ")

    val targetHandler = logNoPrefix.substring(0, indexOfCallback + 1)
    val callback = logNoPrefix.substring(indexOfCallback + 2, indexOfWhat)
      .removePrefix("DispatchedContinuation[Dispatchers.Main, Continuation at ")
      .removePrefix("DispatchedContinuation[Dispatchers.Main.immediate, Continuation at ")
    val what = logNoPrefix.substring(indexOfWhat + 2)

    return if (callback != "null") {
      "$callback $targetHandler $what"
    } else {
      "$targetHandler $what"
    }
  }
}
</code></pre>
<blockquote>
<p>Cover image: <em>Lost at Sea</em> <a href="https://www.flickr.com/photos/romainguy/44309679170/">by Romain Guy</a>.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Fixing simpleperf broken records]]></title><description><![CDATA[Summary
This blog shows how to fix simpleperf traces which are otherwise unusable because they include samples with truncated callchain roots. Read on to learn more about what these crazy words mean!
Profiling Android apps
As an Android developer, I ...]]></description><link>https://blog.p-y.wtf/fixing-simpleperf-broken-records</link><guid isPermaLink="true">https://blog.p-y.wtf/fixing-simpleperf-broken-records</guid><category><![CDATA[Android]]></category><category><![CDATA[android app development]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Fri, 21 Jan 2022 18:13:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1642661018430/PXlwW6YwO.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-summary">Summary</h1>
<p>This blog shows how to fix simpleperf traces which are otherwise unusable because they include samples with truncated callchain roots. Read on to learn more about what these crazy words mean!</p>
<h1 id="heading-profiling-android-apps">Profiling Android apps</h1>
<p>As an Android developer, I have many tools available to profile my Android apps. I typically use:</p>
<ul>
<li><strong>System-level span based</strong> tools (<a target="_blank" href="https://developer.android.com/topic/performance/tracing/command-line">systrace</a>, <a target="_blank" href="https://perfetto.dev/">Perfetto</a>) to get a good picture of the system-level app behavior. I usually start there, to answer questions like <em>"Are the app threads fully utilizing the CPUs or waiting for IO or IPC (aka binder calls)?"</em> or <em>"Are other apps running in parallel, interacting with the app or starving CPUs?"</em>.</li>
<li><strong>App-level stack sampling</strong> tools (<a target="_blank" href="https://developer.android.com/studio/profile/record-traces#configurations">sample Java Methods</a>, <a target="_blank" href="https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md">simpleperf</a>) to get a better understanding of what's going on inside the app, see what code is executing and how long each method call is taking.</li>
</ul>
<h1 id="heading-simpleperf">simpleperf</h1>
<p>According to the <a target="_blank" href="https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md">Readme</a>:</p>
<blockquote>
<p>Simpleperf is a native CPU profiling tool for Android. It can be used to profile both Android applications and native processes running on Android. It can profile both Java and C++ code on Android.</p>
</blockquote>
<p>The general idea is that it runs with less overhead than the good old <a target="_blank" href="https://developer.android.com/studio/profile/record-traces#configurations">sample Java Methods</a>, so the results are closer to reality.</p>
<p>In the past, I tried to follow the command line instructions to <a target="_blank" href="https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/android_application_profiling.md">profile an Android application with simpleperf</a>, but I never fully understood how to use it.</p>
<p>I only recently realized that simpleperf has been integrated into Android Studio for a while, under the option  <em>"C / C++ trace recording"</em>. In Android Studio Bumblebee, the option was renamed to <em>"Callstack sample recording"</em> and <em>"sample Java Methods"</em> became <em>"legacy"</em>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642657870864/nvfciJvtl.jpeg" alt="Bumblebee.jpeg" /></p>
<blockquote>
<p>Note: <a href="https://developer.android.com/reference/android/os/Debug#startMethodTracingSampling(java.lang.String,%20int,%20int)">Debug.startMethodTracingSampling()</a> is still the only available API for instrumentation despite being the exact same thing as the now legacy <em>"sample Java Methods"</em>, although apparently starting with API 29 we can now <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:benchmark/benchmark-common/src/main/java/androidx/benchmark/Profiler.kt;l=200;drc=edaec69025bf0500388102f87b95f097e96be695">invoke simpleperf from code</a>.</p>
</blockquote>
<h1 id="heading-unusable-traces">Unusable traces</h1>
<p>When I record a simpleperf trace from a complex app, here's the result:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642658378386/BWZNQzpSL.png" alt="broken simpleperf" /></p>
<p>Notice the many thin grey vertical lines that break up the main thread call tree, all the way from the top. That's weird!</p>
<p>If you zoom in, you can see that the left and right spans around these vertical lines are identical stacks:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642658557516/m8_x4ZRnG.png" alt="broken simpleperf zoom" /></p>
<p>These should be just one giant call stack, not two stacks separated by a weird tiny stack. What's going on?</p>
<p>If you run into this issue, you can work around it by selecting a time-based span manually for the analysis. It works but it's not great.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642658716860/zPX9G6sX9.png" alt="workaround" /></p>
<h1 id="heading-dwarf">DWARF</h1>
<p>Simpleperf generates <a target="_blank" href="https://en.wikipedia.org/wiki/DWARF">DWARF</a>-based call graphs. I have no idea what that means, but the simpleperf FAQ <a target="_blank" href="https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc#why-we-can_t-always-get-complete-dwarf_based-call-graphs">mentions it</a>:</p>
<blockquote>
<p><strong>Why can't we always get complete DWARF-based call graphs?</strong></p>
<p>DWARF-based call graphs are generated by unwinding thread stacks. When a sample is generated, up to 64KB of stack data is dumped by the kernel. By unwinding the stack based on dwarf information, we get a callchain. But the thread stack can be much longer than 64KB. In that case, we can't unwind to the thread start point.</p>
<p>To alleviate the problem, simpleperf joins callchains after recording them. If two callchains of a thread have an entry containing the same IP and SP address, then simpleperf <strong>tries</strong> to join them to make the callchains longer. </p>
</blockquote>
<p>In other words: for each thread stack sample, simpleperf can only capture the first 64KB at the top of the stack, and stitches it all back as a full callchain by finding the rest of it in other samples that share some common callchain entry. That's very cool!</p>
<p>Unfortunately, if the stack changes significantly in between consecutive samples, then simpleperf cannot find any common callchain entry, so it just keeps those truncated callchains in. Which explains why our call tree was broken up by weird super-thin vertical bars!</p>
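<p>The detection half of what follows can be sketched in a few lines. This is an illustration with made-up names (real samples come from the simpleperf proto records, and each callchain entry carries more than a symbol name):</p>
<pre><code class="language-kotlin">// A callchain is stored leaf first, root last. If the root entry isn't the
// expected thread start frame (e.g. __libc_init), the kernel truncated the
// stack and joining failed: the sample is "broken".
fun findBrokenSampleIndices(
  callchains: List&lt;List&lt;String&gt;&gt;,
  callStackRoot: String = "__libc_init"
): List&lt;Int&gt; =
  callchains.indices.filter { callchains[it].lastOrNull() != callStackRoot }
</code></pre>
<p>Once spotted, a broken sample can be repaired by prepending the missing root frames from a neighboring good sample.</p>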
<h1 id="heading-stitching-it-back">Stitching it back</h1>
<p>I tweeted about this bug <a target="_blank" href="https://twitter.com/Piwai/status/1450150976220909568">in October 2021</a> and then moved on with my life. But recently I've been using simpleperf again and I decided to see if I could fix the trace files.</p>
<p>I realized that those bad stack samples should be easy to spot, as they don't have the same root frames as every other sample (e.g. <code>__libc_init</code> followed by <code>main</code>). Once spotted, I can fix the bad stack samples by prepending a fake callchain based on the good samples that surround the bad sample.</p>
<p>Cool, let's write a trace parser! Fortunately, I found the Android Studio implementation: <a target="_blank" href="https://cs.android.com/android-studio/platform/tools/adt/idea/+/mirror-goog-studio-main:profilers/src/com/android/tools/profilers/cpu/simpleperf/SimpleperfTraceParser.java;l=54;drc=e79366d3c93f6715f53150de5b9cdce43c3e8ba5">SimpleperfTraceParser.java</a>.</p>
<p>I spent a few hours (mostly fighting gradle and protos) <a target="_blank" href="https://github.com/pyricau/simpleperf-cleanup/blob/main/app/src/main/kotlin/simpleperf/cleanup/SimpleperfTracerParser.kt#L51">adapting it</a> to do what I wanted:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">if</span> (sample.callchainList.last() == callStackRoot) {
  <span class="hljs-keyword">if</span> (brokenRecords.isNotEmpty()) {
    <span class="hljs-comment">// Reversed so that root is at index 0</span>
    <span class="hljs-keyword">val</span> lastCallchain = lastValidSample.callchainList.reversed()
    <span class="hljs-keyword">val</span> nextCallchain = sample.callchainList.reversed()

    <span class="hljs-keyword">var</span> divergenceIndex = <span class="hljs-number">0</span>
    <span class="hljs-keyword">while</span> (divergenceIndex &lt; nextCallchain.size
      &amp;&amp; divergenceIndex &lt; lastCallchain.size
      &amp;&amp; lastCallchain[divergenceIndex] == nextCallchain[divergenceIndex]
    ) {
      divergenceIndex++
    }
    <span class="hljs-keyword">val</span> sharedCallChain = nextCallchain.subList(<span class="hljs-number">0</span>, divergenceIndex)
      .reversed()

    <span class="hljs-keyword">for</span> (brokenRecord <span class="hljs-keyword">in</span> brokenRecords) {
      output.writeFixedRecord(brokenRecord, sharedCallChain)
    }
    brokenRecords.clear()
  }
  lastValidSample = sample
  output.writeInt(recordSize)
  output.write(recordBytes)
} <span class="hljs-keyword">else</span> {
  brokenRecords += record
  brokenMainThreadSampleCount++
}
</code></pre>
<p>Once a trace is fixed, I can import it in the Android Studio profiler:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1642660545877/B7M2hEJGR.png" alt="fixed" /></p>
<p>Much better!</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>The code is available at <a target="_blank" href="https://github.com/pyricau/simpleperf-cleanup">github.com/pyricau/simpleperf-cleanup</a>.</p>
<p>I considered releasing it as a library or a CLI tool, but I figured, for now, anyone can use it reasonably easily:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> git@github.com:pyricau/simpleperf-cleanup.git
<span class="hljs-built_in">cd</span> simpleperf-cleanup

./gradlew app:run --args=<span class="hljs-string">"PATH/TO/TRACE.trace"</span>
</code></pre>
<p>Hopefully, this will eventually be fixed in simpleperf or Android Studio and we won't need this hack (the Android Studio team is aware).</p>
<p>This hack also made me realize it wouldn't be too hard to build additional tooling on top of simpleperf traces, e.g. to support SQL queries or code-based investigations, or new types of graphs. Stay tuned!</p>
<blockquote>
<p>Cover image: <em>Dead Tired</em> <a target="_blank" href="https://www.flickr.com/photos/romainguy/37702231391/">by Romain Guy</a>.</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[WorkManager multi-process for libraries]]></title><description><![CDATA[Cover image: Beacons by Romain Guy.

Summary
This blog shows how LeakCanary builds on top of WorkManager to run work in a separate process, while also not messing with the configuration of the hosting app. WorkManager is an amazing library, but using...]]></description><link>https://blog.p-y.wtf/workmanager-multi-process-for-libraries</link><guid isPermaLink="true">https://blog.p-y.wtf/workmanager-multi-process-for-libraries</guid><category><![CDATA[Android]]></category><category><![CDATA[android development]]></category><category><![CDATA[android apps]]></category><category><![CDATA[process]]></category><category><![CDATA[android app development]]></category><dc:creator><![CDATA[Pierre-Yves Ricau]]></dc:creator><pubDate>Tue, 28 Dec 2021 07:15:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1640673076312/UTCWqSOaM.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Cover image: <em>Beacons</em> <a target="_blank" href="https://www.flickr.com/photos/romainguy/31390069837/">by Romain Guy</a>.</p>
</blockquote>
<h1 id="heading-summary">Summary</h1>
<p>This blog post shows how LeakCanary builds on top of WorkManager to run work in a separate process, without messing with the configuration of the hosting app. WorkManager is an amazing library, but using it as a library to schedule remote-process work has several gotchas.</p>
<h1 id="heading-intro">Intro</h1>
<p>For LeakCanary 2.8 I'm <a target="_blank" href="https://github.com/square/leakcanary/pull/2237/">rewriting how the heap analysis is performed</a> to stop relying on a foreground service and use WorkManager instead, because <a target="_blank" href="https://developer.android.com/guide/components/foreground-services#background-start-restrictions">Android 12 happened</a>. To limit memory pressure on the hosting app, LeakCanary supports running the analysis in a separate process. Let's see how this all fits together!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1640672867201/mFMvSMqm6.png" alt="Android_12_Developer_Preview_logo.svg.png" /></p>
<p>I need to support the following behavior:</p>
<ul>
<li>If the app using my library depends on WorkManager and WorkManager multi process, schedule the work to run in a separate process.</li>
<li>If the app using my library depends on just WorkManager, schedule the work with WorkManager.</li>
<li>Otherwise use a simple background thread. This isn't ideal but I'm ok with losing the work on process death.</li>
</ul>
<h1 id="heading-optional-workmanager">Optional WorkManager</h1>
<p>Libraries should <a target="_blank" href="https://twitter.com/Piwai/status/1474410599945805832">avoid forcing dependencies</a> down on their consumers when possible. Let's look at how we can do that for WorkManager.</p>
<p>First, I add the WorkManager dependencies to the compile classpath as <code>compileOnly</code> so that I can write code against the WorkManager APIs without having those dependencies appear in the published <code>pom.xml</code> on Maven Central.</p>
<pre><code class="lang-groovy">dependencies {
  // Optional dependencies
  // Note: using the Java artifact because the Kotlin one bundles coroutines.
  compileOnly 'androidx.work:work-runtime:2.7.0'
  compileOnly 'androidx.work:work-multiprocess:2.7.0'
}
</code></pre>
<p>Then I just need a runtime check for the <code>WorkManager</code> class:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> workManagerInClasspath <span class="hljs-keyword">by</span> lazy {
  <span class="hljs-keyword">try</span> {
    Class.forName(<span class="hljs-string">"androidx.work.WorkManager"</span>)
    <span class="hljs-literal">true</span>
  } <span class="hljs-keyword">catch</span> (ignored: Throwable) {
    <span class="hljs-literal">false</span>
  }
}
</code></pre>
<h1 id="heading-thread-vs-workmanager">Thread vs WorkManager</h1>
<p>Here's a simple implementation that falls back on a background thread if WorkManager isn't available:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyWorkScheduler</span></span>(<span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> application: Application) {
  <span class="hljs-keyword">val</span> workManagerInClasspath = <span class="hljs-comment">// ...</span>

  <span class="hljs-keyword">val</span> backgroundHandler <span class="hljs-keyword">by</span> lazy {
    <span class="hljs-keyword">val</span> handlerThread = HandlerThread(<span class="hljs-string">"Background worker thread"</span>)
    handlerThread.start()
    Handler(handlerThread.looper)
  }

  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">enqueueWork</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> (workManagerInClasspath) {
      enqueueOnWorkManager()
    } <span class="hljs-keyword">else</span> {
      enqueueOnBackgroundThread()
    }
  }

  <span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">enqueueOnWorkManager</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">val</span> request = OneTimeWorkRequest.Builder(MyWorker::<span class="hljs-keyword">class</span>.java)
      .build()
    WorkManager.getInstance(application).enqueue(request)
  }

  <span class="hljs-keyword">private</span> <span class="hljs-keyword">fun</span> enqueueOnBackgroundThread() {
    backgroundHandler.post {
      TODO(<span class="hljs-string">"perform the work"</span>)
    }
  }
}

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyWorker</span></span>(
  appContext: Context,
  workerParams: WorkerParameters
) : Worker(appContext, workerParams) {
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">doWork</span><span class="hljs-params">()</span></span>: Result {
    TODO(<span class="hljs-string">"perform the work"</span>)
    <span class="hljs-keyword">return</span> Result.success()
  }
}
</code></pre>
<p>So far we have fairly standard WorkManager code. I'm intentionally staying away from setting up a WorkManager <a target="_blank" href="https://developer.android.com/reference/androidx/work/Configuration">Configuration</a> as that can only be set once, and I don't want to step on the toes of the developers using WorkManager for their own purposes.</p>
<p>WorkManager 2.7.0 introduces the concept of <a target="_blank" href="https://developer.android.com/topic/libraries/architecture/workmanager/how-to/define-work#expedited">expedited work</a>, an Android 12 alternative to the (now crashing) foreground services. Ideally I want to leverage that new API... but I don't want to force dependency upgrades, so let's add another runtime check:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyWorkScheduler</span> </span>{

  <span class="hljs-comment">// ...</span>

  <span class="hljs-comment">// setExpedited() requires WorkManager 2.7.0+</span>
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> workManagerSupportsExpeditedRequests <span class="hljs-keyword">by</span> lazy {
    <span class="hljs-keyword">try</span> {
      Class.forName(<span class="hljs-string">"androidx.work.OutOfQuotaPolicy"</span>)
      <span class="hljs-literal">true</span>
    } <span class="hljs-keyword">catch</span> (ignored: Throwable) {
      <span class="hljs-literal">false</span>
    }
  }

  <span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">enqueueOnWorkManager</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">val</span> request = OneTimeWorkRequest.Builder(MyWorker::<span class="hljs-keyword">class</span>.java).apply {
      <span class="hljs-keyword">if</span> (workManagerSupportsExpeditedRequests) {
        setExpedited(OutOfQuotaPolicy.RUN_AS_NON_EXPEDITED_WORK_REQUEST)
      }
    }.build()
    WorkManager.getInstance(application).enqueue(request)
  }
}
</code></pre>
<blockquote>
<p>Note: the <code>OneTimeWorkRequest.Builder</code> API is unusual: there's no symmetry, i.e. you can't undo state changes (unset expedited, remove a tag...)</p>
</blockquote>
<h1 id="heading-multi-process">Multi process</h1>
<p>I want to schedule work from the main app process (e.g. <code>com.example</code>), and that work should execute in a separate process (e.g. <code>com.example:mywork</code>).</p>
<h2 id="heading-remoteworkerservice">RemoteWorkerService</h2>
<p>First, I register a <a target="_blank" href="https://developer.android.com/reference/androidx/work/multiprocess/RemoteWorkerService">RemoteWorkerService</a> that will run in the <code>:mywork</code> process:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyRemoteWorkerService</span> : <span class="hljs-type">RemoteWorkerService</span></span>()
</code></pre>
<blockquote>
<p>Note: I'm registering a subclass of <code>RemoteWorkerService</code> because component names are unique per app, so this avoids conflicts if the consuming app already registered a <code>RemoteWorkerService</code> in its manifest. The <code>RemoteWorkerService</code> class should probably have been abstract.</p>
</blockquote>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">manifest</span> <span class="hljs-attr">xmlns:android</span>=<span class="hljs-string">"http://schemas.android.com/apk/res/android"</span>
    <span class="hljs-attr">package</span>=<span class="hljs-string">"com.example"</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">application</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">service</span>
      <span class="hljs-attr">android:name</span>=<span class="hljs-string">".MyRemoteWorkerService"</span>
      <span class="hljs-attr">android:exported</span>=<span class="hljs-string">"false"</span>
      <span class="hljs-attr">android:process</span>=<span class="hljs-string">":mywork"</span> /&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">manifest</span>&gt;</span>
</code></pre>
<h2 id="heading-remoteworker">RemoteWorker</h2>
<p>Since my remote process uses the same APK, my remote worker should ideally be nearly identical to my previous non-remote worker, for example:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyRemoteWorker</span></span>(
  appContext: Context,
  workerParams: WorkerParameters
) : RemoteWorker(appContext, workerParams) {
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">doWork</span><span class="hljs-params">()</span></span>: Result {
    TODO(<span class="hljs-string">"perform the work"</span>)
    <span class="hljs-keyword">return</span> Result.success()
  }
}
</code></pre><p>Unfortunately, the <code>RemoteWorker</code> class doesn't exist; we only have <a target="_blank" href="https://developer.android.com/reference/androidx/work/multiprocess/RemoteListenableWorker">RemoteListenableWorker</a>. That's ok though: just as <code>Worker</code> extends <code>ListenableWorker</code>, I can create a <code>RemoteWorker</code> that extends <code>RemoteListenableWorker</code>:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">abstract</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">RemoteWorker</span></span>(
  context: Context,
  workerParams: WorkerParameters
) : RemoteListenableWorker(context, workerParams) {

  <span class="hljs-keyword">abstract</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">doWork</span><span class="hljs-params">()</span></span>: Result

  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">startRemoteWork</span><span class="hljs-params">()</span></span>: ListenableFuture&lt;Result&gt; {
    <span class="hljs-keyword">val</span> future = SettableFuture.create&lt;Result&gt;()
    backgroundExecutor.execute {
      <span class="hljs-keyword">try</span> {
        <span class="hljs-keyword">val</span> result = doWork()
        future.<span class="hljs-keyword">set</span>(result)
      } <span class="hljs-keyword">catch</span> (throwable: Throwable) {
        future.setException(throwable)
      }
    }
    <span class="hljs-keyword">return</span> future
  }
}
</code></pre>
<blockquote>
<p>Note: I'm not sure why <code>Worker</code> is provided but <code>RemoteWorker</code> isn't. That might be because blocking APIs tend to lead to implementations that don't support cancellation (as is the case here). The other thing is, there's so little difference between the remote and non-remote implementations, I wish I could define a single worker class and decide where to run it when I schedule the work.</p>
</blockquote>
<h2 id="heading-scheduling-the-work">Scheduling the work</h2>
<p>Scheduling remote work is almost identical, except that we need to provide the component name for the remote service as part of the work request (these arguments are parsed by <code>RemoteListenableWorker</code>):</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyWorkScheduler</span> </span>{
  <span class="hljs-comment">// ...</span>

  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> remoteWorkerServiceInClasspath <span class="hljs-keyword">by</span> lazy {
    <span class="hljs-keyword">try</span> {
      Class.forName(<span class="hljs-string">"androidx.work.multiprocess.RemoteWorkerService"</span>)
      <span class="hljs-literal">true</span>
    } <span class="hljs-keyword">catch</span> (ignored: Throwable) {
      <span class="hljs-literal">false</span>
    }
  }

  <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">enqueueWork</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> (remoteWorkerServiceInClasspath) {
      enqueueOnWorkManagerRemote()
    } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (workManagerInClasspath) {
      enqueueOnWorkManager()
    } <span class="hljs-keyword">else</span> {
      enqueueOnBackgroundThread()
    }
  }

  <span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">enqueueOnWorkManagerRemote</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">val</span> request = OneTimeWorkRequest.Builder(MyRemoteWorker::<span class="hljs-keyword">class</span>.java).apply {
      setInputData(
        <span class="hljs-comment">// putString() lives on Data.Builder; the ARGUMENT_* constants come</span>
        <span class="hljs-comment">// from RemoteListenableWorker.</span>
        Data.Builder()
          .putString(ARGUMENT_PACKAGE_NAME, application.packageName)
          .putString(ARGUMENT_CLASS_NAME, <span class="hljs-string">"com.example.MyRemoteWorkerService"</span>)
          .build()
      )
      <span class="hljs-keyword">if</span> (workManagerSupportsExpeditedRequests) {
        setExpedited(OutOfQuotaPolicy.RUN_AS_NON_EXPEDITED_WORK_REQUEST)
      }
    }.build()
    WorkManager.getInstance(application).enqueue(request)
  }
}
</code></pre>
<h1 id="heading-crash">Crash</h1>
<p>At this point, I feel pretty good about the whole thing. This should work! Let's run the code:</p>
<pre><code>java.lang.IllegalStateException: WorkManager <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> initialized properly.
You have explicitly disabled WorkManagerInitializer <span class="hljs-keyword">in</span> your manifest,
have <span class="hljs-keyword">not</span> manually <span class="hljs-keyword">called</span> WorkManager#initialize at this <span class="hljs-type">point</span>, <span class="hljs-keyword">and</span> your
Application does <span class="hljs-keyword">not</span> implement <span class="hljs-keyword">Configuration</span>.Provider.
    at androidx.<span class="hljs-keyword">work</span>.impl.WorkManagerImpl.getInstance(WorkManagerImpl.java:<span class="hljs-number">158</span>)
    at androidx.<span class="hljs-keyword">work</span>.multiprocess.ListenableWorkerImpl.&lt;init&gt;(ListenableWorkerImpl.java:<span class="hljs-number">72</span>)
    at androidx.<span class="hljs-keyword">work</span>.multiprocess.RemoteWorkerService.onCreate(RemoteWorkerService.java:<span class="hljs-number">37</span>)
    at android.app.ActivityThread.handleCreateService(ActivityThread.java:<span class="hljs-number">4487</span>)
    at android.app.ActivityThread.<span class="hljs-keyword">access</span><span class="hljs-meta">$1700</span>(ActivityThread.java:<span class="hljs-number">247</span>)
    at android.app.ActivityThread$H.handleMessage(ActivityThread.java:<span class="hljs-number">2072</span>)
    at android.os.<span class="hljs-keyword">Handler</span>.dispatchMessage(<span class="hljs-keyword">Handler</span>.java:<span class="hljs-number">106</span>)
    at android.os.Looper.loopOnce(Looper.java:<span class="hljs-number">201</span>)
    at android.os.Looper.<span class="hljs-keyword">loop</span>(Looper.java:<span class="hljs-number">288</span>)
    at android.app.ActivityThread.main(ActivityThread.java:<span class="hljs-number">7839</span>)
    at java.lang.reflect.<span class="hljs-keyword">Method</span>.invoke(Native <span class="hljs-keyword">Method</span>)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:<span class="hljs-number">548</span>)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:<span class="hljs-number">1003</span>)
</code></pre><p>The exception message is confusing at first. I didn't do anything to WorkManager's initialization, why is it complaining?</p>
<p>See, that's the thing: when <code>RemoteWorkerService.onCreate()</code> runs in the <code>com.example:mywork</code> process, WorkManager needs to be already initialized. In the main process, that's done automatically by the AndroidX Startup library. However, startup initializers don't run in other processes!</p>
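<p>As an aside, a library dealing with multiple processes often needs to know which process it's currently running in. Here's a minimal sketch of such a check — my own helper, not part of WorkManager — that reads <code>/proc/self/cmdline</code>, which works on every Android version (on API 28+ you can call <code>Application.getProcessName()</code> instead):</p>

```kotlin
import java.io.File

// Returns the current process name, e.g. "com.example" or
// "com.example:mywork". /proc/self/cmdline is NUL-terminated, so keep
// only the bytes before the first NUL character.
fun currentProcessName(): String =
  File("/proc/self/cmdline").readText().substringBefore('\u0000').trim()

// Hypothetical check a remote worker service could use to confirm it's
// running in the ":mywork" process before initializing WorkManager.
fun isRemoteWorkProcess(): Boolean = currentProcessName().endsWith(":mywork")
```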
<p>As a library developer, I don't have access to the <code>Application</code> class, so I can't make it implement <a target="_blank" href="https://developer.android.com/topic/libraries/architecture/workmanager/advanced/custom-configuration">Configuration.Provider</a> or have it call <code>WorkManager.initialize()</code>.</p>
<p>I can find another way to init WorkManager, however WorkManager initialization can only happen once, so if the developer set up custom WorkManager initialization then my init will conflict with their init. Unfortunately, there are no <code>WorkManager.isInitialized()</code> or  <code>WorkManager.getInstanceAndInitIfNotDoneYet()</code> APIs.</p>
<p>Sometimes you just have to follow the <a target="_blank" href="https://en.wikipedia.org/wiki/Desire_path">desire path</a>...</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1640674291423/0gvWS7BZi.png" alt="desire-path-usability.png" /></p>
<p>Let's create the API we need:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyRemoteWorkerService</span> : <span class="hljs-type">RemoteWorkerService</span></span>() {
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onCreate</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> (!isWorkManagerInitialized()) {
      WorkManager.initialize(
        applicationContext,
        Configuration.Builder().build()
      )
    }
    <span class="hljs-keyword">super</span>.onCreate()
  }

  <span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">isWorkManagerInitialized</span><span class="hljs-params">()</span></span> = <span class="hljs-keyword">try</span> {
    WorkManager.getInstance(applicationContext)
    <span class="hljs-literal">true</span>
  } <span class="hljs-keyword">catch</span> (ignored: Throwable) {
    <span class="hljs-literal">false</span>
  }
}
</code></pre>
<h1 id="heading-rescheduling">Rescheduling</h1>
<p>At this point, it works! However, I quickly notice that when I schedule remote work, the <code>:mywork</code> process starts, the work starts, then the work is immediately canceled, then it's rescheduled and eventually runs fine. That's weird.</p>
<p>After debugging the WorkManager library code in 2 parallel processes (😰) I eventually figure out that when WorkManager is initialized, it runs a <a target="_blank" href="https://cs.android.com/androidx/platform/frameworks/support/+/androidx-main:work/work-runtime/src/main/java/androidx/work/impl/utils/ForceStopRunnable.java">ForceStopRunnable</a> which cancels and then reschedules all the work on init.</p>
<p>One way to prevent ForceStopRunnable from running is to set <a href="https://developer.android.com/reference/androidx/work/Configuration.Builder#setDefaultProcessName(java.lang.String)">Configuration.Builder.setDefaultProcessName</a> to the main app process name. Unfortunately that's not ideal: if the developer did set the work manager configuration in their application class and didn't set <code>setDefaultProcessName</code>, then <code>ForceStopRunnable</code> will run and I can't do anything to change that.</p>
<p>So we can only provide a fix for the non-initialized case:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyRemoteWorkerService</span> : <span class="hljs-type">RemoteWorkerService</span></span>() {
  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onCreate</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> (!isWorkManagerInitialized()) {
      WorkManager.initialize(
        applicationContext,
        Configuration.Builder()
          .setDefaultProcessName(applicationContext.packageName)
          .build()
      )
    } <span class="hljs-keyword">else</span> {
      <span class="hljs-comment">// If the developer didn't set setDefaultProcessName in the</span>
      <span class="hljs-comment">// Configuration.Builder then the work will be rescheduled once </span>
      <span class="hljs-comment">// when :mywork starts and there's nothing we can do about it.</span>
    }
    <span class="hljs-keyword">super</span>.onCreate()
  }
  <span class="hljs-comment">// ...</span>
}
</code></pre>
<h1 id="heading-an-ever-cooler-hack">An even cooler hack</h1>
<p><strong>Edit: after thinking through this once more, I realized there was another way I could get this working, and it's probably a better hack. Here goes.</strong></p>
<p>WorkManager will automatically initialize itself on first use if the application <code>Context</code> implements <code>Configuration.Provider</code>. But there's nothing about this contract that says the <code>Application</code> class has to be the application <code>Context</code>. That's one of the <a target="_blank" href="https://stackoverflow.com/a/6760019/703646">most annoying parts</a> of the <code>Context</code> API, but for once, we can benefit from it!</p>
<p>The idea is to have <code>RemoteWorkerService.getApplicationContext()</code> return a fake app context that implements <code>Configuration.Provider</code> and provides the configuration we want:</p>
<pre><code class="lang-kotlin"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MyRemoteWorkerService</span> : <span class="hljs-type">RemoteWorkerService</span></span>() {

  <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">FakeAppContextConfigurationProvider</span></span>(base: Context)
    : ContextWrapper(base), Configuration.Provider {

    <span class="hljs-comment">// service.applicationContext.applicationContext still returns this</span>
    <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">getApplicationContext</span><span class="hljs-params">()</span></span> = <span class="hljs-keyword">this</span>

    <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">getWorkManagerConfiguration</span><span class="hljs-params">()</span></span> = Configuration.Builder()
      .setDefaultProcessName(packageName)
      .build()
  }

  <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> fakeAppContext <span class="hljs-keyword">by</span> lazy {
    FakeAppContextConfigurationProvider(<span class="hljs-keyword">super</span>.getApplicationContext())
  }

  <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">getApplicationContext</span><span class="hljs-params">()</span></span>: Context {
    <span class="hljs-keyword">return</span> fakeAppContext
  }
}
</code></pre>
<p>This is even cooler than the previous hack because, whether or not the developer implemented <code>Configuration.Provider</code> in their application class, we get to decide what configuration we want for our own process. The only way around it is if they called <code>WorkManager.initialize</code> directly.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Getting WorkManager multi-process to work well in a library isn't straightforward and currently requires a few hacks, but that's not surprising: Android has historically been fairly bad at building APIs with libraries in mind, and having a single Application class has always been a source of bugs for multi process apps. The AndroidX WorkManager and Startup libraries are going in the right direction!</p>
<blockquote>
<p>Huge thanks to <a target="_blank" href="https://twitter.com/tikurahul">Rahul Ravikumar</a> for his help with figuring out WorkManager multi-process. He's already working on <a target="_blank" href="https://android-review.googlesource.com/c/platform/frameworks/support/+/1934672">addressing some of these issues</a>!</p>
</blockquote>
]]></content:encoded></item></channel></rss>