Home Blog

Obfuscation Primitives I: State Transitions

Let's implement a simple concept with some wacky code.


Hello. I develop a Java obfuscator, which means I do a lot of thinking about transforming code to make it harder to read.

In a lot of software projects, you will find that a pattern emerges that computer scientists call a finite state machine. Essentially, it's a group of 'states' that each describe logic that selects the next state to move to.

Let's make an assumption that every state transition depends on a single character input, just to make the following code examples simpler.

In Kotlin, we can write a simple implementation of something that matches a string using a finite state machine, like so:

class StateMachine(val acceptState: Int, val execute: (Int, Char) -> Int)

fun run(stateMachine: StateMachine, input: String): Boolean {
    var state = 0
    var currentStringIndex = 0

    while (state >= 0 && currentStringIndex in input.indices) {
        val char = input[currentStringIndex++]
        state = stateMachine.execute(state, char)
    }

    return state == stateMachine.acceptState
}

So, let's design a finite state machine:

We can naïvely implement it using some nested when statements,

val matchABCZ = StateMachine(3, { state, char ->
    when (state) {
        0 -> when (char) {
            'a' -> 1
            'b' -> 2
            else -> -1
        }

        1 -> when (char) {
            'c' -> 3
            else -> -1
        }

        2 -> when (char) {
            'z' -> 2
            'c' -> 3
            else -> -1
        }

        3 -> -1

        else -> error("State does not exist")
    }
})

This is cool, but if we want to edit this state machine's behaviour at runtime or ever wish to serialize this state machine, we can't store a lambda. Let's use a look-up table for our state transitions. We'll use a 2D array and then, to advance, we can do state = lut[state][char].

class StateMachine(val acceptState: Int, val lookupTable: Array<IntArray>)

val matchABCZ = StateMachine(3, Array(4, { IntArray(Char.MAX_VALUE.toInt(), { -1 }) }).also { lut ->
    // Use some extension function sugar so that we can do things like lut[0]['a']
    operator fun IntArray.set(char: Char, value: Int) = set(char.toInt(), value)

    lut[0]['a'] = 1
    lut[0]['b'] = 2

    lut[1]['c'] = 3

    lut[2]['z'] = 2
    lut[2]['c'] = 3
})

We've made both the code and the representation in memory much more compact.

Our run function now looks like this:

fun run(stateMachine: StateMachine, input: String): Boolean {
    var state = 0
    var currentStringIndex = 0

    while (state in stateMachine.lookupTable.indices && currentStringIndex in input.indices) {
        val char = input[currentStringIndex++]
        state = stateMachine.lookupTable[state][char.toInt()]
    }

    return state == stateMachine.acceptState
}

Obfuscation

It's the bit you've been waiting for. Now, we can obfuscate our look-up table. What we're going to do is create a mathematical approximation that computes the state transition for us.

Let's assume that we already have a mathematical function T(x, y) that returns the appropriate next state to transition to based on x, the current state, and y, the current input character.

Since there are only a handful of values that we actually care about1 in our state transition table; it is incredibly easy to create an approximation of T(x, y).

We have two inputs (x and y) and 5 defined entries in our look-up table. We can approximate T(x, y) by constructing a polynomial with 5 unknown coefficients, and plugging in the 5 defined values, and solving the resulting system of (5) simultaneous equations.

In order to construct a polynomial with 5 unknowns and 2 inputs, we can use the binomial expansion of (x + y)^3 with 4 unknown coefficients, and a trailing unknown constant. This gives us a polynomial that looks like this: a x^3 + b x^2 y + c x y^2 + d y^3 + e. Then, we can just plug in values of x and y, and solve for a, b, c, d, and e.

To do this, I used SymPy.

from sympy import *

x, y = symbols('x y')
a, b, c, d, e = symbols('a:e')

T = lambda x, y: a * x**3 + b * x**2 * y + c * x * y**2 + d * y**3 + e

# lut[0]['a'] = 1
# lut[0]['b'] = 2
#
# lut[1]['c'] = 3
#
# lut[2]['z'] = 2
# lut[2]['c'] = 3

pretty_print(solve([
    Eq(T(0, ord('a')), 1),
    Eq(T(0, ord('b')), 2),

    Eq(T(1, ord('c')), 3),

    Eq(T(2, ord('z')), 2),
    Eq(T(2, ord('c')), 3)
], [a, b, c, d, e]))

And we get the values of a, b, c, d, and e out of it:

⎧   535028238     -534985412       3601586                  -884154 ⎫
⎨a: ─────────, b: ───────────, c: ─────────, d: 1/28519, e: ────────⎬
⎩    12462803      411272499      411272499                  28519  ⎭

And as literal values:

>>> 535028238 / 12462803
42.93000844192113
>>> -534985412 / 411272499
-1.3008052162515247
>>> 3601586 / 411272499
0.008757176832288025
>>> 1 / 28519
3.506434306953259e-05
>>> -884154 / 28519
-31.00227918229952

Now, we can implement this approximator in our Kotlin program:

class StateMachine(val acceptState: Int, val coefficients: DoubleArray)

fun run(stateMachine: StateMachine, input: String): Boolean {
    var state = 0
    var currentStringIndex = 0

    while (currentStringIndex in input.indices) {
        val char = input[currentStringIndex++]

        val x = state.toDouble()
        val y = char.toDouble()

        val n = stateMachine.coefficients.lastIndex

        var total = stateMachine.coefficients.last()
        for (index in 0 until n) {
            val coefficient = stateMachine.coefficients[index]
            total += coefficient * x.pow((n - index - 1).toDouble()) * y.pow(index.toDouble())
        }

        // We don't want to accept undefined values which get interpolated to something like 1.5
        fun rejectThreshold(d: Double, threshold: Double) =
            if (ceil(d) - d < threshold) ceil(d).toInt()
            else if (d - floor(d) < threshold) floor(d).toInt()
            else -1

        state = rejectThreshold(total, 0.05)
    }

    return state == stateMachine.acceptState
}

val matchABCZ = StateMachine(3, doubleArrayOf(
        42.93000844192113,
        -1.3008052162515247,
        0.008757176832288025,
        3.506434306953259e-05,
        -31.00227918229952
))

run(matchABCZ, "ac") // true
run(matchABCZ, "bzzzzzzzc") // true
run(matchABCZ, "az") // false
run(matchABCZ, "abc") // false

As you can see, the logic has now been completely transformed. Mapping a problem domain onto another always makes for a very effective obfuscation strategy, yet is often infeasible to implement in an automatic obfuscator.