Home Blog

# Obfuscation Primitives I: State Transitions

Let's implement a simple concept with some wacky code.

Hello. I develop a Java obfuscator, which means I do a lot of thinking about transforming code to make it harder to read.

In a lot of software projects, you will find that a pattern emerges that computer scientists call a finite state machine. Essentially, it's a group of 'states' that each describe logic that selects the next state to move to.

Let's make an assumption that every state transition depends on a single character input, just to make the following code examples simpler.

In Kotlin, we can write a simple implementation of something that matches a string using a finite state machine, like so:

``````class StateMachine(val acceptState: Int, val execute: (Int, Char) -> Int)

fun run(stateMachine: StateMachine, input: String): Boolean {
var state = 0
var currentStringIndex = 0

while (state >= 0 && currentStringIndex in input.indices) {
val char = input[currentStringIndex++]
state = stateMachine.execute(state, char)
}

return state == stateMachine.acceptState
}``````

So, let's design a finite state machine: We can naïvely implement it using some nested `when` statements,

``````val matchABCZ = StateMachine(3, { state, char ->
when (state) {
0 -> when (char) {
'a' -> 1
'b' -> 2
else -> -1
}

1 -> when (char) {
'c' -> 3
else -> -1
}

2 -> when (char) {
'z' -> 2
'c' -> 3
else -> -1
}

3 -> -1

else -> error("State does not exist")
}
})``````

This is cool, but if we want to edit this state machine's behaviour at runtime or ever wish to serialize this state machine, we can't store a lambda. Let's use a look-up table for our state transitions. We'll use a 2D array and then, to advance, we can do `state = lut[state][char]`.

``````class StateMachine(val acceptState: Int, val lookupTable: Array<IntArray>)

val matchABCZ = StateMachine(3, Array(4, { IntArray(Char.MAX_VALUE.toInt(), { -1 }) }).also { lut ->
// Use some extension function sugar so that we can do things like lut['a']
operator fun IntArray.set(char: Char, value: Int) = set(char.toInt(), value)

lut['a'] = 1
lut['b'] = 2

lut['c'] = 3

lut['z'] = 2
lut['c'] = 3
})``````

We've made both the code and the representation in memory much more compact.

Our run function now looks like this:

``````fun run(stateMachine: StateMachine, input: String): Boolean {
var state = 0
var currentStringIndex = 0

while (state in stateMachine.lookupTable.indices && currentStringIndex in input.indices) {
val char = input[currentStringIndex++]
state = stateMachine.lookupTable[state][char.toInt()]
}

return state == stateMachine.acceptState
}``````

## Obfuscation

It's the bit you've been waiting for. Now, we can obfuscate our look-up table. What we're going to do is create a mathematical approximation that computes the state transition for us.

Let's assume that we already have a mathematical function T(x, y) that returns the appropriate next state to transition to based on x, the current state, and y, the current input character.

Since there are only a handful of values that we actually care about1 in our state transition table; it is incredibly easy to create an approximation of T(x, y).

We have two inputs (x and y) and 5 defined entries in our look-up table. We can approximate T(x, y) by constructing a polynomial with 5 unknown coefficients, and plugging in the 5 defined values, and solving the resulting system of (5) simultaneous equations.

In order to construct a polynomial with 5 unknowns and 2 inputs, we can use the binomial expansion of `(x + y)^3` with 4 unknown coefficients, and a trailing unknown constant. This gives us a polynomial that looks like this: `a x^3 + b x^2 y + c x y^2 + d y^3 + e`. Then, we can just plug in values of x and y, and solve for a, b, c, d, and e.

To do this, I used SymPy.

``````from sympy import *

x, y = symbols('x y')
a, b, c, d, e = symbols('a:e')

T = lambda x, y: a * x**3 + b * x**2 * y + c * x * y**2 + d * y**3 + e

# lut['a'] = 1
# lut['b'] = 2
#
# lut['c'] = 3
#
# lut['z'] = 2
# lut['c'] = 3

pretty_print(solve([
Eq(T(0, ord('a')), 1),
Eq(T(0, ord('b')), 2),

Eq(T(1, ord('c')), 3),

Eq(T(2, ord('z')), 2),
Eq(T(2, ord('c')), 3)
], [a, b, c, d, e]))``````

And we get the values of a, b, c, d, and e out of it:

``````⎧   535028238     -534985412       3601586                  -884154 ⎫
⎨a: ─────────, b: ───────────, c: ─────────, d: 1/28519, e: ────────⎬
⎩    12462803      411272499      411272499                  28519  ⎭``````

And as literal values:

``````>>> 535028238 / 12462803
42.93000844192113
>>> -534985412 / 411272499
-1.3008052162515247
>>> 3601586 / 411272499
0.008757176832288025
>>> 1 / 28519
3.506434306953259e-05
>>> -884154 / 28519
-31.00227918229952``````

Now, we can implement this approximator in our Kotlin program:

``````class StateMachine(val acceptState: Int, val coefficients: DoubleArray)

fun run(stateMachine: StateMachine, input: String): Boolean {
var state = 0
var currentStringIndex = 0

while (currentStringIndex in input.indices) {
val char = input[currentStringIndex++]

val x = state.toDouble()
val y = char.toDouble()

val n = stateMachine.coefficients.lastIndex

var total = stateMachine.coefficients.last()
for (index in 0 until n) {
val coefficient = stateMachine.coefficients[index]
total += coefficient * x.pow((n - index - 1).toDouble()) * y.pow(index.toDouble())
}

// We don't want to accept undefined values which get interpolated to something like 1.5
fun rejectThreshold(d: Double, threshold: Double) =
if (ceil(d) - d < threshold) ceil(d).toInt()
else if (d - floor(d) < threshold) floor(d).toInt()
else -1

state = rejectThreshold(total, 0.05)
}

return state == stateMachine.acceptState
}

val matchABCZ = StateMachine(3, doubleArrayOf(
42.93000844192113,
-1.3008052162515247,
0.008757176832288025,
3.506434306953259e-05,
-31.00227918229952
))

run(matchABCZ, "ac") // true
run(matchABCZ, "bzzzzzzzc") // true
run(matchABCZ, "az") // false
run(matchABCZ, "abc") // false``````

As you can see, the logic has now been completely transformed. Mapping a problem domain onto another always makes for a very effective obfuscation strategy, yet is often infeasible to implement in an automatic obfuscator.