# Obfuscation Primitives I: State Transitions

Let's implement a simple concept with some wacky code.

Hello. I develop a Java obfuscator, which means I do a lot of thinking about transforming code to make it harder to read.

In a lot of software projects, you will find that a pattern emerges that computer scientists call a finite state machine. Essentially, it's a group of 'states' that each describe logic that selects the next state to move to.

Let's make an assumption that every state transition depends on a single character input, just to make the following code examples simpler.

In Kotlin, we can write a simple implementation of something that matches a string using a finite state machine, like so:

```
class StateMachine(val acceptState: Int, val execute: (Int, Char) -> Int)
fun run(stateMachine: StateMachine, input: String): Boolean {
var state = 0
var currentStringIndex = 0
while (state >= 0 && currentStringIndex in input.indices) {
val char = input[currentStringIndex++]
state = stateMachine.execute(state, char)
}
return state == stateMachine.acceptState
}
```

So, let's design a finite state machine:

We can naïvely implement it using some nested `when`

statements,

```
val matchABCZ = StateMachine(3, { state, char ->
when (state) {
0 -> when (char) {
'a' -> 1
'b' -> 2
else -> -1
}
1 -> when (char) {
'c' -> 3
else -> -1
}
2 -> when (char) {
'z' -> 2
'c' -> 3
else -> -1
}
3 -> -1
else -> error("State does not exist")
}
})
```

This is cool, but if we want to edit this state machine's behaviour at runtime or ever wish to serialize this state machine, we can't store a lambda. Let's use a look-up table for our state transitions. We'll use a 2D array and then, to advance, we can do `state = lut[state][char]`

.

```
class StateMachine(val acceptState: Int, val lookupTable: Array<IntArray>)
val matchABCZ = StateMachine(3, Array(4, { IntArray(Char.MAX_VALUE.toInt(), { -1 }) }).also { lut ->
// Use some extension function sugar so that we can do things like lut[0]['a']
operator fun IntArray.set(char: Char, value: Int) = set(char.toInt(), value)
lut[0]['a'] = 1
lut[0]['b'] = 2
lut[1]['c'] = 3
lut[2]['z'] = 2
lut[2]['c'] = 3
})
```

We've made both the code and the representation in memory much more compact.

Our run function now looks like this:

```
fun run(stateMachine: StateMachine, input: String): Boolean {
var state = 0
var currentStringIndex = 0
while (state in stateMachine.lookupTable.indices && currentStringIndex in input.indices) {
val char = input[currentStringIndex++]
state = stateMachine.lookupTable[state][char.toInt()]
}
return state == stateMachine.acceptState
}
```

## Obfuscation

It's the bit you've been waiting for. Now, we can *obfuscate* our look-up table. What we're going to do is create a mathematical approximation that computes the state transition for us.

Let's assume that we already have a mathematical function T(x, y) that returns the appropriate next state to transition to based on *x*, the current state, and *y*, the current input character.

Since there are only a handful of values that we actually care about^{1} in our state transition table; it is incredibly easy to create an approximation of T(x, y).

We have two inputs (x and y) and 5 defined entries in our look-up table. We can approximate T(x, y) by constructing a polynomial with 5 unknown coefficients, and plugging in the 5 defined values, and solving the resulting system of (5) simultaneous equations.

In order to construct a polynomial with 5 unknowns and 2 inputs, we can use the binomial expansion of `(x + y)^3`

with 4 unknown coefficients, and a trailing unknown constant. This gives us a polynomial that looks like this: `a x^3 + b x^2 y + c x y^2 + d y^3 + e`

. Then, we can just plug in values of x and y, and solve for a, b, c, d, and e.

To do this, I used SymPy.

```
from sympy import *
x, y = symbols('x y')
a, b, c, d, e = symbols('a:e')
T = lambda x, y: a * x**3 + b * x**2 * y + c * x * y**2 + d * y**3 + e
# lut[0]['a'] = 1
# lut[0]['b'] = 2
#
# lut[1]['c'] = 3
#
# lut[2]['z'] = 2
# lut[2]['c'] = 3
pretty_print(solve([
Eq(T(0, ord('a')), 1),
Eq(T(0, ord('b')), 2),
Eq(T(1, ord('c')), 3),
Eq(T(2, ord('z')), 2),
Eq(T(2, ord('c')), 3)
], [a, b, c, d, e]))
```

And we get the values of a, b, c, d, and e out of it:

```
⎧ 535028238 -534985412 3601586 -884154 ⎫
⎨a: ─────────, b: ───────────, c: ─────────, d: 1/28519, e: ────────⎬
⎩ 12462803 411272499 411272499 28519 ⎭
```

And as literal values:

```
>>> 535028238 / 12462803
42.93000844192113
>>> -534985412 / 411272499
-1.3008052162515247
>>> 3601586 / 411272499
0.008757176832288025
>>> 1 / 28519
3.506434306953259e-05
>>> -884154 / 28519
-31.00227918229952
```

Now, we can implement this approximator in our Kotlin program:

```
class StateMachine(val acceptState: Int, val coefficients: DoubleArray)
fun run(stateMachine: StateMachine, input: String): Boolean {
var state = 0
var currentStringIndex = 0
while (currentStringIndex in input.indices) {
val char = input[currentStringIndex++]
val x = state.toDouble()
val y = char.toDouble()
val n = stateMachine.coefficients.lastIndex
var total = stateMachine.coefficients.last()
for (index in 0 until n) {
val coefficient = stateMachine.coefficients[index]
total += coefficient * x.pow((n - index - 1).toDouble()) * y.pow(index.toDouble())
}
// We don't want to accept undefined values which get interpolated to something like 1.5
fun rejectThreshold(d: Double, threshold: Double) =
if (ceil(d) - d < threshold) ceil(d).toInt()
else if (d - floor(d) < threshold) floor(d).toInt()
else -1
state = rejectThreshold(total, 0.05)
}
return state == stateMachine.acceptState
}
val matchABCZ = StateMachine(3, doubleArrayOf(
42.93000844192113,
-1.3008052162515247,
0.008757176832288025,
3.506434306953259e-05,
-31.00227918229952
))
run(matchABCZ, "ac") // true
run(matchABCZ, "bzzzzzzzc") // true
run(matchABCZ, "az") // false
run(matchABCZ, "abc") // false
```

As you can see, the logic has now been completely transformed. Mapping a problem domain onto another always makes for a very effective obfuscation strategy, yet is often infeasible to implement in an automatic obfuscator.