International Association for Cryptologic Research

International Association
for Cryptologic Research

Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021

Online Template Attacks: Revisited:

PoC: emulated single-trace attack on wolfSSL scalar multiplication


README

Intro

This PoC simulates an Online Template Attack against wolfSSL v4.4.0 side-channel protected ECC scalar multiplication implementation. It demonstrates the feasibility of one the attacks carried out in the paper Online Template Attacks: Revisited published at TCHES 2021.

For this task, we assume the adversary can mount a CopyCat attack on said implementation. We use an emulator based on TracerGrind to capture victim and templates traces.

This document guides the user through this PoC explaining each executed step. Additionally, we provide a script that automates the whole process in one step. Before executing it, please ensure the Requirements are satisfied, then you can try: single_command.sh.

About CopyCat attack

CopyCat is a controlled channel attack on TEEs such as SGX: [paper]. CopyCat tracks the number of executed instructions at memory page granularity.

For instance, the CopyCat trace: "4A 3B 2C" means that at memory page offset A 4 instructions were executed, 3 at B, and 2 at C.

Following our OTA paper analysis on this library (Section 4.4), if an adversary mounts a CopyCat attack on this library using all the memory page offsets this library uses it will obtain an ideal state distinguisher. This means the scalar can be recovered easily.

Build and install

We tested the procedure below on Ubuntu 18.04 and 20.04. For another Linux flavors it should work, if it doesn't you can use a VM with this Ubuntu image without updating it.

TracerGrind is built on Valgrind which might cause incompatibilities with future environments. If you notice some issues, use the vanilla Ubuntu image mentioned above.

Requirements (Ubuntu 18.04)

sudo apt-get install build-essential automake git libcapstone-dev libsqlite3-dev libtool

This PoC assumes bash, wget and python3 (tested with 3.6.9) are installed.

About next builds: TracerGrind and wolfSSL shared library are built and installed in the project directory, they don't conflict with your environment.

Build TracerGrind install

./install_tracergrind.sh

Build wolfSSL library, victim and template implementations

./build_wolfssl.sh

Test builds

Both victim and template binaries are linked to the wolfSSL shared library built before.

Victim:

It generates an ECDSA signature with fixed private key (d) and hash (h).

Executing ./libs/wolfssl/victim/ecc-sign should output something like:

r = 0x7CAB2AB4F3A0BE5860DF4D458E7481EB3F07B029F72D0E1AD6F071090011B3E7
s = 0x92F1938C047D9AE768BB6A03ABD76E70436CE16735C34B8EBD51600257EB163F

Template:

The template implementation accepts a hex represented integer as input (k), and computes kG where G is the point generator for curve secp256r1.

The output of ./libs/wolfssl/template/scalar_mult_template M 1 should be:

X = 0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296

At this point we have a template implementation that will be used to capture template traces with a known scalar.

In this case we are going to perform a scalar-based OTA. For more technical details see Section 3.2 in the paper. (these details are not needed to use this PoC).

Configure trace capture scripts

Get libwolssl.so address range in TracerGrind

TracerGrind records all executed instructions by a binary. We are interested only on wolfSSL shared library instructions. Therefore we must know the address range where this library is loaded to filter it during trace capturing.

cd attack_framework/wolfssl/

./get_addr_range.sh

The output should be something like:

[L] Loaded /home/tches/ota_artifact/libs/wolfssl/wolfssl-4.4.0-stable/build/lib/libwolfssl.so.24.1.0 from  0x0000000004c39b30 to 0x0000000004c7aaef

At the end of the line we get the address range used by wolfSSL when executed by TracerGrind.

Set ADDR_RANGE variable in capture scripts

With this we are limiting the page offsets used in the attack to libwolfssl's. (please modify the command above according to ./get_addr_range.sh output)

sed -ie "s/^ADDR_RANGE=.*$/ADDR_RANGE=0x4c39b30-0x4c7aaef/" get_victim_trace.sh
sed -ie "s/^ADDR_RANGE=.*$/ADDR_RANGE=0x4c39b30-0x4c7aaef/" get_template_trace.sh

Once this configuration is done, there is no need to repeat previous steps for different attack runs.

Testing determinism

Here we are going to test the attack environment. First we are going to check if the template implementation generates the same traces for the same input. Section 3.3 of the paper fully describes this methodology.

Lets capture five template traces with the same scalar. If the implementation follows a deterministic path regarding CopyCat the traces will be identical.

for i in `seq 5`; do ./get_template_trace.sh 0x510670a78ba39c5849d92b527ed72425afdc12534106640d81430f79d49d9f25 > $i.txt; done

Compare files: md5sum 1.txt 2.txt 3.txt 4.txt 5.txt

The md5 hash of those files should be the same. If they are not, it is very unlikely that this PoC attack will succeed.

Identical traces could be due to input-independent code, thus lets try with a different scalar:

./get_template_trace.sh 0xd1be4e05f62f36df445621bd7e1438ef43e6fb5b85327014ff3deec84cf02eff > 6.txt

Comparing again will show they are different.

md5sum 1.txt 6.txt

This suggests there is some dependence between the scalar and the CopyCat trace. This test says nothing about how strong said dependence is, however the results in our paper demonstrate there is actually a leakage that can be used to uniquely distinguish the processed data.

Lets recover a bit, shall we?

This toy example is the base of the full attack that will be executed later.

This time we will only play with the template implementation to get the attack idea.

Lets capture a victim scalar multiplication trace with a "secret" scalar.

./get_template_trace.sh 0x510670a78ba39c5849d92b527ed72425afdc12534106640d81430f79d49d9f25 > victim.txt

Then we play the attacker role capturing two template traces using two scalars unrelated to the "secret" one:

One will have its most significant bit set:

./get_template_trace.sh 0xf1307c33d4dad67df8ac0e46628c1ed2b2b8b3962bfc81175843614da98e0006 > 1.txt

and the other cleared.

./get_template_trace.sh 0x0e4ad966d76ab7d6330105cbf2242c9edcff3eca187e43211ae6b5da0a055f17 > 0.txt

The targeted scalar multiplication is implemented in function wc_ecc_mulmod_ex wolfSSL. Said function contains a few implementations controlled by preprocessor directives. We are interested in the side-channel protected one link:

The targeted implementation executes a Montgomery ladder, thus the bits of the scalar are parsed left-to-right.

In the previous section we confirmed two traces of the same scalar will be identical. Moreover, according to Section 4.4 of the paper the CopyCat traces can be used to distinguish the data processed at each Montgomery ladder iteration.

Therefore the higher the number of most significant bits (MSBs) the secret scalar shares with a template one, the higher the number of bytes victim and template traces will share since their beginning.

Lets find the index of the first difference between the victim.txt trace and the templates ones.

cmp victim.txt 0.txt

cmp victim.txt 1.txt

Command outputs should look like this:

victim.txt 0.txt differ: byte 252258, line 1
victim.txt 1.txt differ: byte 248460, line 1

In this cases, 0.txt shares more starting bytes with victim.txt than 1.txt. (byte index values might vary but their relation shouldn't)

Therefore, the attacker learns the MSB of the "secret" scalar is a zero. A fact we can confirm using the "secret" scalar.

Repeating this process fixing the just recovered MSB and varying the next one can be used to recover the latter.

This is the rationale of an scalar-based Online Template Attack and will be used in the next section to recover all bits of an ECDSA nonce using a single trace.

Note that here we are exploiting some input dependent control flow leakage in the scalar multiplication implementation. However, a cool feature of OTA is that we don't need to know which concrete vulnerability is to exploit it. It could be as trivial as simple leakage or one lurking at the depths of the bignum implementation. The footnotes at Section 5.3 in the paper pinpoint some of these leaky control flows.

Full scalar recovery

This time we will target an ECDSA signature generation with hardcoded private key and hash. Our objective is to recover all the bits of the secret nonce k, which reveals the private key directly.

Lets capture a single victim trace:

./get_victim_trace.sh

This script generates two files: full_victim_trace.txt and victim_output.txt. The latter contains the r coordinates of the signature, which will be used to check if the attack succeed as r = kG.

Before executing the attack, we have to calculate the alignment between the victim trace and the templates one. Recall the template implementation only executes a scalar multiplication, while the ECDSA signature generation involves other computations.

For this task, we must ensure there exist 0.txt and 1.txt which contains traces from the template implementation when the MSB is zero and one respectively. If you are not sure about this, execute:

./get_template_trace.sh 0xf1307c33d4dad67df8ac0e46628c1ed2b2b8b3962bfc81175843614da98e0006 > 1.txt
./get_template_trace.sh 0x0e4ad966d76ab7d6330105cbf2242c9edcff3eca187e43211ae6b5da0a055f17 > 0.txt

To compute the alignment index run:

python3 find_scalar_mult_start.py

We are interested in this output field: template_start_index that will be a numeric value.

The bit recovery procedure using a scalar-based Online Template Attack is implemented in scalar_based_ota.py.

To recover all bits of the secret ECDSA nonce the attacker should execute the following script replacing the by the output of the previous command:

python3 scalar_based_ota.py --template_start_index <template_start_index>

The attack takes time, in our tests it is ~145 min. Captured traces are big text files hence slow processing. So, grab a good book and relax :)

When the script finishes you can verify the recovered scalar using r = kG. For this task, you can use the template implementation to compute kG.

../../libs/wolfssl/template/scalar_mult_template M <scalar>

The output should match the r field in the victim_output.txt file.

Partial scalar recovery

If you want to try a partial recovery you can use the ecdsa_get_k.py and the signature output information in victim_output.txt to compute the nonce k.

python3 ecdsa_get_k.py <r> <s>

Then you can execute the bit recovery script like:

python3 scalar_based_ota.py --template_start_index <template_start_index> --num_bits 8