Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021
Online Template Attacks: Revisited:
PoC: emulated single-trace attack on wolfSSL scalar multiplication
README
Intro
This PoC simulates an Online Template Attack against wolfSSL v4.4.0 side-channel protected
ECC scalar multiplication implementation.
It demonstrates the feasibility of one the attacks carried out in the paper
Online Template Attacks: Revisited
published at TCHES 2021.
For this task, we assume the adversary can mount a CopyCat attack on said implementation.
We use an emulator based on TracerGrind
to capture victim and templates traces.
This document guides the user through this PoC explaining each executed step.
Additionally, we provide a script that automates the whole process in one step.
Before executing it, please ensure the Requirements are satisfied,
then you can try: single_command.sh
.
About CopyCat attack
CopyCat is a controlled channel attack on TEEs such as SGX:
[paper].
CopyCat tracks the number of executed instructions at memory page granularity.
For instance, the CopyCat trace: "4A 3B 2C"
means that at memory page offset A 4 instructions were executed, 3 at B, and 2 at C.
Following our OTA paper analysis on this library (Section 4.4),
if an adversary mounts a CopyCat attack on
this library using all the memory page offsets this library uses it will obtain an ideal state distinguisher.
This means the scalar can be recovered easily.
Build and install
We tested the procedure below on Ubuntu 18.04 and 20.04.
For another Linux flavors it should work, if it doesn't you can use a VM with
this Ubuntu image
without updating it.
TracerGrind is built on Valgrind which might cause incompatibilities with future environments.
If you notice some issues, use the vanilla Ubuntu image mentioned above.
Requirements (Ubuntu 18.04)
sudo apt-get install build-essential automake git libcapstone-dev libsqlite3-dev libtool
This PoC assumes bash, wget and python3 (tested with 3.6.9) are installed.
About next builds: TracerGrind and wolfSSL shared library are built and installed in the project
directory, they don't conflict with your environment.
Build TracerGrind install
./install_tracergrind.sh
Build wolfSSL library, victim and template implementations
./build_wolfssl.sh
Test builds
Both victim and template binaries are linked to the wolfSSL shared library built before.
Victim:
It generates an ECDSA signature with fixed private key (d
) and hash (h
).
Executing ./libs/wolfssl/victim/ecc-sign
should output something like:
r = 0x7CAB2AB4F3A0BE5860DF4D458E7481EB3F07B029F72D0E1AD6F071090011B3E7
s = 0x92F1938C047D9AE768BB6A03ABD76E70436CE16735C34B8EBD51600257EB163F
Template:
The template implementation accepts a hex represented integer as input (k
),
and computes kG
where G
is the point generator for curve secp256r1.
The output of ./libs/wolfssl/template/scalar_mult_template M 1
should be:
X = 0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296
At this point we have a template implementation that will be used to
capture template traces with a known scalar.
In this case we are going to perform a scalar-based OTA.
For more technical details see Section 3.2 in the
paper.
(these details are not needed to use this PoC).
Configure trace capture scripts
Get libwolssl.so
address range in TracerGrind
TracerGrind records all executed instructions by a binary.
We are interested only on wolfSSL shared library instructions.
Therefore we must know the address range where this library is loaded to filter
it during trace capturing.
cd attack_framework/wolfssl/
./get_addr_range.sh
The output should be something like:
[L] Loaded /home/tches/ota_artifact/libs/wolfssl/wolfssl-4.4.0-stable/build/lib/libwolfssl.so.24.1.0 from 0x0000000004c39b30 to 0x0000000004c7aaef
At the end of the line we get the address range used by wolfSSL when executed by TracerGrind.
Set ADDR_RANGE
variable in capture scripts
With this we are limiting the page offsets used in the attack to libwolfssl
's.
(please modify the command above according to ./get_addr_range.sh
output)
sed -ie "s/^ADDR_RANGE=.*$/ADDR_RANGE=0x4c39b30-0x4c7aaef/" get_victim_trace.sh
sed -ie "s/^ADDR_RANGE=.*$/ADDR_RANGE=0x4c39b30-0x4c7aaef/" get_template_trace.sh
Once this configuration is done, there is no need to repeat previous steps for different attack runs.
Testing determinism
Here we are going to test the attack environment.
First we are going to check if the template implementation generates the same traces for the same input.
Section 3.3 of the paper
fully describes this methodology.
Lets capture five template traces with the same scalar.
If the implementation follows a deterministic path regarding CopyCat the traces will be identical.
for i in `seq 5`; do ./get_template_trace.sh 0x510670a78ba39c5849d92b527ed72425afdc12534106640d81430f79d49d9f25 > $i.txt; done
Compare files: md5sum 1.txt 2.txt 3.txt 4.txt 5.txt
The md5 hash of those files should be the same.
If they are not, it is very unlikely that this PoC attack will succeed.
Identical traces could be due to input-independent code, thus lets try with a different scalar:
./get_template_trace.sh 0xd1be4e05f62f36df445621bd7e1438ef43e6fb5b85327014ff3deec84cf02eff > 6.txt
Comparing again will show they are different.
md5sum 1.txt 6.txt
This suggests there is some dependence between the scalar and the CopyCat trace.
This test says nothing about how strong said dependence is,
however the results in our paper
demonstrate there is actually a leakage that can be used to uniquely distinguish the processed data.
Lets recover a bit, shall we?
This toy example is the base of the full attack that will be executed later.
This time we will only play with the template implementation to get the attack idea.
Lets capture a victim scalar multiplication trace with a "secret" scalar.
./get_template_trace.sh 0x510670a78ba39c5849d92b527ed72425afdc12534106640d81430f79d49d9f25 > victim.txt
Then we play the attacker role capturing two template traces using two scalars unrelated to the "secret" one:
One will have its most significant bit set:
./get_template_trace.sh 0xf1307c33d4dad67df8ac0e46628c1ed2b2b8b3962bfc81175843614da98e0006 > 1.txt
and the other cleared.
./get_template_trace.sh 0x0e4ad966d76ab7d6330105cbf2242c9edcff3eca187e43211ae6b5da0a055f17 > 0.txt
The targeted scalar multiplication is implemented in function wc_ecc_mulmod_ex
wolfSSL.
Said function contains a few implementations controlled by preprocessor directives.
We are interested in the side-channel protected one link:
The targeted implementation executes a Montgomery ladder, thus the bits of the scalar are parsed left-to-right.
In the previous section we confirmed two traces of the same scalar will be identical.
Moreover, according to Section 4.4 of the paper
the CopyCat traces can be used to distinguish the data processed at each Montgomery ladder iteration.
Therefore the higher the number of most significant bits (MSBs) the secret scalar shares
with a template one, the higher the number of bytes victim and template traces will share since their beginning.
Lets find the index of the first difference between the victim.txt
trace and the templates ones.
cmp victim.txt 0.txt
cmp victim.txt 1.txt
Command outputs should look like this:
victim.txt 0.txt differ: byte 252258, line 1
victim.txt 1.txt differ: byte 248460, line 1
In this cases, 0.txt
shares more starting bytes with victim.txt
than 1.txt
.
(byte index values might vary but their relation shouldn't)
Therefore, the attacker learns the MSB of the "secret" scalar is a zero.
A fact we can confirm using the "secret" scalar.
Repeating this process fixing the just recovered MSB and varying the next one
can be used to recover the latter.
This is the rationale of an scalar-based Online Template Attack and will be used in the
next section to recover all bits of an ECDSA nonce using a single trace.
Note that here we are exploiting some input dependent control flow leakage
in the scalar multiplication implementation.
However, a cool feature of OTA is that we don't need to know which concrete vulnerability is to exploit it.
It could be as trivial as simple leakage or one lurking at the depths of the bignum implementation.
The footnotes at Section 5.3 in the paper
pinpoint some of these leaky control flows.
Full scalar recovery
This time we will target an ECDSA signature generation with hardcoded private key and hash.
Our objective is to recover all the bits of the secret nonce k
, which reveals the private key directly.
Lets capture a single victim trace:
./get_victim_trace.sh
This script generates two files: full_victim_trace.txt
and victim_output.txt
.
The latter contains the r
coordinates of the signature,
which will be used to check if the attack succeed as r = kG
.
Before executing the attack, we have to calculate the alignment between the victim trace
and the templates one.
Recall the template implementation only executes a scalar multiplication,
while the ECDSA signature generation involves other computations.
For this task, we must ensure there exist 0.txt
and 1.txt
which contains traces
from the template implementation when the MSB is zero and one respectively.
If you are not sure about this, execute:
./get_template_trace.sh 0xf1307c33d4dad67df8ac0e46628c1ed2b2b8b3962bfc81175843614da98e0006 > 1.txt
./get_template_trace.sh 0x0e4ad966d76ab7d6330105cbf2242c9edcff3eca187e43211ae6b5da0a055f17 > 0.txt
To compute the alignment index run:
python3 find_scalar_mult_start.py
We are interested in this output field: template_start_index that will be a numeric value.
The bit recovery procedure using a scalar-based Online Template Attack is implemented in scalar_based_ota.py
.
To recover all bits of the secret ECDSA nonce the attacker should execute
the following script replacing the
python3 scalar_based_ota.py --template_start_index <template_start_index>
The attack takes time, in our tests it is ~145 min.
Captured traces are big text files hence slow processing.
So, grab a good book and relax :)
When the script finishes you can verify the recovered scalar using r = kG
.
For this task, you can use the template implementation to compute kG
.
../../libs/wolfssl/template/scalar_mult_template M <scalar>
The output should match the r
field in the victim_output.txt
file.
Partial scalar recovery
If you want to try a partial recovery you can use the ecdsa_get_k.py
and the signature output information in victim_output.txt
to compute the nonce k
.
python3 ecdsa_get_k.py <r> <s>
Then you can execute the bit recovery script like:
python3 scalar_based_ota.py --template_start_index <template_start_index> --num_bits 8