Transactions on Cryptographic Hardware and Embedded Systems, Volume 2021
Online Template Attacks: Revisited:
PoC: emulated single-trace attack on wolfSSL scalar multiplication
This PoC simulates an Online Template Attack against wolfSSL v4.4.0 side-channel protected ECC scalar multiplication implementation. It demonstrates the feasibility of one the attacks carried out in the paper Online Template Attacks: Revisited published at TCHES 2021.
For this task, we assume the adversary can mount a CopyCat attack on said implementation. We use an emulator based on TracerGrind to capture victim and templates traces.
This document guides the user through this PoC explaining each executed step.
Additionally, we provide a script that automates the whole process in one step.
Before executing it, please ensure the Requirements are satisfied,
then you can try:
About CopyCat attack
CopyCat is a controlled channel attack on TEEs such as SGX: [paper]. CopyCat tracks the number of executed instructions at memory page granularity.
For instance, the CopyCat trace: "4A 3B 2C" means that at memory page offset A 4 instructions were executed, 3 at B, and 2 at C.
Following our OTA paper analysis on this library (Section 4.4), if an adversary mounts a CopyCat attack on this library using all the memory page offsets this library uses it will obtain an ideal state distinguisher. This means the scalar can be recovered easily.
Build and install
We tested the procedure below on Ubuntu 18.04 and 20.04. For another Linux flavors it should work, if it doesn't you can use a VM with this Ubuntu image without updating it.
TracerGrind is built on Valgrind which might cause incompatibilities with future environments. If you notice some issues, use the vanilla Ubuntu image mentioned above.
Requirements (Ubuntu 18.04)
sudo apt-get install build-essential automake git libcapstone-dev libsqlite3-dev libtool
This PoC assumes bash, wget and python3 (tested with 3.6.9) are installed.
About next builds: TracerGrind and wolfSSL shared library are built and installed in the project directory, they don't conflict with your environment.
Build TracerGrind install
Build wolfSSL library, victim and template implementations
Both victim and template binaries are linked to the wolfSSL shared library built before.
It generates an ECDSA signature with fixed private key (
d) and hash (
./libs/wolfssl/victim/ecc-sign should output something like:
r = 0x7CAB2AB4F3A0BE5860DF4D458E7481EB3F07B029F72D0E1AD6F071090011B3E7 s = 0x92F1938C047D9AE768BB6A03ABD76E70436CE16735C34B8EBD51600257EB163F
The template implementation accepts a hex represented integer as input (
G is the point generator for curve secp256r1.
The output of
./libs/wolfssl/template/scalar_mult_template M 1 should be:
X = 0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296
At this point we have a template implementation that will be used to capture template traces with a known scalar.
In this case we are going to perform a scalar-based OTA. For more technical details see Section 3.2 in the paper. (these details are not needed to use this PoC).
Configure trace capture scripts
libwolssl.so address range in TracerGrind
TracerGrind records all executed instructions by a binary. We are interested only on wolfSSL shared library instructions. Therefore we must know the address range where this library is loaded to filter it during trace capturing.
The output should be something like:
[L] Loaded /home/tches/ota_artifact/libs/wolfssl/wolfssl-4.4.0-stable/build/lib/libwolfssl.so.24.1.0 from 0x0000000004c39b30 to 0x0000000004c7aaef
At the end of the line we get the address range used by wolfSSL when executed by TracerGrind.
ADDR_RANGE variable in capture scripts
With this we are limiting the page offsets used in the attack to
(please modify the command above according to
sed -ie "s/^ADDR_RANGE=.*$/ADDR_RANGE=0x4c39b30-0x4c7aaef/" get_victim_trace.sh sed -ie "s/^ADDR_RANGE=.*$/ADDR_RANGE=0x4c39b30-0x4c7aaef/" get_template_trace.sh
Once this configuration is done, there is no need to repeat previous steps for different attack runs.
Here we are going to test the attack environment. First we are going to check if the template implementation generates the same traces for the same input. Section 3.3 of the paper fully describes this methodology.
Lets capture five template traces with the same scalar. If the implementation follows a deterministic path regarding CopyCat the traces will be identical.
for i in `seq 5`; do ./get_template_trace.sh 0x510670a78ba39c5849d92b527ed72425afdc12534106640d81430f79d49d9f25 > $i.txt; done
md5sum 1.txt 2.txt 3.txt 4.txt 5.txt
The md5 hash of those files should be the same. If they are not, it is very unlikely that this PoC attack will succeed.
Identical traces could be due to input-independent code, thus lets try with a different scalar:
./get_template_trace.sh 0xd1be4e05f62f36df445621bd7e1438ef43e6fb5b85327014ff3deec84cf02eff > 6.txt
Comparing again will show they are different.
md5sum 1.txt 6.txt
This suggests there is some dependence between the scalar and the CopyCat trace. This test says nothing about how strong said dependence is, however the results in our paper demonstrate there is actually a leakage that can be used to uniquely distinguish the processed data.
Lets recover a bit, shall we?
This toy example is the base of the full attack that will be executed later.
This time we will only play with the template implementation to get the attack idea.
Lets capture a victim scalar multiplication trace with a "secret" scalar.
./get_template_trace.sh 0x510670a78ba39c5849d92b527ed72425afdc12534106640d81430f79d49d9f25 > victim.txt
Then we play the attacker role capturing two template traces using two scalars unrelated to the "secret" one:
One will have its most significant bit set:
./get_template_trace.sh 0xf1307c33d4dad67df8ac0e46628c1ed2b2b8b3962bfc81175843614da98e0006 > 1.txt
and the other cleared.
./get_template_trace.sh 0x0e4ad966d76ab7d6330105cbf2242c9edcff3eca187e43211ae6b5da0a055f17 > 0.txt
The targeted scalar multiplication is implemented in function
Said function contains a few implementations controlled by preprocessor directives.
We are interested in the side-channel protected one link:
The targeted implementation executes a Montgomery ladder, thus the bits of the scalar are parsed left-to-right.
In the previous section we confirmed two traces of the same scalar will be identical. Moreover, according to Section 4.4 of the paper the CopyCat traces can be used to distinguish the data processed at each Montgomery ladder iteration.
Therefore the higher the number of most significant bits (MSBs) the secret scalar shares with a template one, the higher the number of bytes victim and template traces will share since their beginning.
Lets find the index of the first difference between the
victim.txt trace and the templates ones.
cmp victim.txt 0.txt
cmp victim.txt 1.txt
Command outputs should look like this:
victim.txt 0.txt differ: byte 252258, line 1 victim.txt 1.txt differ: byte 248460, line 1
In this cases,
0.txt shares more starting bytes with
(byte index values might vary but their relation shouldn't)
Therefore, the attacker learns the MSB of the "secret" scalar is a zero. A fact we can confirm using the "secret" scalar.
Repeating this process fixing the just recovered MSB and varying the next one can be used to recover the latter.
This is the rationale of an scalar-based Online Template Attack and will be used in the next section to recover all bits of an ECDSA nonce using a single trace.
Note that here we are exploiting some input dependent control flow leakage in the scalar multiplication implementation. However, a cool feature of OTA is that we don't need to know which concrete vulnerability is to exploit it. It could be as trivial as simple leakage or one lurking at the depths of the bignum implementation. The footnotes at Section 5.3 in the paper pinpoint some of these leaky control flows.
Full scalar recovery
This time we will target an ECDSA signature generation with hardcoded private key and hash.
Our objective is to recover all the bits of the secret nonce
k, which reveals the private key directly.
Lets capture a single victim trace:
This script generates two files:
The latter contains the
r coordinates of the signature,
which will be used to check if the attack succeed as
r = kG.
Before executing the attack, we have to calculate the alignment between the victim trace and the templates one. Recall the template implementation only executes a scalar multiplication, while the ECDSA signature generation involves other computations.
For this task, we must ensure there exist
1.txt which contains traces
from the template implementation when the MSB is zero and one respectively.
If you are not sure about this, execute:
./get_template_trace.sh 0xf1307c33d4dad67df8ac0e46628c1ed2b2b8b3962bfc81175843614da98e0006 > 1.txt ./get_template_trace.sh 0x0e4ad966d76ab7d6330105cbf2242c9edcff3eca187e43211ae6b5da0a055f17 > 0.txt
To compute the alignment index run:
We are interested in this output field: template_start_index that will be a numeric value.
The bit recovery procedure using a scalar-based Online Template Attack is implemented in
To recover all bits of the secret ECDSA nonce the attacker should execute
the following script replacing the
python3 scalar_based_ota.py --template_start_index <template_start_index>
The attack takes time, in our tests it is ~145 min. Captured traces are big text files hence slow processing. So, grab a good book and relax :)
When the script finishes you can verify the recovered scalar using
r = kG.
For this task, you can use the template implementation to compute
../../libs/wolfssl/template/scalar_mult_template M <scalar>
The output should match the
r field in the
Partial scalar recovery
If you want to try a partial recovery you can use the
and the signature output information in
victim_output.txt to compute the nonce
python3 ecdsa_get_k.py <r> <s>
Then you can execute the bit recovery script like:
python3 scalar_based_ota.py --template_start_index <template_start_index> --num_bits 8