[Gnuk-users] Gnuk on a faster MCU

Aurelien Jarno aurelien at aurel32.net
Mon Sep 11 20:57:32 UTC 2017


On 2017-09-11 10:17, NIIBE Yutaka wrote:
> Hello,
> Firstly, let me explain current status of Chopstx/NeuG/Gnuk.

Thanks, it's a lot clearer to me where things are going.

> Aurelien Jarno <aurelien at aurel32.net> wrote:
> > I therefore started to prototype things a bit, and I "ported" Gnuk on a
> > STM32L432 MCU. I say "ported" because I have done things quick and dirty
> > and the keys are not even stored in flash, but in RAM. This MCU has a
> > Cortex-M4 CPU running at 80MHz and tiny caches (1kB for instructions,
> > 256B for data). It's available in a QFN32 case, even smaller than the
> > STM32F103. It's also able to do crystal-less USB (I haven't tried yet).
> >
> > On such a CPU, Gnuk is able to do a RSA2048 decryption in 0.84s and
> > a RSA4096 decryption in 5.18s (vs 1.27s and 8.22s on FST-01). The gains
> > are mainly due to the instruction cache, as it hides the wait states of
> > the flash memory. The remaining gain comes from the single cycle
> > multiply-and-add instructions. I have been able to get these down to
> > respectively 0.65s and 3.87s by using the UMAAL DSP instruction in
> > MULADDC and mpi_montsqr.
> Great!
> QFN32, crystal-less USB and single cycle multiply-and-add sound great to
> me.  I'm afraid STM32L432 has more features.

Yes, it has a few more interesting features, like 3 capacitive sensing
channels (useful, for example, to add an authentication validation) and a
random number generator. I won't fully trust such a generator, but it
can provide additional entropy on top of that provided by the ADC. In
order to port Gnuk quickly I actually replaced all the ADC/NeuG code by a
call to the RNG. Note that the STM32L432 only has a single ADC.

I guess such a small chip without crystal can be used to create
something like tomu.im, though probably slightly bigger as there is also
a regulator to add. It just gets more difficult to find a thin PCB
manufacturer, especially for prototypes or small series.

> > I am still pondering whether to try with an even faster MCU, like an STM32F4
> > at 168MHz even if it comes in a bigger LQFP64 case. I would consider
> > getting a < 2s signature / decryption for a RSA4096 something
> > acceptable.
> I think that 2 seconds is acceptable.  When I started the development of
> OpenPGPcard alternative, it took like 5 seconds for RSA1024 with
> ATmega328 running 20MHz.  I didn't feel it's acceptable for my own use
> cases.  Then, for RSA2048, it was something like 2 seconds with
> STM32F103 with PolarSSL in 2010.  Thus, I started Gnuk.
> BTW, currently we are using p*q modulus.  It is known that multi prime
> modulus can speed up RSA computation (It is patented by US5848159, still
> effective).  There was a technique of p^k*q modulus, which was patented
> by US6396926.  I found that the latter patent was expired in 2010, due
> to failure to pay maintenance fee.  For me, the latter technique seems
> to be covered by more general multi prime modulus technique.  If not, I
> wonder we can use that.

Thanks for the pointer, I'll try to have a look at that. That would also
benefit the FST-01 users.
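
As a rough illustration of why more prime factors help: with CRT, each
exponentiation is done modulo a single prime, so the operands shrink as
the number of factors grows. The toy C sketch below uses three tiny primes
and made-up names (TOY_*, decrypt_crt, ...). It is nowhere near real key
sizes, is not constant-time, is not Gnuk code, and says nothing about the
patent question above.

```c
#include <stdint.h>

/* Toy parameters, chosen only for the example:
   TOY_E * TOY_D = 1 mod (TOY_P-1)(TOY_Q-1)(TOY_R-1). */
enum { TOY_P = 61, TOY_Q = 53, TOY_R = 47, TOY_E = 7, TOY_D = 20503 };
#define TOY_N ((uint64_t)TOY_P * TOY_Q * TOY_R)

/* Square-and-multiply; safe here because every modulus is below 2^32. */
static uint64_t modpow(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1)
            r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

/* Inverse modulo a prime p, via Fermat's little theorem. */
static uint64_t modinv_prime(uint64_t a, uint64_t p)
{
    return modpow(a, p - 2, p);
}

/* Reference: one exponentiation with the full-size modulus. */
static uint64_t decrypt_direct(uint64_t c)
{
    return modpow(c, TOY_D, TOY_N);
}

/* CRT: three exponentiations with small moduli and small exponents,
   then Garner recombination, one prime at a time. */
static uint64_t decrypt_crt(uint64_t c)
{
    uint64_t mp = modpow(c, TOY_D % (TOY_P - 1), TOY_P);
    uint64_t mq = modpow(c, TOY_D % (TOY_Q - 1), TOY_Q);
    uint64_t mr = modpow(c, TOY_D % (TOY_R - 1), TOY_R);
    uint64_t pq = (uint64_t)TOY_P * TOY_Q;

    uint64_t h = (mq + TOY_Q - mp % TOY_Q) % TOY_Q
                 * modinv_prime(TOY_P % TOY_Q, TOY_Q) % TOY_Q;
    uint64_t x = mp + (uint64_t)TOY_P * h;          /* result mod P*Q */

    h = (mr + TOY_R - x % TOY_R) % TOY_R
        * modinv_prime(pq % TOY_R, TOY_R) % TOY_R;
    return x + pq * h;                              /* result mod P*Q*R */
}
```

Both functions should return the same plaintext for any ciphertext
produced with modpow(m, TOY_E, TOY_N); only the size of the intermediate
operands differs.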

I have started to look at optimizing the existing low-level math
operations. The thing I have learned with the Cortex-M4 is that the
addition really comes for free with a multiplication, so algorithms
which try to trade multiplications for additions are not faster.
Adding two variables with a carry is actually faster using UMAAL with
the two multiplicands being zero than using ADDS and ADC.
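
To make the UMAAL point concrete, here is a minimal portable C model (not
the actual Gnuk code; umaal_model, add_words and muladd_words are names
made up for this sketch). UMAAL computes RdHi:RdLo = Rn * Rm + RdLo + RdHi,
which can never overflow 64 bits, so with a zero product it simply adds
two words and leaves the carry in the high register.

```c
#include <stdint.h>
#include <stddef.h>

/* Portable model of the Cortex-M4 UMAAL instruction:
   hi:lo = a * b + lo + hi. */
static void umaal_model(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Add two 32-bit words; returns the low word and stores the carry
   (0 or 1). Both multiplicands are zero, so only the two accumulator
   registers are summed. */
static uint32_t add_words(uint32_t x, uint32_t y, uint32_t *carry)
{
    uint32_t lo = x, hi = y;
    umaal_model(&lo, &hi, 0, 0);
    *carry = hi;
    return lo;
}

/* d[0..n-1] += s[0..n-1] * b, returning the final carry word.
   One umaal_model() call per word performs the multiply, both
   additions and the carry propagation. */
static uint32_t muladd_words(uint32_t *d, const uint32_t *s, size_t n,
                             uint32_t b)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = d[i];
        umaal_model(&lo, &carry, s[i], b);
        d[i] = lo;
    }
    return carry;
}
```

The muladd_words loop has the same shape as a multiply-accumulate inner
loop: on a real Cortex-M4 each iteration would be a load, one UMAAL and a
store, with no separate ADDS/ADC pair.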

> > It seems the biggest portability issue concerns the flash.
> Right.
> > The current code assumes that the pages are small (1 or 2kB) and that
> > the writes are done 2 bytes by 2 bytes. These assumptions are used in
> > src/flash.c, but also define the format of the data in
> > src/openpgp-do.c.
> I feel that src/openpgp-do.c requires major surgery.  It assumes "2
> bytes by 2 bytes", and data can be overwritten by more 0-bit data.

I fully agree. Also, I guess flash.c should provide a bit more
abstraction, so that we can provide a different version of flash.c for a
different MCU without any change to openpgp-do.c.
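
One possible shape for such an abstraction, sketched in C with a
RAM-backed mock so that it runs on a host. Everything here (struct
flash_ops, the mock_* functions, the page and write sizes) is
hypothetical and not the current flash.c interface. The mock only models
the fact that programming can clear bits but never set them.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-MCU flash description; not part of Gnuk today. */
struct flash_ops {
    size_t page_size;   /* erase granularity in bytes */
    size_t write_unit;  /* programming granularity in bytes */
    int (*erase_page)(size_t page);
    int (*program)(size_t addr, const uint8_t *src, size_t len);
};

/* Made-up geometry for the mock: 4 pages of 2 kB, 8-byte writes. */
#define MOCK_PAGE_SIZE  2048u
#define MOCK_PAGES      4u
#define MOCK_WRITE_UNIT 8u
static uint8_t mock_mem[MOCK_PAGE_SIZE * MOCK_PAGES];

/* Erasing a page sets every bit of that page to 1. */
static int mock_erase_page(size_t page)
{
    if (page >= MOCK_PAGES)
        return -1;
    memset(mock_mem + page * MOCK_PAGE_SIZE, 0xFF, MOCK_PAGE_SIZE);
    return 0;
}

/* Programming requires aligned, whole write units and can only
   turn 1 bits into 0 bits. */
static int mock_program(size_t addr, const uint8_t *src, size_t len)
{
    if (addr % MOCK_WRITE_UNIT || len % MOCK_WRITE_UNIT
        || addr > sizeof mock_mem || len > sizeof mock_mem - addr)
        return -1;
    for (size_t i = 0; i < len; i++)
        mock_mem[addr + i] &= src[i];
    return 0;
}

static const struct flash_ops mock_flash = {
    MOCK_PAGE_SIZE, MOCK_WRITE_UNIT, mock_erase_page, mock_program
};
```

With something along these lines, openpgp-do.c would only look at
page_size and write_unit instead of hard-coding "2 bytes by 2 bytes", and
each MCU would supply its own table.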

> > I wonder if one way to fix that would be to use a single data pool,
> > with the possibility to store longer objects like keys or certificates.
> > It would mean triggering the garbage collector each time a sensitive
> > data like a private key has been removed or replaced. This is however a
> > significant change to the current code.
> I think that since data for private key is better to be handled
> carefully, it is good we have different data storage and different
> access routines.

I understand your concern. That said, given the coarse granularity of the
flash sectors on the more advanced STM32 MCUs, there are not a lot of
alternatives. One can imagine keeping the private keys in RAM just for
the duration of the flash erase, but it means the data is lost in case of
power failure.

> For certificate data object, I want to kill the feature.


Thanks for all your answers. I guess for now I'll continue working on
porting Gnuk to the STM32L432. I guess I'll try to clean up all my
patches and start to submit the Chopstx-related ones in the next
days or weeks.


Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien at aurel32.net                 http://www.aurel32.net
