Practical homomorphic encryption: A survey


Published in:
Circuits and Systems (ISCAS), 2014 IEEE International Symposium on

Document Version:
Peer reviewed version

Queen's University Belfast - Research Portal:
Link to publication record in Queen's University Belfast Research Portal

Publisher rights
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

General rights
Copyright for the publications made accessible via the Queen's University Belfast Research Portal is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights.

Take down policy
The Research Portal is Queen's institutional repository that provides access to Queen's research output. Every effort has been made to ensure that content in the Research Portal does not infringe any person's rights, or applicable UK laws. If you discover content in the Research Portal that you believe breaches copyright or violates any law, please contact openaccess@qub.ac.uk.
Abstract—Cloud computing technology has rapidly evolved over the last decade, offering an alternative way to store and work with large amounts of data. However data security remains an important issue particularly when using a public cloud service provider. The recent area of homomorphic cryptography allows computation on encrypted data, which would allow users to ensure data privacy on the cloud and increase the potential market for cloud computing. A significant amount of research on homomorphic cryptography appeared in the literature over the last few years; yet the performance of existing implementations of encryption schemes remains unsuitable for real time applications. One way this limitation is being addressed is through the use of graphics processing units (GPUs) and field programmable gate arrays (FPGAs) for implementations of homomorphic encryption schemes. This review presents the current state of the art in this promising new area of research and highlights the interesting remaining open problems.

I. INTRODUCTION

Cloud computing offers many services to users, including the option to offload storage and computation of large amounts of data to the cloud. However to take advantage of cloud computing, users must trust and share their data with the cloud service providers. One way to ensure data privacy is to encrypt data before uploading to the cloud. However, if users wish to compute on the data, the data must be downloaded and decrypted, making the main advantages of using cloud services redundant. One solution for providing secure cloud computing on untrusted public clouds is the use of homomorphic encryption: a method of encryption which allows the efficient evaluation of an arbitrary depth circuit (composed of additions and multiplications) to be evaluated directly on encrypted data. Gentry provided a blueprint for constructing an FHE scheme, an encryption scheme which allows the evaluation of a limited depth circuit. From its introduction in 2009, three main branches of homomorphic encryption schemes have developed: lattice-based, integer-based and learning-with-errors (LWE) or ring-learning-with-errors (RLWE) based encryption.

Research into SHE and FHE could ultimately lead to very exciting developments in a wide variety of areas. The main application of FHE would be to enable secure storage and computation on the cloud. This would serve to further the use and development of cloud-based computing. The technique of homomorphic encryption also lends itself to other important cryptographic applications such as multi-party computation (MPC) [5], [6].

In spite of this potential, current homomorphic encryption schemes are still not efficient enough for real time applications, for example, key generation in Gentry and Halevi's lattice based scheme [7] takes from 2.5 seconds to 2.2 hours. Moreover, a recent implementation required 36 hours for a homomorphic evaluation of AES using FHE by Gentry et al. [8]. Another important limitation of FHE schemes is memory usage; very large ciphertext and public key sizes are required to guarantee adequate security to prevent against possible lattice-based attacks. Gentry and Halevi’s FHE scheme uses public key sizes ranging from 17 MB to 2.25 GB [7]. Also in the previously mentioned evaluation of AES, memory is reported to be the main limiting factor in the implementation [8]. A further limitation to FHE schemes is the costly bootstrapping operation; bootstrapping is required to reduce and manage the noise that is incurred when homomorphic operations are carried out on ciphertexts. Recent research has therefore focused on optimisations to improve efficiency of the FHE schemes, such as minimising the need for the expensive bootstrapping operation [9] or employing batching techniques for multiple bit encryption [10]–[12]. Moreover
the development of optimised FHE architectures targeting alternative platforms to software are being explored, such as for GPU technology [13] or FPGA technology [14]–[18] to try and increase the performance capabilities of these schemes.

II. CURRENT APPROACHES

Initial research on FHE concentrated on lattice based schemes [7], [19], [20], involving relatively large public key sizes and ciphertext sizes. Lattice based FHE schemes, and indeed the other FHE schemes, base their security on hard problems associated with lattices, such as the sparse subset sum problem (SSSP) or the shortest vector problem (SVP). Single instruction, multiple data (SIMD) techniques were proposed by Smart and Vercauteren [10] for these schemes to perform tasks in parallel and thus improve efficiency.

The main focus of the theoretical cryptographic research community is currently on LWE and RLWE based FHE [21]–[23]. LWE was introduced by Regev [24], and has been shown to be as hard as the worst case lattice problems. This problem has been extended to work over rings [25], and this extension increases the efficiency of LWE. An analysis into the practicality of SHE schemes along with an implementation of a SHE scheme by Brakerski and Vaikuntanathan [26] was carried out by Lauter et al. [27].

Integer based schemes were introduced by van Dijk et al. [28] as a theoretically simpler alternative to lattice based schemes and have been further developed to offer similar performance to existing lattice based schemes [29], [30]. Moreover a proposed public key compression technique has reduced the size of the public key from over 2 GB down to a manageable 10 MB [29]. The performance of these integer based schemes has very recently been improved through a proposed batching technique [12] for the encryption of multiple plaintext bits into one ciphertext. The performance of evaluating AES using this batched scheme over the integers is comparable to the alternative implementation by Gentry et al. using the RLWE based FHE scheme [8].

Since the introduction of SHE and FHE in 2009 [4] there have been several optimisations in the field of homomorphic encryption to improve efficiency and speed. As mentioned in the introduction, bootstrapping is an expensive operation used to reduce the noise in the ciphertext by homomorphically decrypting a ciphertext using a secret key that is encrypted and hidden in the public key. There has been a lot of research into improving or avoiding the use of bootstrapping, such as the introduction of modulus switching by Brakerski et al. [9], which helps to minimise the noise generated when carrying out homomorphic operations.

There have been several recent software implementations of FHE schemes, such as [8], [12], and an open source software implementation, hcrypt, of FHE is available online [31]. The latest online introduction of a homomorphic encryption library (HElib), developed by Halevi and Shoup, features an optimised implementation of the FHE scheme proposed by Brakerski et al. [9]. It improves the performance of schemes, offering a speed up by a factor of 12 over previous implementations. It takes 3 hours instead of 36 hours for the homomorphic AES scheme [32], [33].

III. GPU AND FPGA IMPLEMENTATIONS OF HOMOMORPHIC ENCRYPTION SCHEMES

Research into the development of optimised FHE architectures targeting alternative platforms to software for implementations could prove useful in increasing the efficiency of SHE and FHE schemes. There are currently several research groups working towards this goal and recent developments in this area indicate the potential of hardware architectures to drastically increase the efficiency of homomorphic encryption schemes. In particular, the performance of the underlying crypto-primitives required in many of the FHE schemes, such as modular reduction and large multiplication, could be significantly improved through the use of GPU or FPGA technology. FPGA technology offers flexibility at a low cost, in comparison with application specific integrated circuit (ASIC) designs. In addition, modern FPGAs include embedded hardware blocks, which are optimised for multiply-accumulate (MAC) operations, and thus can be exploited when implementing large multiplications.

There has been some research already conducted into hardware implementations of LWE schemes. Hardware building blocks for the LWE cryptosystem were considered by Göttert et al. along with the use of the Fast Fourier Transform (FFT) [36]. An efficient hardware implementation of RLWE encryption is presented by Pöppelmann and Güneysu [37], as proposed by Lyubashevsky et al. [25] and Lindner and Peikert [25], is presented by Pöppelmann and Güneysu [37], which can fit on a low cost Xilinx Spartan-6 FPGA, and thus has relatively small resource usage. It compares favourably with the previously mentioned implementation of LWE by Göttert et al. in terms of hardware resource utilisation.

The recent work on hardware and GPU architectures for large-integer multiplication has improved the efficiency of FHE schemes. A hardware design of polynomial multiplication using FFT targeting lattice-based SHE and FHE schemes and implemented on FPGA was presented by Pöppelmann and Güneysu [38]. The performance of this multiplication operation implemented on a Spartan-6 FPGA is compared to the same operation with similar parameter sizes in the software implementation of a RLWE SHE scheme by Lauter et al. [27] and offers a speed up factor of 39 [38].

The first GPU implementation of a FHE scheme was presented by Wang et al. [13]. The authors implemented the small parameter size version of Gentry and Halevi’s lattice-based FHE scheme [7] on an NVIDIA C2050 GPU using the FFT algorithm, achieving speed up factors of 7.68, 7.4 and 6.59 for encryption, decryption and the recryption operations, respectively. The FFT algorithm was used to target the bottleneck of this lattice-based scheme, namely the modular multiplication of very large numbers, and the authors took advantage of the highly parallelised GPU to accelerate the performance of the FHE scheme. However, the authors state that even with the speed up the implemented FHE scheme remains...
unpractical, as high latency is incurred in the encryption and decryption steps. An extension of the authors’ work [35] involves the modification of arithmetic operations to decrease costly back and forth FFT conversions. This modified method, implemented on an NVIDIA GTX 690, achieves speed up factors of 174, 7.6 and 13.5 for encryption, decryption and the decryption operations, respectively, when compared to results of the implementation of Gentry and Halevi’s FHE scheme [7] that runs on an Intel Core i7 3770K machine. A further FPGA implementation targeting large number multiplication was proposed by Wang and Huang [18]. The authors propose an architecture for a 768K-bit FFT multiplier using a 64K-point finite field FFT as the key component. This component was implemented on both a Stratix V FPGA and a NVIDIA Tesla C2050 GPU and the implementation was around twice as fast on the FPGA as on the GPU [13].

Doröz et al. presented a full custom hardware implementation of very large integer multiplication [34]. Their design is based on Schönhage-Strassen’s Number Theoretic Transform algorithm and includes a 768-KByte cache to decrease I/O transactions. A multiplication takes 7.74 ms for million-bit operands at 666 MHz using a TSMC 90 nm library with an area requirement of 26.7 million equivalent gates. The authors mention that the performance estimates of Gentry and Halevi’s FHE primitive operations match with the previously reported software implementations on a high end Intel Xeon processor [7], and only uses a tiny fraction of the area.

Furthermore, the authors also present the first custom hardware realisation of the Gentry-Halevi FHE scheme [39]. The design introduces new hardware blocks to the previously introduced multiplier to perform all the Gentry-Halevi FHE primitives. The primitives are implemented by using a similar approach as Wang et al. [35] to reduce the number of FFT conversions. The design achieves speed up factors of 1.24, 99.44 and 10.32 for decryption, encryption and decryption operations respectively, when compared to the software implementation by Gentry and Halevi [7]. In comparison with the results of the GPU implementation by Wang et al. [13], the custom hardware design achieves speed up factors of 0.15, 12.15 and 1.35 for encryption, decryption, decryption and the decryption operations, respectively. The authors also state that they achieve these speed up factors utilising a small area of 30 million equivalent gates whereas the NVIDIA Tesla C2050 GPU and the Intel Xeon processor contains 900 million and 205 million equivalent gates respectively.

Other current work has also focused on the use of large integer multiplications required in many FHE schemes [7], [29], [30], in order to maximise the potential speed up factor for hardware implementations. The use of a Comba multiplier [40], which can be implemented efficiently using the DSP slices of an FPGA [41], was proposed by Moore et al. [16] to improve the performance of integer based FHE schemes [28], [30]. An FPGA implementation of a RLWE SHE scheme was also targeted by Cousins et al. [14], [15], in which Matlab Simulink is used to design the FHE primitives.

Lastly, Cao et al. [17] proposed a large integer multiplier using integer-FFT multipliers combined with Barrett reduction to target the multiplication and modular reduction bottlenecks featured in many FHE schemes. The encryption step in the proposed integer based FHE schemes by Coron et al. [29], [30] were designed and implemented on a Xilinx Virtex-7 FPGA. The synthesis results show speed up factors of over 40 are achieved compared to existing software implementations of this encryption step [17]. This speed up illustrates that further research into hardware implementations could greatly improve the performance of these FHE schemes. An overview of the above mentioned multipliers for different platforms is given in Table I.

### IV. Conclusions and Future Directions

Although there has been a lot of recent research in the area of homomorphic cryptography, there are many remaining open problems. In terms of the theoretical research, parameter selection for homomorphic encryption schemes is currently a complex process, as each scheme has specifically selected parameters, all of which are interlinked and these parameters are usually selected based on current possible lattice based attacks and their existing limits. An example of a weakness in this approach to parameter selection was exploited in an attack by Lee [42], where the parameter selection in Gentry and Halevi’s scheme [7] was not conservative enough to prevent a lattice based attack exploiting the sparse subset sum problem. More research into parameter selection is needed to ensure the most suitable parameters are chosen to guarantee both efficiency and security.

In terms of practical FHE implementations, further research into suitable hardware designs and optimisations of existing schemes could provide a large speed up, as indicated in [16]. Optimisations at an algorithmic level are required; for example parameters must be optimised to maximise efficiency of implementations. Moreover, batching techniques proposed for FHE schemes, [10]–[12], could greatly improve performance of any implementation and should also be investigated further. Optimisations at an architectural level are also needed. One major bottleneck in the implementation of these schemes is memory storage. Large parameter sizes and very large

<table>
<thead>
<tr>
<th>Design</th>
<th>Scheme</th>
<th>Platform</th>
<th>Mult.</th>
<th>Encrypt</th>
</tr>
</thead>
<tbody>
<tr>
<td>FFT multiplier</td>
<td>Gentry &amp; Halevi’s FHE scheme [7]</td>
<td>Stratix V FPGA</td>
<td>0.0125s</td>
<td>-</td>
</tr>
<tr>
<td>Optimised FFT</td>
<td>Gentry &amp; Halevi’s FHE scheme [7]</td>
<td>Stratix V FPGA</td>
<td>0.583 ms</td>
<td>0.0062s</td>
</tr>
<tr>
<td>Multiplier /</td>
<td>Gentry &amp; Halevi’s FHE scheme [7]</td>
<td>NVIDIA GTX 680</td>
<td>7.74 ms</td>
<td>2.08%</td>
</tr>
<tr>
<td>reduction</td>
<td>Gentry &amp; Halevi’s FHE scheme [7]</td>
<td>NVIDIA C250 GPU</td>
<td>0.765 ms</td>
<td>1.69%</td>
</tr>
<tr>
<td>Optimised FFT</td>
<td>Gentry &amp; Halevi’s FHE scheme [7]</td>
<td>Stratix V FPGA</td>
<td>0.125 ms</td>
<td>-</td>
</tr>
<tr>
<td>Multiplier /</td>
<td>Gentry &amp; Halevi’s FHE scheme [7]</td>
<td>Xilinx Virtex7 FPGA</td>
<td>-</td>
<td>0.0130s</td>
</tr>
<tr>
<td>reduction</td>
<td>Gentry &amp; Halevi’s FHE scheme [7]</td>
<td>Stratix V FPGA</td>
<td>-</td>
<td>-</td>
</tr>
</tbody>
</table>

**TABLE I**

**OVERVIEW OF FHE IMPLEMENTATIONS: TIMINGS FOR MULT AND SMALL SIZE ENCRYPT**
ciphertext sizes consume large amounts of memory, which requires memory management. Lastly, optimisations to target specific devices, such as using the embedded multipliers on an FPGA, are required. For possible FPGA implementations, the use of off-chip DDR3 memory is almost certainly required and therefore data transfer could become a performance issue. Thus, research into any optimisations reducing the memory requirements would be useful for future implementations.

In conclusion, the area of homomorphic cryptography remains an interesting area with a lot of scope for future research. Although further work on FHE schemes is needed to improve and optimise performance, the new avenue of targeting hardware or GPU technology for optimised FHE architectures also looks very promising, and brings the possibility of real time implementations of FHE a step closer.

ACKNOWLEDGMENT
Funding for this research was in part provided by the US National Science Foundation CNS Awards #1117590 and #1319130.

REFERENCES