# Seeds of SEED: H-CRAM: In-memory Homomorphic Search Accelerator using Spintronic Computational RAM

Hüsrev Cılasun, Salonik Resch, Zamshed I. Chowdhury, Masoud Zabihi, Zhengyang Zhao, Thomas Peterson, Jian-Ping Wang, Sachin S. Sapatnekar, Ulya R. Karpuzcu

University of Minnesota

This work was supported in part by NSF grant no. SPX-1725420.



- Secure data retrieval
  - Sensitive databases, bioinformatics, biomarker search
  - Encryption/decryption is a vulnerability

- Secure data retrieval
  - Sensitive databases, bioinformatics, biomarker search
  - Encryption/decryption is a vulnerability
- Homomorphic Encryption
  - Computation in encrypted domain
  - General-purpose computation is prohibitively costly
    - Encrypted computation accumulates noise on cyphertext
    - Bootstrapping: noise reduction without decryption

- Secure data retrieval
  - Sensitive databases, bioinformatics, biomarker search
  - Encryption/decryption is a vulnerability
- Homomorphic Encryption
  - Computation in encrypted domain
  - General-purpose computation is prohibitively costly
    - · Encrypted computation accumulates noise on cyphertext
    - Bootstrapping: noise reduction without decryption
- SCAM: Secure Content Addressable Memory
  - Encrypted matching can avoid bootstrapping
  - Reduces to parallelized long addition
  - Data size blowup

- Secure data retrieval
  - Sensitive databases, bioinformatics, biomarker search
  - Encryption/decryption is a vulnerability
- Homomorphic Encryption
  - Computation in encrypted domain
  - General-purpose computation is prohibitively costly
    - · Encrypted computation accumulates noise on cyphertext
    - Bootstrapping: noise reduction without decryption
- SCAM: Secure Content Addressable Memory
  - Encrypted matching can avoid bootstrapping
  - Reduces to parallelized long addition
  - Data size blowup
- Spintronic CRAM: Fuses compute and memory
  - True in-memory processing semantics
  - Enables massive parallelism

- Secure data retrieval
  - Sensitive databases, bioinformatics, biomarker search
  - Encryption/decryption is a vulnerability
- Homomorphic Encryption
  - Computation in encrypted domain
  - General-purpose computation is prohibitively costly
    - · Encrypted computation accumulates noise on cyphertext
    - Bootstrapping: noise reduction without decryption
- SCAM: Secure Content Addressable Memory
  - Encrypted matching can avoid bootstrapping
  - Reduces to parallelized long addition
  - Data size blowup
- Spintronic CRAM: Fuses compute and memory
  - True in-memory processing semantics
  - Enables massive parallelism
- H-CRAM: SCAM in CRAM to accelerate homomorphic search

• Plaintext comparison of *w*-bit *x* and *y*:

$$f(x,y) = \prod_{i=1}^{w} \overline{x_i \oplus y_i}$$
(1)

• Plaintext comparison of *w*-bit *x* and *y*:

$$f(x,y) = \prod_{i=1}^{w} \overline{x_i \oplus y_i}$$
(1)

Homomorphic equivalent is cyphertext subtraction:

$$Hom \widehat{XOR} - OR(x, y) = \sum_{i=0}^{w} c_{x_i} - c_{y_i}$$
(2)

• Plaintext comparison of *w*-bit *x* and *y*:

$$f(x,y) = \prod_{i=1}^{w} \overline{x_i \oplus y_i}$$
(1)

• Homomorphic equivalent is cyphertext subtraction:

$$Hom\widehat{XOR}-OR(x,y) = \sum_{i=0}^{w} c_{x_i} - c_{y_i}$$
(2)

Cyphertext expansion: 6.8KB per bit! (128-bit security)

• Plaintext comparison of *w*-bit *x* and *y*:

$$f(x,y) = \prod_{i=1}^{w} \overline{x_i \oplus y_i}$$
(1)

Homomorphic equivalent is cyphertext subtraction:

$$Hom\widehat{XOR}-OR(x,y) = \sum_{i=0}^{w} c_{x_i} - c_{y_i}$$
(2)

- Cyphertext expansion: 6.8KB per bit! (128-bit security)

• Plaintext comparison of *w*-bit *x* and *y*:

$$f(x,y) = \prod_{i=1}^{w} \overline{x_i \oplus y_i}$$
(1)

Homomorphic equivalent is cyphertext subtraction:

$$Hom\widehat{XOR}-OR(x,y) = \sum_{i=0}^{w} c_{x_i} - c_{y_i}$$
(2)

- Cyphertext expansion: 6.8KB per bit! (128-bit security)
- Suffers from memory access overhead

- Logic Line (LL)
- Even/Odd Select Lines (E/OSL)
- Wordline Read/Write (WLR/W)



- Logic Line (LL)
- Even/Odd Select Lines (E/OSL)
- Wordline Read/Write (WLR/W)



- Cell states Parallel/Antiparallel $\leftrightarrow$ low/high resistance ( $\sim$ 0/1)
- Memory mode: Standard read/write operations

- Logic Line (LL)
- Even/Odd Select Lines (E/OSL)
- Wordline Read/Write (WLR/W)



- Cell states Parallel/Antiparallel $\leftrightarrow$ low/high resistance ( $\sim$ 0/1)
- Memory mode: Standard read/write operations
- Compute mode: Logic gate formation between cells

- Logic Line (LL)
- Even/Odd Select Lines (E/OSL)
- Wordline Read/Write (WLR/W)



- Cell states Parallel/Antiparallel $\leftrightarrow$ low/high resistance ( $\sim$ 0/1)
- Memory mode: Standard read/write operations
- Compute mode: Logic gate formation between cells
  - A resistive network is established using control lines

- Logic Line (LL)
- Even/Odd Select Lines (E/OSL)
- Wordline Read/Write (WLR/W)



- Cell states Parallel/Antiparallel $\leftrightarrow$ low/high resistance ( $\sim$ 0/1)
- Memory mode: Standard read/write operations
- Compute mode: Logic gate formation between cells
  - A resistive network is established using control lines
  - *in*<sub>0</sub>&*in*<sub>1</sub> currents get combined, pass through *out* (preset to 1)

- Logic Line (LL)
- Even/Odd Select Lines (E/OSL)
- Wordline Read/Write (WLR/W)



- Cell states Parallel/Antiparallel $\leftrightarrow$ low/high resistance ( $\sim$ 0/1)
- Memory mode: Standard read/write operations
- Compute mode: Logic gate formation between cells
  - A resistive network is established using control lines
  - *in*<sub>0</sub>&*in*<sub>1</sub> currents get combined, pass through *out* (preset to 1)
    - out remains unchanged if  $in_0 in_1 = 00, 01, 10$
    - out reset to 0 if  $in_0in_1 = 11$

- Logic Line (LL)
- Even/Odd Select Lines (E/OSL)
- Wordline Read/Write (WLR/W)



- Cell states Parallel/Antiparallel $\leftrightarrow$ low/high resistance ( $\sim$ 0/1)
- Memory mode: Standard read/write operations
- Compute mode: Logic gate formation between cells
  - A resistive network is established using control lines
  - *in*<sub>0</sub>&*in*<sub>1</sub> currents get combined, pass through *out* (preset to 1)
    - out remains unchanged if  $in_0 in_1 = 00, 01, 10$
    - out reset to 0 if  $in_0in_1 = 11$
  - Practically universal NAND gate!

# H-CRAM: SCAM in CRAM

- RCA: O(n), CLA and others:  $O(n \log n) (O(\log n) \text{ depth})$
- CLA adder based SCAM logic flow



## H-CRAM: SCAM in CRAM

- RCA: O(n), CLA and others:  $O(n \log n) (O(\log n) \text{ depth})$
- CLA adder based SCAM logic flow



- Mapped to **Processing** and **Control** arrays
- · De Bruijn graph topology for inter-array connectivity

| Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Processing<br>Proces |  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Processing<br>Unit<br>Processing                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |  |

| bits | <- <u>64 bits</u> -+ | + 64 bits + | 64 bits | + 64 bits | bits | _ |
|------|----------------------|-------------|---------|-----------|------|---|
| LL   | SLE                  | SLO         | WLW     | WLR       | AEN  | - |
|      |                      |             |         |           |      |   |
| LL   | SLE                  | SLO         | WLW     | WLR       | AEN  | - |
|      |                      |             | 1       |           |      |   |
| ш    | SLE                  | SLO         | WLW     | WLR       | AEN  | - |
|      |                      |             |         |           |      |   |
| ш    | SLE                  | SLO         | WLW     | WLR       | AEN  | - |
|      |                      |             | 1       |           |      |   |
| LL   | SLE                  | SLO         | WLW     | WLR       | AEN  | - |
|      |                      |             |         |           |      |   |
|      |                      |             | -       |           |      |   |
|      |                      |             | 1.5     |           |      | _ |
|      |                      |             | (a)     |           |      |   |

| • |   |                  | 64                                | bits             |                                   |   |       | • |
|---|---|------------------|-----------------------------------|------------------|-----------------------------------|---|-------|---|
| A | в | G <sub>L</sub> s | P <sub>L</sub> s/C <sub>L</sub> s | G <sub>H</sub> s | P <sub>H</sub> s/C <sub>H</sub> s | s | Buff. |   |
| A | в | G <sub>L</sub> s | P <sub>L</sub> s/C <sub>L</sub> s | G <sub>H</sub> s | P <sub>H</sub> s/C <sub>H</sub> s | s | Buff. | l |
|   |   |                  | :                                 |                  |                                   |   |       | ľ |
| A | в | G <sub>L</sub> s | P <sub>L</sub> s/C <sub>L</sub> s | G <sub>H</sub> s | P <sub>H</sub> s/C <sub>H</sub> s | s | Buff. |   |
|   |   |                  | (                                 | b)               |                                   |   |       |   |



H-CRAM CLA is 2× faster vs. HEGA, ~EDP vs. 2T+1 FeFET



H-CRAM CLA is 2× faster vs. HEGA, ~EDP vs. 2T+1 FeFET



H-CRAM scales efficiently with word size!

## Vision

- H-CRAM enables
  - Multi-grain parallelism
  - Memory access overhead elimination
  - True processing in-memory semantics, tight coupling of memory&compute

## Vision

- H-CRAM enables
  - Multi-grain parallelism
  - Memory access overhead elimination
  - True processing in-memory semantics, tight coupling of memory&compute
- Efficient acceleration of long addition in SCAM

## Vision

- H-CRAM enables
  - Multi-grain parallelism
  - Memory access overhead elimination
  - True processing in-memory semantics, tight coupling of memory&compute
- Efficient acceleration of long addition in SCAM
- Future work: general purpose homomorphic computing
  - How to extend H-CRAM to more complex homomorphic computation without bootstrapping?

Thank you! Questions?