# Writeup for PCTF 2022: Pressure

## Challenge Code

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128  from nacl.bindings.crypto_scalarmult import ( crypto_scalarmult_ed25519_noclamp, crypto_scalarmult_ed25519_base_noclamp, ) from nacl.bindings.crypto_core import ( crypto_core_ed25519_scalar_mul, crypto_core_ed25519_scalar_reduce, crypto_core_ed25519_is_valid_point, crypto_core_ed25519_NONREDUCEDSCALARBYTES, crypto_core_ed25519_BYTES ) import struct import os import ast import hashlib import random def sha512(b): return hashlib.sha512(b).digest() CONST = 4096 SECRET_LEN = int(random.randint(128, 256)) SECRET = [random.randint(1, 255) for i in range(SECRET_LEN)] with open('flag', 'r') as f: FLAG = f.read() def hsh(s): h = sha512(s) assert len(h) == crypto_core_ed25519_NONREDUCEDSCALARBYTES return crypto_scalarmult_ed25519_base_noclamp(crypto_core_ed25519_scalar_reduce(h)) def generate_secret_set(r): s = set() for (i, c) in enumerate(SECRET): s.add(hsh(bytes(str(i + 25037 * r * c).strip('L').encode('utf-8')))) return s def genr(): i = 0 while i == 0: i, = struct.unpack('

## Private Set Intersection

The challenge is made up of two parts: handle_client1 and handle_client2, each of which is an instance of a private set intersection protocol based on the Decisional Diffie-Hellman assumption.

Private set intersection (PSI) is a cryptographic protocol that enables two parties (Alice && Bob) to exchange information that they both share, and meanwhile leaks nothing about what they don’t share. The picture below illustrates the idea explicitly.

picture from wikipedia

In the above example, Alice has a set of {1, 2, 3}, and Bob possesses {1, 3, 4, 5, 9}. By applying a PSI protocol, both Alice and Bob can have the knowledge that they share the subset {1, 3}. However, Alice should not be aware of that the Bob’s other subset {4, 5, 9}, and at the same time, Bob is also not supposed to be conscious of Alice’s other subset {2}.

To accomplish such a task, a naive approach is to use hash functions. At first, Alice and Bob both calculate the hash digest of each of the elements of their sets. Alice calculate from her set $A = \{x_1, x_2, …, x_w\}$ to a hash set $H(A) = \{H(x_1), H(x_2), …, H(x_w)\}$, and Bob from $B = \{y_1, y_2, …, y_v\}$ to $H(B) = \{H(y_1), H(y_2), …, H(y_v)\}$. Then, they exchange $H(A)$ and $H(B)$ to each other, and take the intersection of these two hash sets. From the intersection, they can learn the shared elements.

$H(\cdot)$ can be a secure hash function, or anything that functions like a random oracle.

Using hash functions is quite a simple as well as efficient way to do PSI. But, it does not protect the privacy of the two parites if the original sets do not have considerable entropy. Any malicious adversaries could mount a dictionary attack on the hash sets and recover the original sets.

Therefore, more steps are required to build a more secure PSI scheme. One that bases on the Decisional Diffie-Hellman (DDH) assumption is a PSI scheme that is a little bit more secure than that uses hash functions. The DDH assumption, in short, can be stated as: for random $a, b, c$, one cannot distinguish $(g^a, g^b, g^c)$ from $(g^a, g^b, g^{ab})$. Intuitively, the assumption is quite reasonable for that every element in the group is randomly distributed.

picture from the slide in BIU Winter School 2017

The PSI scheme based on DDH works as following:

• Firstly, Alice and Bob calculate the two hash sets

\begin{aligned} H(A) &= \{ H(x_1), H(x_2), …, H(x_w) \} \newline H(B) &= \{ H(y_1), H(y_2), …, H(y_v) \} \end{aligned},

so that each element of the hash set can be easily mapped to a group $G$.

• After that, both of them choose a private scalar, where Alice chooses $\alpha$ and Bob selects $\beta$.

• They then mask the hash set by multiplying (group operation) every element by that private scalar. They get \begin{aligned} H(A)^{\alpha} &= \{H(x_1)^{\alpha}, H(x_2)^{\alpha}, …, H(x_w)^{\alpha}\} \newline H(B)^{\beta} &= \{H(y_1)^{\beta}, H(y_2)^{\beta}, …, H(y_v)^{\beta}\} \end{aligned}

• To continue, they exchanges the masked hash sets $H(A)^{\alpha}, H(B)^{\beta}$.

• After receiving the other masked hash set, they multiple that set by their private scalar again. By doing so, they reach two sets:

\begin{aligned} H(A)^{\alpha \beta} &= \{H(x_1)^{\alpha \beta}, H(x_2)^{\alpha \beta}, …, H(x_w)^{\alpha \beta}\} \newline H(B)^{\alpha \beta} &= \{H(y_1)^{\alpha \beta}, H(y_2)^{\alpha \beta}, …, H(y_v)^{\alpha \beta}\} \end{aligned}

• They exchange $H(A)^{\alpha \beta}, H(B)^{\alpha \beta}$.

• Finally, they compare the the two sets $H(A)^{\alpha \beta}, H(B)^{\alpha \beta}$. Any match implies that they both share that matched element.

Note that the order of the masked set must be kept. Otherwise, the two parties would not be able to identify which specific element they are sharing, but only got the number of the shared elements.

What if a passive attacker wants to acquire some information from this protocol?

What the attacker can see is the only four sets $H(A)^{\alpha}, H(B)^{\beta}, H(A)^{\alpha \beta}$ and $H(B)^{\alpha \beta}$. $H(A)$ and $H(B)$ are both masked by the group scalar multiplication, and every masked element is randomly distributed over the group. Thus, the attacker knows nothing about the two hash sets, and cannot mount the dictionary attack.

What if one of the participant (i.e. Alice or Bob) wants to cheat and gain advantage over the other?

In fact, this is feasible. When exchanging $H(A)^{\alpha \beta}, H(B)^{\alpha \beta}$ , one participant can cheat by not send his/her set to the other, but using the other one. For example, after Bob has received $H(A)^{\alpha \beta}$ from Alice, he can resend $H(A)^{\alpha \beta}$ back to Alice. In this way, Bob avoids leaking his information, but acquires the shared information.

As a result, more sophisticated steps are required to build a robust PSI scheme. Here is a brief survey if it catches the reader’s interest. And plently of papers of this topic are waiting for those who want to explore.

Warning: the rabbit hole goes deep :)

Anyway, the toy PSI scheme described above is enough for us to play. And this challenege is built on top of that.

## Interaction

Back to this challenge, after picking up what is a PSI scheme in a short minute, it’s time to read the code and understand what the interaction between the server and client looks like.

The interaction is broken up into two parts: handle_client1 and handle_client2. The following will explain what happens in these two parts.

### handle_client1

server set:

• Firstly, the server generates a 32-bit random number $r$.
• Secondly, the server generates a random number SECRET_LEN between 128 and 256 that represents the length of a secret list. The secret list is produced by randomly choosing SECRET_LEN numbers ranging from 1 to 255. Each number c in the secret list is then mapped to a point on the ED25519 elliptic curve group by calculating $H(i + 25037\cdot r \cdot c)$, where $H(\cdot)$ is SHA512, and $i$ is the index of that number in the list. This forms the first half part of the server set.
• Thirdly, 4095 random elements are produced by calculating $H(k + 4096\cdot(r \bmod{k}))$, where $k$ starts from 1 and ends to 4095. This makes up the other half part of the server set.
• In total, about more than 4000+ points forms the server set s:

$$H(y_1), H(y_2), …, H(y_w)$$

• Later, the server generates a random scalar b (512-bit), and masks the server set s by multiplying every element of the server set by b. The multiplication is operated on the elliptic curve group, so the result set server_s looks random and leaks nothing about the original server set s.

$$b \cdot H(y_1), b \cdot H(y_2), …, b \cdot H(y_w)$$

client set:

• We are required to input the (masked) client set client_s:

$$a \cdot H(x_1), a \cdot H(x_2), …, a \cdot H(x_v)$$

• Soon after the user sends his (masked) client set client_s, the server multiply client_s by b and reaches the combined set server_combined_client .

$$ab\cdot H(x_1), ab \cdot H(x_2), …, ab \cdot H(x_v)$$

• Finally, the server shuffles, and prints out the combined set server_combined_client and the masked server set server_s.

Upon receiving the two sets server_combined_client and server_s from the server, we can multiply the server_s by a and compare the result with server_combined_client. From the comparison, we can acquire some information about the shared elements. Unfortunately, in this challenge, the two sets are randomly shuffled such that we cannot identify which specific element we are sharing with the server set s, only to get how many elements we are sharing.

### handle_client2

server set:

• The server generates another random scalar b' (512-bit), and masks the server set s to another server_s':

$$b’ \cdot H(y_1), b’ \cdot H(y_2), …, b’ \cdot H(y_w)$$

• Good news is that the masked server set server_s is immediately sent to us by the server.

client set:

• We are required to input the (masked) client set client_s':

$$a’ \cdot H(x_1), a’ \cdot H(x_2), …, a’ \cdot H(x_v)$$

• The server multiply client_s' by b' to reach the combined set masked_c

$$a’ b’ \cdot H(x_1), a’ b’ \cdot H(x_2), …, a’ b’ \cdot H(x_v)$$

• We are supposed to input the other combined set masked_s:

$$a’ b’ \cdot H(y_1), a’ b’ \cdot H(y_2), …, a’ b’ \cdot H(y_w)$$

• If masked_s passes the following 3 check, the server will send the FLAG to us.

1. The length of masked_s is the same as that of server_s.

2. masked_s shares nothing with server_s.

3. The two sets masked_s and masked_c MUST be the same.

## Solution

### Recover r

It looks like that, in order to capture the flag, we should be able to make sure that every $x_i$ equal to every $y_i$. This requices us to be access to all the elements of the server set s. So, there should be some method that we can recover the whole server set s. Following the idea, we look back on how the server set s is made up.

The key point seems to recover the secret number r (32-bit). If we can recover r, it is not hard for us to rebuild the server set s.

In a short time, we find that the second part of the server set is quite suspicious.

Look at the expressionk + CONST * (r % k), where k ranges from 1 to 4095. When k starts to increase from 1, the values k + CONST * (r % k) are very small. We can brute force and recover it by inspecting whether we share any element with the server in the handle_client1 phase.

Upon recovering the value, we can get some information about r % k. We can get the value $x_i$ such that $$x_i \equiv r \pmod{k_i}.$$ Given enough congruence equations like this, we can apply the Chinese remainder theorem to recover r.

But it is sad that the combined sets are disordered by the server before sending to us. And it is hard for us to identify the shared elements.

Is there any way that can identify the shared elements?

@Hermes comes up with a fabulous method.

When brute forcing k + CONST * (r % k), for each k, we will iterate through all the possible values j = (r % k) and send the points $P_j = H(k + CONST \cdot j))$ to the server. In fact, we can tag each point by multiplying by a scalar $s_j = k + CONST \cdot j$. That is, instead of sending one point to the server every guess, we send a pair of two points $P_j$ and $s_k \cdot P_j$ to the server.

If the guess is correct, a pair of two masked points $b \cdot P_j$ and $b \cdot (s_k \cdot P_j)$ will be sent to us, where the latter point is exactly s_k times of the former. We can search for such pairs in the combined sets server_combined_client and identify which specific element we are sharing with the server!

Implementing this method enables us to recover $r$ as well as find all the 4065 elements of the second part.

Proof of Concept script (credit to @Hermes) that runs a successful recovery locally:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123  from nacl.bindings.crypto_scalarmult import ( crypto_scalarmult_ed25519_noclamp, crypto_scalarmult_ed25519_base_noclamp, ) from nacl.bindings.crypto_core import ( crypto_core_ed25519_scalar_mul, crypto_core_ed25519_scalar_reduce, crypto_core_ed25519_is_valid_point, crypto_core_ed25519_NONREDUCEDSCALARBYTES, crypto_core_ed25519_BYTES ) import struct import os import ast import hashlib import random def sha512(b): return hashlib.sha512(b).digest() CONST = 4096 SECRET_LEN = int(random.randint(128, 256)) SECRET = [random.randint(1, 255) for i in range(SECRET_LEN)] def hsh(s): h = sha512(s) assert len(h) == crypto_core_ed25519_NONREDUCEDSCALARBYTES return crypto_scalarmult_ed25519_base_noclamp(crypto_core_ed25519_scalar_reduce(h)) def generate_secret_set(r): s = set() for (i, c) in enumerate(SECRET): s.add(hsh(bytes(str(i + 25037 * r * c).strip('L').encode('utf-8')))) return s def genr(): i = 0 while i == 0: i, = struct.unpack('

### Pass the Checks

What left to us is the first half of the server set s, which is produced by calculating $H(i + 25037\cdot r \cdot c)$. Since we have recovered r, we can continue to brute force by guessing all the possible c and check if we share those elements with the server.

But, we stuck here… :(

Recovering r can only be done when handle_client1 finishes. In handle_client2, there is no chance for us to mount the brute forcing attack any more. So, it is impossible to recover the first part if we have recovered r.

We manage to think out all kinds of ways that we can recover the first part. We also try many other attack surfaces, only to find nothing useful.

So, there is no way that we can recover the whole server set s?

Alright, I accpet that.

But the challenge is solvable.

It soon occurs to me that, there must be some way that, without the knowledge of the whole server set s, we can pass the checks in handle_client2, .

What if I only have knowledge of just a few elements of the server set s. Can I pass the checks?

Following this idea, I look into the checks again.

The checks don’t include any information about the server set s. What they depend on are the 2 input sets provided by us, which looks suspicious.

The most strict requirement is that, the masked_s set (which is directly provided by us) must be equal to the masked_c set (client_s' multiplied by b'). The only difference is b'. We cannot recover b' since it is a very hard discrete log problem. What we can get about b' is the server provided server_s.

By recovering r, we have the knowledge of most elements of the server set s. And b' multiplied of these elements are given to us by the server.

Acutally, we can somehow construct two sets that satisfy the requirement and pass the checks, by using only one known element $H(y_i)$ and its b' multipled $b’ \cdot H(y_i)$. The following will show how.

The fact is that we even don’t have to recover r to pass the checks.

Notice that the server set MUST contain an element $H(1)$.

 1 2 3  s = generate_secret_set(r) for k in range(1, CONST): s.add(hsh(bytes(str(k + CONST * (r % k)).strip('L').encode('utf-8')))) 

This is always True in that, when calculating the second part of the server set, the enumerator k starts from 1 and the result of k + CONST * (r % k) is 1 in the beginning.

As a result, when processing handle_client2, one element of server_s must be $b \cdot H(1)$. We can arbitrary select one from server_s, and get the correct one with the probability of approximately being $1/n \approx 1/4000$.

By using this known element, we can construct client_s as

$$2 \cdot H(1), 3 \cdot H(1), …, (n+1) \cdot H(1)$$

and masked_s as

$$2 b \cdot H(1), 3 b \cdot H(1), …, (n+1) b \cdot H(1)$$

This is enough to pass the checks…

It turns out that only one shared element is enough to pass the checks.

What we assume at first is that we must recover all the elements. What a miss!

exp.py:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141  from pwn import * from nacl.bindings.crypto_scalarmult import ( crypto_scalarmult_ed25519_noclamp, crypto_scalarmult_ed25519_base_noclamp, ) from nacl.bindings.crypto_core import ( crypto_core_ed25519_scalar_mul, crypto_core_ed25519_scalar_reduce, crypto_core_ed25519_is_valid_point, crypto_core_ed25519_NONREDUCEDSCALARBYTES, crypto_core_ed25519_BYTES ) import struct import os import ast import hashlib import random def sha512(b): return hashlib.sha512(b).digest() def hsh(s): h = sha512(s) assert len(h) == crypto_core_ed25519_NONREDUCEDSCALARBYTES return crypto_scalarmult_ed25519_base_noclamp(crypto_core_ed25519_scalar_reduce(h)) # code for recovering $r$ CONST = 4096 def egcd(a, b): ''' Extended Euclidean Algorithm. returns x, y, gcd(a,b) such that ax + by = gcd(a,b). ''' u, u1 = 1, 0 v, v1 = 0, 1 while b: q, r = divmod(a, b) u, u1 = u1, u - q * u1 v, v1 = v1, v - q * v1 a, b = b, r return u, v, a def CRT_constructive(ai, mi): # # make sure every two *m*s in *mi* are relatively prime # lcm = lambda x, y: x * y // gcd(x, y) # mul = lambda x, y: x * y # assert(reduce(mul, mi) == reduce(lcm, mi)) assert(isinstance(mi, list) and isinstance(ai, list)) from functools import reduce M = reduce(lambda x, y: x * y, mi) ai_ti_Mi = [a * (M // m) * egcd(M // m, m)[0] for (m, a) in zip(mi, ai)] return reduce(lambda x, y: x + y, ai_ti_Mi) % M primes = [2,3,5,7,11,13,17,19,23,29] S1 = [] P = [] for i in primes: for j in range(i): S1.append(hsh(bytes(str(i + CONST * j).strip('L').encode('utf-8')))) S1.append(crypto_scalarmult_ed25519_noclamp((i + CONST * j).to_bytes(32,'little'), S1[-1])) DEBUG = 0 if DEBUG: HOST = "127.0.0.1" PORT = 9999 else: HOST = "pressure.chal.pwni.ng" PORT = 1337 for turn in range(0x1337): print(f"\n\n\nturn: {turn}") conn = remote(HOST, PORT) # context.log_level = 'debug' # hashcash if not DEBUG: cmd = conn.recvline().strip().decode() res = subprocess.run(cmd.split(" "), capture_output=True) print(res.stdout.strip()) conn.sendline(res.stdout.strip()) # handle_client1 print("handle_client1...") conn.recvuntil(b"Send your data!\n") conn.sendline(repr(S1).encode()) # recover $r$ # client_resp1 = ast.literal_eval(conn.recvline().strip().decode()) # client_resp2 = ast.literal_eval(conn.recvline().strip().decode()) # remainders = [-1]*len(primes) # for e in client_resp1: # if e in client_resp2: # print(e) # for i in range(len(primes)): # for j in range(primes[i]): # if crypto_scalarmult_ed25519_noclamp((primes[i] + CONST * j).to_bytes(32,'little'), e) in client_resp1: # if remainders[i]!=-1: # raise ValueError # else: # print(i,j,remainders[i],primes[i] + CONST * j)# r % primes[i]) # remainders[i] = j # recovered_r = CRT_constructive(remainders,primes) # print(f"recovered_r: {recovered_r}") print("handle_client2...") conn.recvuntil(b"Let's see if we share anything! I'll be the initiator this time.\n") server_ss = ast.literal_eval(conn.recvline().strip().decode()) H1_b = server_ss[0] # guess, probabilty is 1/len(server_ss) H1 = hsh(bytes(str(1).strip('L').encode('utf-8'))) print("generating client_s && masked_s...") client_s = [crypto_scalarmult_ed25519_noclamp(i.to_bytes(32, 'little'), H1) for i in range(2, len(server_ss)+2)] masked_s = [crypto_scalarmult_ed25519_noclamp(i.to_bytes(32, 'little'), H1_b) for i in range(2, len(server_ss)+2)] print(len(server_ss), len(client_s), len(masked_s)) print(f"sending client points {len(repr(client_s))} bytes...") conn.recvuntil(b"Send client points: \n") conn.sendline(repr(client_s).encode()) print(f"sending masked server points {len(repr(client_s))} bytes...") conn.recvuntil(b"Send masked server points: \n") conn.sendline(repr(masked_s).encode()) print("waiting for reply...") reply = conn.recvline().decode() print(reply) conn.close() if "PCTF" in reply: break 

Running the script on 20 threads for about 1~2 hours, we finally capture the flag.