Examine `Broadcast` for optimization potential.
poanetwork
While debugging an issue with failing tests, I encountered some inefficiency in the reliable broadcast algorithm, at least inside the edge-case of a two node network. Consider the following partial log:
```
B: [0] -> [1]: HoneyBadger(0, Message { epoch: 0, content: Subset(Broadcast(0, Value(Proof { #1, root_hash: c002..beaa, value: b808..6d14, .. }))) })
D: [0] -> [1]: HoneyBadger(0, Message { epoch: 0, content: Subset(Broadcast(0, Echo(Proof { #0, root_hash: c002..beaa, value: 0000..3a39, .. }))) })
E: [1] -> [0]: HoneyBadger(0, Message { epoch: 0, content: Subset(Broadcast(0, Echo(Proof { #1, root_hash: c002..beaa, value: b808..6d14, .. }))) })
G: [1] -> [0]: HoneyBadger(0, Message { epoch: 0, content: Subset(Broadcast(0, Ready(c002..beaa))) })
J: [0] -> [1]: HoneyBadger(0, Message { epoch: 0, content: Subset(Broadcast(0, Ready(c002..beaa))) })
```
This is almost in line with the algorithm description on [page 7](https://eprint.iacr.org/2016/199.pdf); we are sending `VAL(h, b_i, s_i)` to every node in the network and expect an `ECHO(h, b_i, s_i)` back for each, finally a `READY(h)`. The full text-book message log would look like this:
```
A: [0->0] VAL(h, b_0, s_0)
B: [0->1] VAL(h, b_1, s_1)
C: [0->0] ECHO(h, b_0, s_0)
D: [0->1] ECHO(h, b_0, s_0)
E: [1->0] ECHO(h, b_1, s_1)
F: [1->1] ECHO(h, b_1, s_1)
G: [1->0] READY(h)
H: [1->1] READY(h)
I: [0->0] READY(h)
J: [0->1] READY(h)
```
This is twice as many messages, because `A`, `C`, `F`, `H` and `I` are messages that need not leave the node and are short-circuited inside the code already (instead of returning it an `Step` output, it is directly handled by calling the respective `handle_` function).
## Possible optimizations
There might be some more room for optimization here; consider message `E`. In this case, node 1 is echoing back what node 0 sent in its `VAL` message. This message contains no new information for node 0, it merely functions as an acknowledgement. The resilience gained here is minimal, a smart attacker that controls a faulty node can always reply with a correct `ECHO` message to avoid suspicion.
In general, it might be possible to omit the `ECHO` back to the sender of the `VAL` that triggered it, resulting in the following sequence:
```
B: [0->1] VAL(h, b_1, s_1)
D: [0->1] ECHO(h, b_0, s_0)
G: [1->0] READY(h)
J: [0->1] READY(h)
```
Some other message back to the original node could potentially be dropped, but this a bigger trade-off: We are trading efficiency (less messages) for a weaker "non-malicious" error detection; the messages can be checked to ensure nodes that meant well are up, but will not deter a corrupted node that is actively working to deceive the network.
## Gains
The gains per removed message in a network of `N` nodes are modest for large networks: Instead of sending `N-1 + N(N-1) + N(N-1)` messages (`VAR`, `ECHO`, `READY`), we would be sending `N-1 + (N-1)(N-1) + N(N-1)` messages; at 25 nodes this is a 2% reduction per broadcast, smaller networks benefit much more (e.g. around 9% with 5 nodes). Broadcast messages account for about 2/3rds of all messages (correct me if I am wrong), also the `ECHO` message carries a substantial payload (as opposed to `READY`).
I'd like some feedback on these observations, I don't know if implementing this potential optimization is worthwhile right now, but it is something to keep in mind for the future. More important is verifying that no cryptographic properties are altered by this modification; if that is the case, we can keep this issue around as a warning ;).