# Discrete neural algorithmic reasoning

*In this work, we achieve perfect neural execution of several algorithms by forcing the node and edge representations to be from a fixed finite set. Also, the proposed architectural choice allows us to prove the correctness of the learned algorithms for any test data.*

### Neural algorithmic reasoning The link has been copied to clipboard

For the BFS problem, we are given a graph and a starting node. For each node, we predict its parent in the BFS tree.

### Outline of our work The link has been copied to clipboard

- State discretization does not allow the model to use complex and redundant dependencies in data;
- Hard attention is needed to ensure that attention weights will not be annealed for larger graphs. Also, hard attention limits the set of possible messages that each node can receive;
- Separating discrete and continuous flows is needed to ensure that state discretization does not lose information about continuous data.

### Our approach The link has been copied to clipboard

*states*. We ensure such property by adding discrete bottlenecks after the message passing procedure.

For the BFS problem, we use two node states (Discovered, NotDiscovered) and two edge states (Pointer, NotAPointer). At the initial step, only the starting node has the state Discovered.

For the BFS problem, we use positional information to break ties in traversal order. Each node chooses as a parent the neighbor from the previous distance layer with the smallest index. As graphs can be arbitrarily large, operating with positional information would require infinite precision, so we propose not to discretize it.

*select best*from RASP [3] and allows the attention block to attend depending on the predefined states and to use scalar priorities only to break ties.

For the BFS problem, the proposed selector allows us to exactly perform computations like “for unvisited node select the best visited neighbor”.

*discrete*manipulations over input data (e.g., addition instead of linear combination with arbitrary learnable weights), so we propose to learn discrete manipulations with scalars, which can be performed by external modules depending on discrete node/edge features (see the illustration below). We refer to our paper for the details.

### Results The link has been copied to clipboard

For the BFS problem, we can inspect the model and verify that the attention of Visited→Unvisited dominates the attention of Unvisited→Unvisited, so, at each step, each unvisited node will receive the message from a visited neighbor if such exists, or from an unvisited node otherwise. Then, by passing the corresponding states to the MLP after the message passing, we can check if the receiver node becomes visited or not regarding the received message, and so on.

### Future work The link has been copied to clipboard

## References

- Rodionov G. and Prokhorenkova L. “Discrete neural algorithmic reasoning.” 2024.←
- Shi Y. et al. “Masked label prediction: Unified message passing model for semi-supervised classification.” 2020.←
- Weiss G., Goldberg Y., Yahav E. “Thinking like transformers.” ICML 2021.←
- Minder J, Grötschla F, Mathys J, Wattenhofer R. “SALSA-CLRS: A sparse and scalable benchmark for algorithmic reasoning.” LOG 2023.←
- Ibarz B. et al. “A generalist neural algorithmic learner.” LOG 2022.←
- Bohde M., Liu M., Saxton A., Ji S. “On the Markov property of neural algorithmic reasoning: Analyses and methods.” ICLR 2024.←