
Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance

Venue: NeurIPS
Year: 2024
Authors: Joshua McClellan, Naveed Haghani, John Winder, Furong Huang, Pratap Tokekar
Topic: RL

🌟 Highlights

  • In a nutshell, and between all the fluff, the only key difference between EGNN and E2GN2 is the addition of the $\phi_{u_2}(m_i)$ term. It takes some careful reading to realize this. I believe the authors fluffed and obscured this a bit so a lazy reviewer wouldn't reject the paper with the reasoning that it is only a minor incremental improvement to EGNN. However, the paper still addresses a fundamental challenge in a fairly rigorous way; I would have accepted it regardless.

πŸ“ Summary

Rotational and reflection symmetries (the $O(n)$ symmetry group) are fairly common in reinforcement learning scenarios. While solutions such as EGNN have been proposed to exploit equivariance, EGNN still suffers from both early exploration bias and sub-optimal generalization. The authors introduce E2GN2, a modified version of EGNN that seeks to eliminate early exploration bias and further improve generalization. They show empirically that E2GN2 outperforms SOTA on standard MARL benchmarks, and they provide rigorous theoretical proofs.

🧩 Key Contributions

  • E2GN2 - Exploration-enhanced Equivariant Graph Neural Networks, which exploits environmental symmetries in the form of either equivariance or invariance.

  • The authors show that E2GN2 has no early exploration bias and is equivariant to both rotations and reflections.

  • The authors show that E2GN2 generalizes to test scenarios better than baselines, given its equivariance guarantees.

βœ… Strengths

  • I enjoyed the experiments section; the generalization experiments in particular were very intuitive.

  • Figure 1, which shows precisely the problem domain they were targeting, did a great job of setting everything up right away.

  • The key claim the authors push in the title and description is "sample efficiency". Their contributions do add up to sample efficiency (i.e., the agent learns a policy for a given scenario and generalizes by design to its $O(n)$ configurations) and improved generalization. Overall, I think the paper hits those points well.

⚠️ Weaknesses / Questions

  • Some of the terms and terminology could have been repeated, interpreted better, or simply put in a table. It was difficult for a reader, even one versed in reinforcement learning and group theory, to keep track of everything. For this reason, I'm dedicating some of this section to what the terms mean in case I re-read this paper. Here's a typical EGNN layer:

    • $h_i$ = the invariant features (features that do not transform under group actions, e.g., agent type)
    • $u_i \in \mathbb{R}^n$ = the equivariant features (features that do transform under group actions, e.g., positions under rotation)
    • $m_{ij} = \phi_e(h_i^l, h_j^l, \|u_i^l - u_j^l\|^2)$ = a message produced by a multi-layer perceptron $\phi_e$ from the invariant features and the squared distance between the equivariant features (coordinates)
    • $u_i^{l+1} = u_i^l + C\sum_{j \neq i} (u_i^l - u_j^l)\, \phi_u(m_{ij})$ = the coordinate-embedding update, which is equivariant to transformations from $E(n)$
    • $m_i = \sum_{j \neq i} m_{ij}$, $h_i^{l+1} = \phi_h(h_i^l, m_i)$ = the aggregated message and the invariant feature update
  • E2GN2 (the authors' contribution) modifies the coordinate update above to include another MLP applied to $m_i$ (a code sketch of both variants follows this list).

    • Before: $u_i^{l+1} = u_i^l + C\sum_{j \neq i} (u_i^l - u_j^l)\, \phi_u(m_{ij})$
    • After: $u_i^{l+1} = u_i^l\, \phi_{u_2}(m_i) + C\sum_{j \neq i} (u_i^l - u_j^l)\, \phi_u(m_{ij})$
    • In the authors' words, this extra MLP serves to "offset the bias from the previous layer and solve the early exploration problem".
  • I read section 4.3 a few times and I still don't feel it holds together well. I would have suggested tying it into the rest of the paper more cleanly.
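To make the one-term difference concrete, here is a minimal PyTorch sketch of a single message-passing layer implementing the update equations above. The hidden sizes, activations, and normalization are my own assumptions for illustration, not the authors' reference implementation; the only E2GN2-specific piece is the `phi_u2` gate in the coordinate update.

```python
import torch
import torch.nn as nn

class E2GN2Layer(nn.Module):
    """One message-passing layer following the EGNN equations above,
    plus the phi_u2 gate that (as far as I can tell) is the only
    structural change E2GN2 makes. Sizes/activations are my guesses."""

    def __init__(self, h_dim: int, hidden: int = 64):
        super().__init__()
        # phi_e: message MLP over (h_i, h_j, ||u_i - u_j||^2)
        self.phi_e = nn.Sequential(
            nn.Linear(2 * h_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU())
        # phi_u: scalar weight on each relative-position term
        self.phi_u = nn.Linear(hidden, 1)
        # phi_u2: E2GN2's extra MLP -- an invariant scalar gate on u_i
        self.phi_u2 = nn.Linear(hidden, 1)
        # phi_h: invariant feature update from (h_i, m_i)
        self.phi_h = nn.Sequential(
            nn.Linear(h_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, h_dim))

    def forward(self, h: torch.Tensor, u: torch.Tensor):
        # h: (N, h_dim) invariant features; u: (N, n) equivariant features
        N = h.shape[0]
        rel = u.unsqueeze(1) - u.unsqueeze(0)        # (N, N, n): u_i - u_j
        dist2 = (rel ** 2).sum(-1, keepdim=True)     # (N, N, 1)
        h_i = h.unsqueeze(1).expand(N, N, -1)
        h_j = h.unsqueeze(0).expand(N, N, -1)
        m_ij = self.phi_e(torch.cat([h_i, h_j, dist2], dim=-1))
        mask = 1.0 - torch.eye(N).unsqueeze(-1)      # zero out j == i terms
        m_i = (m_ij * mask).sum(dim=1)               # (N, hidden)
        C = 1.0 / (N - 1)                            # simple normalization
        agg = C * (rel * self.phi_u(m_ij) * mask).sum(dim=1)
        # EGNN:  u_new = u + agg
        # E2GN2: u_new = u * phi_u2(m_i) + agg   <-- the one-term change
        u_new = u * self.phi_u2(m_i) + agg
        h_new = self.phi_h(torch.cat([h, m_i], dim=-1))
        return h_new, u_new
```

Note that the gate multiplies $u_i^l$ by a scalar computed from the invariant $m_i$, so rotation and reflection ($O(n)$) equivariance of the coordinate update is preserved.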

πŸ§ͺ Appendix: Equivariant Graph Neural Networks in 3D

This is a quick framework I wrote while studying equivariant graph neural networks and refreshing myself on group theory. I'm putting it here for future reference, using the 3D group $O(3)$ since that's what I imagine I'll use in the future.

  • We define an orthogonal group in three dimensions as:
    • $O(3) = \{ R \in \mathbb{R}^{3 \times 3} \mid R^T R = R R^T = I \}$
    • Each element $R$ can be a 3D rotation or reflection
  • We can define the unit sphere in $\mathbb{R}^3$ as
    • $S^2 = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$
  • We can define the group action of $O(3)$ on $S^2$ as the following matrix-vector multiplication:
    • $g \cdot p = gp$
  • We can state the transitivity of the $O(3)$ action as follows:
    • $\forall p, q \in S^2,\ \exists g \in O(3) : g \cdot p = q$
    • Intuitively, this means for any two points on the sphere, you can always find a 3D rotation or reflection that transforms one point to the other.
  • A neural network $f$, such as an EGNN/E2GN2, is equivariant under $O(3)$ if
    • $f(gx) = g f(x), \quad \forall g \in O(3),\ x \in \mathbb{R}^3$
    • Thus, the neural network $f$ acts as an equivariant map (the nonlinear analogue of an intertwiner)
  • The really cool thing about this, and the reason I wrote all this preamble, is that you can essentially train a neural network $f$ to select the best equivariant map; a quick numerical sanity check of the equivariance property is sketched below.
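To keep myself honest, here is a small PyTorch snippet (my own, not from the paper) that samples a random element of $O(3)$ and numerically checks $f(gx) = g f(x)$ on a toy equivariant map; the same check applies to any claimed $O(3)$-equivariant network.

```python
import torch

def random_O3() -> torch.Tensor:
    """Sample a random element of O(3) via QR decomposition of a
    Gaussian matrix; Q satisfies Q^T Q = I, and det(Q) may be +1 or -1,
    covering both rotations and reflections."""
    A = torch.randn(3, 3)
    Q, _ = torch.linalg.qr(A)
    return Q

g = random_O3()
assert torch.allclose(g.T @ g, torch.eye(3), atol=1e-5)    # orthogonality

# A toy O(3)-equivariant map: scale each point by an invariant
# function of its norm. f(gx) = g f(x) holds because ||gx|| = ||x||.
def f(x: torch.Tensor) -> torch.Tensor:
    return x * torch.tanh(x.norm(dim=-1, keepdim=True))

x = torch.randn(5, 3)                      # five points in R^3
gx = x @ g.T                               # apply g to each row
assert torch.allclose(f(gx), f(x) @ g.T, atol=1e-5)        # equivariance
```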

πŸ” Related Work

  • Equivariant Graph Neural Networks

  • Reinforcement Learning

  • Graph Neural Networks

πŸ“„ Attachments

PDF
πŸ“„ View PDF
Paper Link
πŸ”— External Page