
Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance

Venue: NeurIPS
Year: 2024
Authors: Joshua McClellan, Naveed Haghani, John Winder, Furong Huang, Pratap Tokekar
Topic: RL

🌟 Highlights

  • In a nutshell, and between all the fluff, the only key difference between EGNN and E2GN2 is the addition of the $\phi_{u_2}(m_i)$ term. It takes some careful reading to realize this. I believe the authors fluffed and obscured this a bit so a lazy reviewer wouldn't reject the paper with the reasoning that it is only a minor incremental improvement to EGNN. However, the paper still addresses a fundamental challenge in a fairly rigorous way; I would have accepted it regardless.

πŸ“ Summary

Rotational and reflection symmetries (the $O(n)$ symmetry group) are fairly common in reinforcement learning scenarios. While solutions such as EGNN have been proposed to exploit equivariance, EGNN still suffers from both early exploration bias and sub-optimal generalization. The authors introduce E2GN2, a modified version of EGNN that seeks to eliminate early exploration bias and further improve generalization. They show empirically that E2GN2 outperforms SOTA on standard MARL benchmarks, and they provide rigorous theoretical proofs.

🧩 Key Contributions

  • E2GN2 - Exploration-enhanced Equivariant Graph Neural Networks, which exploits environmental symmetries in the form of either equivariance or invariance.

  • The authors show that E2GN2 has no early exploration bias and is equivariant to both rotations and reflections.

  • The authors show that E2GN2 generalizes to test scenarios better than baselines, given its equivariance guarantees.

βœ… Strengths

  • I enjoyed the experiments section; the generalization experiments in particular were very intuitive.

  • Figure 1, which shows precisely the problem domain they were targeting, did a great job of setting everything up right away.

  • The key claim the authors push in the title and description is "sample efficiency". Their contributions do add up to sample efficiency (i.e., the agent learns a policy for a given scenario and generalizes by design to its $O(n)$ configurations) and improved generalization. Overall, I think the paper hits those points well.

⚠️ Weaknesses / Questions

  • Some of the terms and terminology could have been repeated, interpreted better, or simply put in a table. It was difficult for a reader, even one versed in reinforcement learning and group theory, to keep track of everything. For this reason, I'm dedicating some of this section to what the terms mean in case I re-read this paper. Here's a typical EGNN layer:

    • $h_i$ = the invariant features (features that do not transform under group actions, e.g., agent type)
    • $u_i \in \mathbb{R}^n$ = the equivariant features (features that do transform under group actions, e.g., positions under rotation)
    • $m_{ij} = \phi_e(h_i^l, h_j^l, \|u_i^l - u_j^l\|^2)$ = a message produced by a multi-layer perceptron $\phi_e$ from the invariant features and the squared distance between the equivariant features (coordinates)
    • $u_i^{l+1} = u_i^l + C\sum_{j \neq i} (u_i^l - u_j^l)\, \phi_u(m_{ij})$ = the coordinate-embedding update, which is equivariant to transformations from $E(n)$
    • $m_i = \sum_{j \neq i} m_{ij}$, $h_i^{l+1} = \phi_h(h_i^l, m_i)$ = the aggregated message and the invariant feature update
  • E2GN2 (the authors' contribution) modifies the coordinate update above to include another MLP applied to $m_i$ (a code sketch of both variants follows this list).

    • Before: $u_i^{l+1} = u_i^l + C\sum_{j \neq i} (u_i^l - u_j^l)\, \phi_u(m_{ij})$
    • After: $u_i^{l+1} = u_i^l\, \phi_{u_2}(m_i) + C\sum_{j \neq i} (u_i^l - u_j^l)\, \phi_u(m_{ij})$
    • In the authors' words, this extra MLP serves to "offset the bias from the previous layer and solve the early exploration problem".
  • I read section 4.3 a few times and I still don't feel it holds together well. I would have suggested tying it into the rest of the paper more cleanly.
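To make the one-term difference concrete, here is a minimal PyTorch sketch of a single message-passing layer implementing the update equations above. The hidden sizes, activations, and normalization are my own assumptions for illustration, not the authors' reference implementation; the only E2GN2-specific piece is the `phi_u2` gate in the coordinate update.

```python
import torch
import torch.nn as nn

class E2GN2Layer(nn.Module):
    """One message-passing layer following the EGNN equations above,
    plus the phi_u2 gate that (as far as I can tell) is the only
    structural change E2GN2 makes. Sizes/activations are my guesses."""

    def __init__(self, h_dim: int, hidden: int = 64):
        super().__init__()
        # phi_e: message MLP over (h_i, h_j, ||u_i - u_j||^2)
        self.phi_e = nn.Sequential(
            nn.Linear(2 * h_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU())
        # phi_u: scalar weight on each relative-position term
        self.phi_u = nn.Linear(hidden, 1)
        # phi_u2: E2GN2's extra MLP -- an invariant scalar gate on u_i
        self.phi_u2 = nn.Linear(hidden, 1)
        # phi_h: invariant feature update from (h_i, m_i)
        self.phi_h = nn.Sequential(
            nn.Linear(h_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, h_dim))

    def forward(self, h: torch.Tensor, u: torch.Tensor):
        # h: (N, h_dim) invariant features; u: (N, n) equivariant features
        N = h.shape[0]
        rel = u.unsqueeze(1) - u.unsqueeze(0)        # (N, N, n): u_i - u_j
        dist2 = (rel ** 2).sum(-1, keepdim=True)     # (N, N, 1)
        h_i = h.unsqueeze(1).expand(N, N, -1)
        h_j = h.unsqueeze(0).expand(N, N, -1)
        m_ij = self.phi_e(torch.cat([h_i, h_j, dist2], dim=-1))
        mask = 1.0 - torch.eye(N).unsqueeze(-1)      # zero out j == i terms
        m_i = (m_ij * mask).sum(dim=1)               # (N, hidden)
        C = 1.0 / (N - 1)                            # simple normalization
        agg = C * (rel * self.phi_u(m_ij) * mask).sum(dim=1)
        # EGNN:  u_new = u + agg
        # E2GN2: u_new = u * phi_u2(m_i) + agg   <-- the one-term change
        u_new = u * self.phi_u2(m_i) + agg
        h_new = self.phi_h(torch.cat([h, m_i], dim=-1))
        return h_new, u_new
```

Note that the gate multiplies $u_i^l$ by a scalar computed from the invariant $m_i$, so rotation and reflection ($O(n)$) equivariance of the coordinate update is preserved.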

πŸ§ͺ Appendix: Equivariant Graph Neural Networks in 3D

This is a quick framework I wrote while studying equivariant graph neural networks and refreshing myself on group theory. I'm putting it here for future reference, using the 3D group $O(3)$ since that's what I imagine I'll use in the future.

  • We define an orthogonal group in three dimensions as:
    • $O(3) = \{ R \in \mathbb{R}^{3 \times 3} \mid R^T R = R R^T = I \}$
    • Each element $R$ can be a 3D rotation or reflection
  • We can define the unit sphere in $\mathbb{R}^3$ as
    • $S^2 = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$
  • We can define the group action of $O(3)$ on $S^2$ as the following matrix-vector multiplication:
    • $g \cdot p = gp$
  • We can state the transitivity of the $O(3)$ action as follows:
    • $\forall p, q \in S^2,\ \exists g \in O(3) : g \cdot p = q$
    • Intuitively, this means for any two points on the sphere, you can always find a 3D rotation or reflection that transforms one point to the other.
  • A neural network $f$, such as an EGNN/E2GN2, is equivariant under $O(3)$ if
    • $f(gx) = g f(x), \quad \forall g \in O(3),\ x \in \mathbb{R}^3$
    • Thus, the neural network $f$ acts as an equivariant map (the nonlinear analogue of an intertwiner)
  • The really cool thing about this, and the reason I wrote all this preamble, is that you can essentially train a neural network $f$ to select the best equivariant map; a quick numerical sanity check of the equivariance property is sketched below.
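To keep myself honest, here is a small PyTorch snippet (my own, not from the paper) that samples a random element of $O(3)$ and numerically checks $f(gx) = g f(x)$ on a toy equivariant map; the same check applies to any claimed $O(3)$-equivariant network.

```python
import torch

def random_O3() -> torch.Tensor:
    """Sample a random element of O(3) via QR decomposition of a
    Gaussian matrix; Q satisfies Q^T Q = I, and det(Q) may be +1 or -1,
    covering both rotations and reflections."""
    A = torch.randn(3, 3)
    Q, _ = torch.linalg.qr(A)
    return Q

g = random_O3()
assert torch.allclose(g.T @ g, torch.eye(3), atol=1e-5)    # orthogonality

# A toy O(3)-equivariant map: scale each point by an invariant
# function of its norm. f(gx) = g f(x) holds because ||gx|| = ||x||.
def f(x: torch.Tensor) -> torch.Tensor:
    return x * torch.tanh(x.norm(dim=-1, keepdim=True))

x = torch.randn(5, 3)                      # five points in R^3
gx = x @ g.T                               # apply g to each row
assert torch.allclose(f(gx), f(x) @ g.T, atol=1e-5)        # equivariance
```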

πŸ” Related Work

  • Equivariant Graph Neural Networks

  • Reinforcement Learning

  • Graph Neural Networks

πŸ“„ Attachments

PDF
πŸ“„ View PDF
Paper Link
πŸ”— External Page