While I was out, my colleagues finally root-caused and fixed a nasty low-probability bug causing crashes on a specific Hubris-based machine.
It's an exciting consequence of our firmware being weird along three axes, simultaneously:
1. We use privileged/unprivileged mode on the Cortex-M7 (most systems don't, or use unprivileged mode only very cursorily).
2. We use the MPU to effect component memory isolation.
3. We use the STM32H7 FMC to access expansion peripherals on an FPGA through a parallel bus.
tl;dr: ST's default memory mapping for the FMC, combined with the ARMv7-M default memory attribute map, combined with our decision to have the kernel bypass the MPU (thereby using that default memory map), _combined with_ the M7's speculative access behavior for "Normal" memory... meant that we'd occasionally get spurious accesses into the FPGA that would take out the system, despite having no code that could actually _do_ that.
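To make that chain a bit more concrete, here's a rough sketch of the ARMv7-M default memory map as a lookup function. This is not code from Hubris: `MemType`, `default_mem_type`, `may_speculate`, and `FMC_BANK1_BASE` are names made up for illustration, and the `0x6000_0000` base is ST's default address for FMC bank 1 on the STM32H7, which I'm assuming (not asserting) is the window involved here. The point is just that this window falls in a range the architecture treats as Normal memory when the MPU isn't applied, and Normal memory is what the core is allowed to read speculatively.

```rust
/// Memory types from the ARMv7-M architecture (illustrative enum, not a Hubris type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemType {
    Normal,
    Device,
    StronglyOrdered,
}

/// ST's default base address for FMC bank 1 (NOR/PSRAM/SRAM) on the STM32H7.
/// Assumption: used here only as an example address for the FPGA window.
pub const FMC_BANK1_BASE: u32 = 0x6000_0000;

/// Memory type assigned by the ARMv7-M default memory map, i.e. what
/// applies to an access that is not governed by an MPU region.
/// The map is divided into eight 512 MiB regions, selected by the top 3 address bits.
pub fn default_mem_type(addr: u32) -> MemType {
    match addr >> 29 {
        // 0x0000_0000..=0x3FFF_FFFF: Code and SRAM
        0 | 1 => MemType::Normal,
        // 0x4000_0000..=0x5FFF_FFFF: on-chip peripherals
        2 => MemType::Device,
        // 0x6000_0000..=0x9FFF_FFFF: external RAM
        3 | 4 => MemType::Normal,
        // 0xA000_0000..=0xDFFF_FFFF: external devices
        5 | 6 => MemType::Device,
        // 0xE000_0000 and up: Private Peripheral Bus (first 1 MiB) is
        // Strongly-ordered, the vendor system region after it is Device.
        _ => {
            if addr <= 0xE00F_FFFF {
                MemType::StronglyOrdered
            } else {
                MemType::Device
            }
        }
    }
}

/// The architecture only permits speculative reads to Normal memory.
pub fn may_speculate(addr: u32) -> bool {
    default_mem_type(addr) == MemType::Normal
}
```

With this, `may_speculate(FMC_BANK1_BASE)` is true while `may_speculate(0x4000_0000)` (the on-chip peripheral range) is false, which is the whole problem in miniature: from the default map's point of view, a peripheral mapped at `0x6000_0000` looks like plain RAM.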
This deserves a blog post but you can kinda reconstruct the details by reading the issue thread backwards: https://github.com/oxidecomputer/hubris/issues/2198