Here we demonstrate that context-sensitive two-point neurons enable extremely energy-efficient multisensory speech processing. In this audio-visual hearing aid use case, the neurons use visual and environmental information to clean speech in a noisy environment. The simulation below shows that a 50-layer deep neural network built from context-sensitive two-point neurons activates, at any given time during training, 1,250 times fewer neurons than an equivalent network of point neurons. This opens new cross-disciplinary avenues for future on-chip DNN training implementations and posits a radical shift in current neuromorphic computing paradigms.
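To make the mechanism concrete, the sketch below models a single context-sensitive two-point neuron in NumPy. The specific modulatory transfer function (0.5 * R * (1 + tanh(R * C))), the variable names, and the random inputs are illustrative assumptions, not the formulation used in the simulation; the only point carried over from the text is that the apical (contextual) input, e.g. visual and environmental cues, amplifies or attenuates the basal (receptive-field) audio drive rather than simply adding to it.

```python
import numpy as np

def two_point_neuron(r, c, w_r, w_c):
    """Illustrative two-point neuron (assumed formulation, for intuition only).

    r   : feedforward (receptive-field) input, e.g. noisy audio features
    c   : contextual input, e.g. visual / environmental features
    w_r : weights on the basal/somatic site integrating r
    w_c : weights on the apical site integrating c
    """
    R = np.dot(w_r, r)  # integrated feedforward drive
    C = np.dot(w_c, c)  # integrated contextual drive
    # Context modulates the feedforward signal: coherent context amplifies it,
    # conflicting context suppresses it, and absent context (C = 0) leaves the
    # neuron with only a weak baseline response. Context alone (R = 0) cannot
    # drive the output.
    return 0.5 * R * (1.0 + np.tanh(R * C))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=16)       # stand-in for noisy audio features
    visual = rng.normal(size=8)       # stand-in for visual/environment features
    w_r = rng.normal(size=16) / 4.0
    w_c = rng.normal(size=8) / 4.0

    print("with context:   ", two_point_neuron(audio, visual, w_r, w_c))
    print("without context:", two_point_neuron(audio, np.zeros(8), w_r, w_c))
```

Under this (assumed) reading, units whose context does not cohere with their feedforward drive contribute little at any given moment, which gives an intuition for how a network of such neurons can keep far fewer neurons active during training than a comparable point-neuron network.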