Light travels through a vacuum about a million times faster than sound travels through air. Yet inside the human nervous system, the order flips: a beep reaches the muscles of your finger noticeably sooner than a flash does. The asymmetry is small in absolute terms — about 30 to 50 ms — but it is consistent across studies, persists into old age, and quietly shapes the rules of sports from sprinting to cricket.

The reason has nothing to do with the speed of the signals in the outside world and everything to do with how the brain converts each signal into action. Audio gets a shorter, more direct route to motor output. Vision takes the scenic path.

The Headline Numbers

Across decades of laboratory measurements with millisecond-accurate hardware, two values keep recurring for healthy young adults:

Modality                          Typical simple RT    Approx. neural floor
Auditory (tone or click)          170-200 ms           ~100 ms
Visual (flash or color change)    220-250 ms           ~150 ms
Tactile (skin tap)                120-160 ms           ~80-100 ms

Auditory beats visual by roughly 30 to 50 ms. Tactile is faster still, because the somatosensory pathway can short-circuit through the spinal cord and reach motor neurons without a cortical detour. Each modality has its own floor, set by how long it takes to transduce the physical signal into neural impulses and how many synapses sit between the sensor and the muscle.

The Neural Pathway Difference

The clearest explanation lives in the anatomy. A photon striking the retina starts a biochemical cascade in a photoreceptor cell. That cascade takes around 20 to 30 ms — slow, by neural standards, because it relies on a chain of enzyme reactions before any voltage change occurs. The signal then ascends through the retinal interneurons, exits via the optic nerve, synapses in the lateral geniculate nucleus (LGN) of the thalamus, projects to primary visual cortex (V1), and only then can begin driving motor structures. Several synapses, each adding 1 to 2 ms, and a slow first step.

A sound wave striking the cochlea moves hair cells mechanically. The mechanical-to-electrical conversion takes well under a millisecond — orders of magnitude faster than visual transduction. The signal climbs through the cochlear nucleus, the superior olivary complex, the inferior colliculus, the medial geniculate nucleus (MGN), and finally the auditory cortex. Five staging areas, more than the visual pathway in raw count, but the auditory route has an advantage the visual route does not: motor responses can be initiated from the lower levels of the chain, especially the inferior colliculus, without waiting for cortical processing. The startle reflex is the extreme example — a loud sound can trigger a whole-body response in roughly 50 ms through a circuit that never touches the cortex at all.

Vision has no comparable subcortical shortcut for fine motor responses. There is a retinotectal pathway involved in orienting eye movements, but a finger press on a key essentially requires cortical involvement. The brain pays for that with extra synaptic delay.

Sprint Starts: Why Sub-100 ms Is a False Start

Track and field offers the cleanest real-world demonstration. At the start of a 100 m race, sprinters press against a pair of force-instrumented blocks. A starter pistol fires, and the force on the blocks is sampled continuously. The reaction time is the interval between the pistol and the first detectable increase in pressure.

Under World Athletics rules, any reaction below 100 ms is automatically scored as a false start. The reasoning is physiological. A real auditory reaction has to traverse the cochlear chain, reach motor cortex, descend through the corticospinal tract, cross the neuromuscular junction, and recruit enough muscle fibers to produce force at the foot. Even with every step at its theoretical minimum, the total is at least 90 to 100 ms. A reading under that suggests the athlete started moving before the gun and got lucky on the timing — not that they out-conducted the speed of their own nerves.
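The measurement and the 100 ms rule can be sketched in a few lines. This is an illustrative sketch only, not the algorithm used by any real timing system: the function names, the baseline method, and the `rise_threshold` value are assumptions, and it only looks for a force rise at or after the gun.

```python
from typing import Optional, Sequence

FALSE_START_LIMIT_S = 0.100  # the 100 ms threshold cited in the text


def reaction_time_s(
    force: Sequence[float],
    sample_rate_hz: float,
    gun_index: int,
    rise_threshold: float,
) -> Optional[float]:
    """Seconds from the gun to the first sample whose force exceeds the
    pre-gun baseline by more than rise_threshold, or None if none does."""
    if gun_index <= 0:
        raise ValueError("need at least one pre-gun sample for a baseline")
    # Baseline: mean force over the samples recorded before the gun.
    pre_gun = force[:gun_index]
    baseline = sum(pre_gun) / len(pre_gun)
    for i in range(gun_index, len(force)):
        if force[i] - baseline > rise_threshold:
            return (i - gun_index) / sample_rate_hz
    return None


def is_false_start(rt_s: float) -> bool:
    """A reaction faster than 100 ms is scored as a false start."""
    return rt_s < FALSE_START_LIMIT_S


# Hypothetical trace sampled at 1 kHz: steady 500 N, gun at sample 200,
# force rises 142 samples (142 ms) after the gun.
trace = [500.0] * 342 + [560.0] * 50
rt = reaction_time_s(trace, sample_rate_hz=1000.0, gun_index=200, rise_threshold=25.0)
print(rt, is_false_start(rt))  # 0.142 False
```

The choice of threshold matters: a lower `rise_threshold` detects the push earlier and so reports a shorter reaction time from the same trace.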

There is a subtle audio engineering lesson hidden inside the rule. For years, the start signal was a physical pistol fired in the air. Athletes in lane 1 heard it before athletes in lane 8: sound travels through air at only about 343 m/s, roughly 3 ms per metre, so the gap across eight lanes can range from a few milliseconds to a few tens of milliseconds depending on where the starter stands. Modern blocks have a speaker mounted at each individual block, fed from the starter signal electronically, so every athlete hears the start at effectively the same instant. The switch happened precisely because reaction-time differences of a few milliseconds matter at the elite level.
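The size of that gap depends strongly on geometry. As a rough worked example, the sketch below assumes a lane width of 1.22 m and a worst case in which the sound has to travel seven full lane widths further to reach lane 8 than lane 1. Both are assumptions for illustration; a starter standing further back from the line would produce a smaller difference.

```python
SPEED_OF_SOUND_M_S = 343.0  # value used in the text
LANE_WIDTH_M = 1.22         # assumed lane width


def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to cover distance_m in air, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


# Assumed worst case: lane 8 is seven lane widths further from the gun.
extra_distance_m = 7 * LANE_WIDTH_M
print(round(acoustic_delay_ms(extra_distance_m), 1))  # 24.9
```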

Touch Is Faster Still

Tactile reaction times routinely come in below auditory, around 120 to 160 ms. The reason is another shortcut. A tap on the skin activates mechanoreceptors whose axons project directly into the spinal cord. From there, a reflexive response — the well-known stretch reflex, for instance — can be issued in under 50 ms by the spinal cord alone, with no input from the brain. Even a voluntary tactile response benefits from this proximity: the brain still decides, but the round-trip from skin to spinal cord is shorter than any cortical loop.

The reaction-time order — touch fastest, then sound, then vision — mirrors the anatomical order of how directly each sense can reach the motor system.

Multimodal Stimuli

What happens when a flash and a beep arrive together? Empirically, the resulting reaction time is close to whichever stimulus was faster on its own (usually the auditory one), shortened by a further few milliseconds. This "redundant signals" effect has been studied since the 1960s, and in some experimental setups the speedup exceeds what a simple "race" between two independent channels could produce, a result known as a race-model violation. It implies that the brain partly integrates information across senses rather than just picking the winner of a race.

The practical takeaway is that audio cues, paired with visual ones, do not just provide redundancy. They speed up the overall response.
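A common way to check for a race-model violation is the race-model inequality usually attributed to Miller (1982): at every time t, the cumulative probability of having responded to the combined stimulus should not exceed the sum of the two single-stimulus cumulative probabilities. The sketch below compares empirical distributions at a few time points. The function names and sample data are made up for illustration, and a real analysis would add a statistical test across participants.

```python
from bisect import bisect_right
from typing import List, Sequence


def ecdf(sorted_rts: Sequence[float], t: float) -> float:
    """Empirical P(RT <= t) for a sorted list of reaction times."""
    return bisect_right(sorted_rts, t) / len(sorted_rts)


def race_model_violations(
    rt_audio: Sequence[float],
    rt_visual: Sequence[float],
    rt_both: Sequence[float],
    time_points_ms: Sequence[float],
) -> List[float]:
    """Time points where the bimodal CDF exceeds the race-model bound
    min(1, F_audio(t) + F_visual(t))."""
    a, v, av = sorted(rt_audio), sorted(rt_visual), sorted(rt_both)
    violations = []
    for t in time_points_ms:
        bound = min(1.0, ecdf(a, t) + ecdf(v, t))
        if ecdf(av, t) > bound:
            violations.append(t)
    return violations


# Made-up reaction times in milliseconds.
audio = [180, 190, 200, 210]
visual = [230, 240, 250, 260]
both = [160, 170, 185, 195]
print(race_model_violations(audio, visual, both, [150, 170, 190]))  # [170, 190]
```

Violations, when they occur, tend to show up at the fast end of the distribution, which is why the check is made at several early time points rather than on the means.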

If Audio Is Faster, Why Doesn't the Brain Use It for Everything?

Speed is only one axis. Vision dominates spatial localization. The retina contains roughly 100 million photoreceptors arranged with sub-degree precision; the cochlea contains around 15,000 hair cells arranged by frequency, not by spatial location. The brain has to compute sound location indirectly from interaural time and level differences, with an accuracy of only a few degrees at best. For anything requiring "where exactly is it?", vision is far more useful, even if it costs an extra 30 ms.

Sound is favored for time-critical events that demand a fast response without precise localization. A sudden bang, a sharp edge in a musical performance, the crack of a bat on a ball — these are events where being fast matters more than being precise about location.

Sports Examples

Cricket batsmen facing fast bowling are said to use auditory information alongside vision: sound at the moment of release to help read the spin and line of the ball, and the sound of the ball off their own bat as feedback on the contact. Tennis players describe a similar effect with the sound of a racket strike: the auditory channel arrives slightly earlier than the visual one and primes the motor response before the eyes fully resolve the ball's flight.

In Formula 1 and other motorsports, drivers use engine pitch and turbocharger noise as primary feedback about RPM and slip angle. By the time a visual instrument reading updates and the brain reads it, the auditory information has already arrived and triggered any needed correction. Visual instruments are for diagnostic checking, not real-time control.

The Hardware Confound: What Actually Gets Measured

Online reaction tests measure something only loosely related to the neural reaction time. The path from stimulus to logged millisecond looks roughly like this:

  1. Stimulus presented by the test page.
  2. Display or audio device renders the stimulus (variable latency).
  3. User perceives the stimulus, decides, executes the response.
  4. Input device transmits the keypress or click to the OS.
  5. Browser receives the event and computes the elapsed time.

Steps 2 and 4 add latency that has nothing to do with the user's nervous system:

Source                            Typical added latency
60 Hz monitor frame queue         ~8-16 ms
Panel response time (LCD)         ~5-30 ms
OS audio output buffer (wired)    ~5-20 ms
Bluetooth audio                   ~150-300 ms
USB keyboard polling              ~1-10 ms
Wireless keyboard                 ~5-20 ms

On a typical Bluetooth audio setup, the latency added by the headset alone can match or exceed an entire human reaction time, so the measurement is no longer meaningful. Even on wired output, the difference between low-latency audio paths (ASIO on Windows, Core Audio on macOS) and the default browser path can be 20 to 50 ms. For an audio reaction test to be informative, use wired output, not Bluetooth.

For the visual side, monitor input lag is the dominant confound. Television-style displays and many ultra-wide gaming monitors apply image processing that can push lag past 50 ms. A dedicated low-lag gaming monitor with a high refresh rate adds only a few milliseconds. If you want to compare your audio and visual reaction times honestly, run both on the same machine in the same session, and minimize the hardware differences where you can. The general principles for fair measurement are covered in detail in our guide to measuring reaction time online.
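To get a feel for how much of a measured time may be hardware, the ranges from the latency table can simply be added up per setup. The sketch below does that under two simplifying assumptions: the individual latencies are independent and additive, and the Bluetooth figure replaces the wired audio buffer rather than stacking on top of it. The function name and the groupings are illustrative.

```python
from typing import Iterable, Tuple

# (low_ms, high_ms) ranges copied from the table above.
LATENCY_MS = {
    "60 Hz monitor frame queue": (8, 16),
    "Panel response time (LCD)": (5, 30),
    "OS audio output buffer (wired)": (5, 20),
    "Bluetooth audio": (150, 300),
    "USB keyboard polling": (1, 10),
    "Wireless keyboard": (5, 20),
}


def added_latency_ms(sources: Iterable[str]) -> Tuple[int, int]:
    """Sum the best-case and worst-case latency over the given sources."""
    chosen = list(sources)
    low = sum(LATENCY_MS[s][0] for s in chosen)
    high = sum(LATENCY_MS[s][1] for s in chosen)
    return low, high


visual_wired = added_latency_ms(
    ["60 Hz monitor frame queue", "Panel response time (LCD)", "USB keyboard polling"]
)
audio_wired = added_latency_ms(["OS audio output buffer (wired)", "USB keyboard polling"])
audio_bluetooth = added_latency_ms(["Bluetooth audio", "USB keyboard polling"])

print(visual_wired)     # (14, 56)
print(audio_wired)      # (6, 30)
print(audio_bluetooth)  # (151, 310)
```

Under these assumptions the visual chain can add more overhead than the wired audio chain, so part of a measured audio advantage on consumer hardware may come from the equipment rather than the nervous system.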

Aging Affects the Two Channels Differently

Both auditory and visual reaction times slow with age, but visual slows somewhat faster. The visual transduction cascade in photoreceptors becomes less efficient, the lens stiffens and scatters more light, and contrast sensitivity declines. The auditory pathway is more resilient at the level of central processing, though peripheral hearing loss can confound measurement at older ages. The 30 to 50 ms audio advantage typically widens by another 10 to 20 ms past age 60. Population-level numbers for both modalities, broken down by age band, are collected in the average reaction time by age overview.

Trying It Yourself

The audio-vs-visual gap is one of the easier neuroscience effects to reproduce on your own equipment, provided you control for hardware. Use wired audio output, the same monitor, and the same input device for both tests. Run several sessions across different days and times, and compare the medians rather than the single fastest trial — a few outliers below 150 ms can skew a small sample dramatically.
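A small sketch of why the median is the safer summary. The trial values below are invented for illustration, and each list contains one suspiciously fast trial of the kind described above.

```python
from statistics import median
from typing import Sequence


def median_gap_ms(audio_ms: Sequence[float], visual_ms: Sequence[float]) -> float:
    """Visual median minus audio median, in milliseconds."""
    return median(visual_ms) - median(audio_ms)


def fastest_gap_ms(audio_ms: Sequence[float], visual_ms: Sequence[float]) -> float:
    """Visual best trial minus audio best trial, in milliseconds."""
    return min(visual_ms) - min(audio_ms)


# Invented trials (ms); 143 and 147 are lucky anticipations.
audio_trials = [182, 175, 191, 143, 188, 179, 185]
visual_trials = [231, 224, 240, 228, 236, 147, 233]

print(median_gap_ms(audio_trials, visual_trials))   # 49
print(fastest_gap_ms(audio_trials, visual_trials))  # 4
```

Comparing single best trials here would suggest almost no difference between the modalities, while the medians recover a gap in the expected range.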

Our Audio Reaction Test times your response to a tone played through your default audio device, with the same trial structure as our visual equivalent. Compare your audio and visual medians on the same hardware — the 30 to 50 ms gap should show up in your data, provided you keep your Bluetooth headphones out of the picture.