We investigated the influence of spatial congruency on audiovisual interaction in humans, using behavioral and electrophysiological measures in a visual-attention task in which participants were required to attend only to the visual component of a bimodal audiovisual stimulus. The behavioral results showed that responses to audiovisual target stimuli were faster than those to unimodal visual target stimuli. In addition, we recorded event-related potentials (ERPs) to unimodal visual stimuli, unimodal auditory stimuli, and spatially congruent and spatially incongruent bimodal audiovisual stimuli. Audiovisual interaction effects were assessed by comparing the ERPs to audiovisual stimuli with the sum of the ERPs to unimodal visual and unimodal auditory stimuli. The ERP results showed that audiovisual interaction appeared as a negativity at mid-central scalp sites at a latency of around 300 ms, during the later stage of processing, under both the spatially congruent and the spatially incongruent conditions. However, the amplitude of this negativity at around 300 ms was significantly larger under the spatially incongruent condition than under the spatially congruent condition. Thus, audiovisual interaction depends on the spatial congruency of the bimodal audiovisual stimuli.
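
The comparison described above, between the ERP to audiovisual stimuli and the sum of the unimodal ERPs, can be illustrated with a minimal sketch. It assumes that each ERP is already a trial-averaged waveform sampled at the same time points for a single electrode. The function names, the 250-350 ms window, and the toy values are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_erp(trials):
    """Average equal-length single-trial waveforms sample by sample."""
    return [mean(samples) for samples in zip(*trials)]


def interaction_wave(erp_av, erp_a, erp_v):
    """Difference wave AV - (A + V), computed sample by sample."""
    return [av - (a + v) for av, a, v in zip(erp_av, erp_a, erp_v)]


def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean amplitude of a waveform inside an inclusive latency window."""
    window = [x for x, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    return mean(window)


# Toy values for illustration only (hypothetical, not recorded data).
times_ms = [0, 100, 200, 300, 400]
erp_a = [0.0, 1.0, 0.5, 0.0, 0.0]   # unimodal auditory ERP
erp_v = [0.0, 0.5, 1.0, 1.0, 0.5]   # unimodal visual ERP
erp_av = [0.0, 1.5, 1.5, 0.0, 0.5]  # bimodal audiovisual ERP

diff = interaction_wave(erp_av, erp_a, erp_v)
amp_300 = mean_amplitude(diff, times_ms, 250, 350)
```

A difference wave that departs from zero indicates that the bimodal response is not simply the sum of the two unimodal responses. In this toy example the departure is a negativity at 300 ms, and computing the same window mean separately for congruent and incongruent audiovisual ERPs would give the two amplitudes to compare.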