Different Effects of Attentional Mechanisms between Visual and Auditory Cueing

Audio-visual integration interacts with attentional mechanisms. Salient auditory stimuli automatically draw attention to an audio-visual event, while spatial attention can modulate audio-visual integration. Attention induced by auditory inputs (sound-driven attention) facilitates visual perception. Similarly, visual attention improves performance on a visual task. However, the difference between attention driven by auditory and visual cues is not clear. When visual attention facilitates visual perception, there is a trade-off between spatial and temporal resolution. In contrast, audition has superior temporal resolution to vision. In the present study, we investigated the difference between auditory and visual cue-driven attention with respect to this trade-off. The results indicated that visual cueing increased spatial resolution but decreased temporal resolution. On the other hand, auditory cueing affected the efficiency of visual processing (i.e., response time) for temporal gap detection. These findings suggest that auditory cueing capitalizes on resources available for visual processing. In contrast, visual cueing may increase activation of the spatial channel at the cost of inhibiting the temporal channel, as proposed in a previous study. Overall, there appear to be clear differences between the mechanisms involved in auditory and visual cue-driven attention.


Introduction
Individuals perceive the external environment through sensory information, which can involve multisensory information. In using multisensory information, each sense compensates for ambiguity in perception in relation to other senses. Such multisensory integration is fundamental for producing stable and efficient perception.
Multisensory interactions interface with attentional mechanisms during perceptual processing. Synchronous sound presentation with visual stimuli drives attention and affects visual processing due to two types of attentional mechanism: bottom-up and top-down (see Talsma, Senkowski, Soto-Faraco, & Woldorff, 2010, for a review). Regarding bottom-up mechanisms, attention is driven automatically by salient stimuli, and is captured by multisensory events. For example, when a brief sound is presented concurrently with a target color or orientation change, search efficiency improves dramatically even among difficult visual search displays (Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008; Van der Burg, Talsma, Olivers, Hickey, & Theeuwes, 2011). A salient sensory stimulus elicits a strong neural response that is automatically linked to a stimulus in another modality (Talsma et al., 2010). In addition, auditory stimuli can speed up attentional shifts toward visual targets (Keetels & Vroomen, 2011).
For top-down mechanisms, spatial attention induced by various cues modulates multisensory integration processing. Furthermore, top-down attention directed toward one sensory stimulus spreads to stimuli in other modalities that occur at the same time (Busse, Roberts, Crist, Weissman, & Woldorff, 2005). When attention is directed to visual lip movements that match a spoken auditory sentence, activity increases in multiple brain areas (Fairhall & Macaluso, 2009). In contrast, allocating spatial attention towards irrelevant lip movements interferes with recognition of audio-visual speech signals (Senkowski, Saint-Amour, Gruber, & Foxe, 2008). When multisensory stimuli compete for processing resources (i.e., the saliency of an individual sensory signal is low), top-down selective attention is likely to be necessary for multisensory integration (e.g., Talsma, Doty, & Woldorff, 2007).
However, the differences in attentional mechanisms between attention induced by auditory and visual cues have not been investigated sufficiently. In the auditory domain, attention induced by auditory input (sound-driven attention) also increases the sensory input signal in audio-visual processing because auditory inputs induce feedback projections from the early auditory cortex to the early visual cortex during audio-visual interactions (Watkins, Shams, Josephs, & Rees, 2007; Watkins, Shams, Tanaka, Haynes, & Rees, 2006). Visual attention also increases the sensory input signal via visual cueing (e.g., Carrasco, Williams, & Yeshurun, 2002). However, visual attention does not simply improve processing: it reduces temporal channel performance while increasing spatial channel performance (Yeshurun & Levy, 2003). Therefore, we examined the differences between attention induced by auditory and visual cueing using temporal and spatial gap detection tasks. We hypothesized that response speed would increase with auditory cueing owing to the increased sensory input signal. On the other hand, we expected visual cueing to decrease temporal gap detection sensitivity in exchange for an increase in spatial gap detection sensitivity.

Experiment 1
Experiment 1 examined the differences in the attentional mechanisms underlying auditory and visual cue-driven attention. Here, we examined effects on temporal resolution by measuring a two-flash fusion threshold. The experimental paradigm was adopted from a previous study (see Yeshurun & Levy, 2003). We hypothesized that sensitivity would be lower in the visual cue condition than in the neutral cue condition. On the other hand, response time (RT) would be shorter in the auditory cue condition than in the neutral cue condition.

Participants
A group of 11 Tohoku University graduate and undergraduate students (8 women and 3 men) participated in Experiment 1. All participants reported normal or corrected-to-normal vision and audition. None had been informed as to the purpose of the experiment. They were awarded experimental credit for their participation.

Materials
The visual target was a white (43.5 cd/m²) disk (1.0 deg in diameter), and the fixation was a white cross (about 1.4 deg). The target and fixation were displayed on a black (1.6 cd/m²) background. The target was presented 10 deg to the left or right of the fixation. The visual cue was a red (15.3 cd/m²) frame (1.5 × 1.5 deg). In the neutral and auditory cue conditions, the red frame was presented at the location of the fixation (i.e., presented in the center of the display). In the visual cue condition, the red frame appeared at the same location where the target would be presented (i.e., presented at either the left or right side of the fixation according to the location of the target). The red frame remained during the target offset period. The auditory cue was a pure tone, with a frequency of 1000 Hz and sound pressure level of 80 dB. The auditory cue was presented simultaneously with target onset. The auditory cue duration was 50 ms.

Apparatus
The stimuli were generated and controlled by custom-made MATLAB scripts (MathWorks, Inc.), the Cogent Graphics and Cogent 2000 toolboxes (www.vislab.ucl.ac.uk/cogent.php), and a PC (XPS720, Dell; OS: Windows Vista, Microsoft). Visual stimuli were displayed on a CRT display (Trinitron GDM-F520, Sony; resolution: 1024 × 768 pixels; refresh rate: 60 Hz). The auditory stimuli were conveyed through an audio interface (Edirol FA-66, Roland) and headphones (HDA200, Sennheiser). The synchrony of the visual and auditory stimuli was confirmed using a digital oscilloscope (TS-80600, Iwatsu). The experiment was conducted in a dark room with 43.6 dB(A) of background noise. The participants viewed the monitor binocularly at a distance of 60.0 cm with their heads stabilized using a chin rest.
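As a rough illustration of the stimulus geometry, the visual angles given in Materials can be converted to physical sizes on the screen from the 60.0 cm viewing distance. The sketch below is not part of the original MATLAB code; it assumes a flat screen, treats each stimulus as centred on the line of sight when computing its extent (an approximation for the peripheral targets), and uses hypothetical function names.

```python
import math

VIEWING_DISTANCE_CM = 60.0  # viewing distance stated in the Apparatus section


def extent_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical size of a stimulus subtending angle_deg.

    Assumes the stimulus is centred on the line of sight and the screen is flat.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def offset_cm(eccentricity_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen distance from fixation to a point at eccentricity_deg."""
    return distance_cm * math.tan(math.radians(eccentricity_deg))


# Values taken from the Materials section of Experiment 1.
print(f"target disk (1.0 deg):      {extent_cm(1.0):.2f} cm")
print(f"cue frame side (1.5 deg):   {extent_cm(1.5):.2f} cm")
print(f"target offset (10 deg):     {offset_cm(10.0):.2f} cm")
```

Converting these sizes to pixels would additionally require the physical width of the visible display area, which is not reported here.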

Procedure
Each trial began with the participant pressing the "0" key. The fixation cross was presented for 1000 ms followed by the visual cue (Figure 1). The target was presented 100 ms after the onset of the visual cue. For the auditory cue condition, the pure tone was presented simultaneously with target onset. In the case of a double flash, two disks appeared, each for 50 ms, separated by one of three inter-stimulus intervals (ISIs; 17, 33, or 50 ms). In the case of a single flash, a single disk was presented for one of three durations (117, 133, or 150 ms). All participants completed 20 trials in each cell of the 3 (Cueing; neutral, auditory cue, or visual cue) × 3 (ISI; 17, 33, or 50 ms) × 2 (Flash; single or double) design, for a total of 360 trials. Participants were asked to discriminate the number of flashes (one or two). Accuracy and RT were recorded.
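The factorial design described above can be sketched in a few lines. This is an illustration only, not the authors' MATLAB code: the variable names are hypothetical, and the randomized trial order is an assumption, since the order of trials is not stated. The sketch also shows that the three single-flash durations equal the total duration of the corresponding double flash (50 ms + ISI + 50 ms).

```python
import itertools
import random

CUEING = ("neutral", "auditory", "visual")
ISI_MS = (17, 33, 50)
FLASH = ("single", "double")
REPS_PER_CELL = 20
DISK_MS = 50  # duration of each disk in a double flash


def stimulus_duration_ms(isi_ms):
    """Total duration of a double flash: disk + ISI + disk."""
    return 2 * DISK_MS + isi_ms


def build_trials(seed=0):
    """Return the full trial list for one participant."""
    cells = itertools.product(CUEING, ISI_MS, FLASH)
    trials = [
        {"cue": cue, "isi_ms": isi, "flash": flash}
        for cue, isi, flash in cells
        for _ in range(REPS_PER_CELL)
    ]
    # Assumption: trials were presented in random order.
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials()
print(len(trials))                                      # 360
print([stimulus_duration_ms(isi) for isi in ISI_MS])    # [117, 133, 150]
```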

Results and Discussion
Sensitivity (d') and criterion (c) scores for single vs. double flash discriminations were computed using signal detection theory (Yeshurun & Levy, 2003). Additionally, the average RT was computed using correct response RT data. The results are shown in Figure 2. A two-way analysis of variance (ANOVA) with Cueing (3) × ISI (3) was conducted for d' scores. The main effect of Cueing was significant (F(2, 20) = 10.97, p < .001, ηp² = .52). Multiple comparisons (Ryan's method) indicated that sensitivity was higher in the neutral and auditory cue conditions than in the visual cue condition (ps < .005). No significant difference in sensitivity between the neutral and auditory cue conditions was observed (p = .68). Moreover, the main effect of ISI was also significant (F(2, 20) = 31.67, p < .001, ηp² = .76). Multiple comparisons indicated that sensitivity was higher in the 33 and 50 ms ISI conditions than in the 17 ms ISI condition (ps < .001). The difference in sensitivity between the 33 ms and 50 ms ISI conditions was not significant (p = .26). The interaction between Cueing and ISI was not significant (F(4, 40) = 0.76, p = .56, ηp² = .07). Differences in criterion score between cueing conditions were small (neutral: c = 0.05; auditory cue: c = 0.05; visual cue: c = -0.09).
Finally, a two-way ANOVA with Cueing (3) × ISI (3) was conducted for the RT data. The main effect of Cueing was significant (F(2, 20) = 4.78, p < .05, ηp² = .32). Multiple comparisons indicated that RTs were shorter in the auditory cue condition than in the neutral condition (p < .01). The main effect of ISI (F(2, 20) = 0.68, p = .52, ηp² = .06) and the two-way interaction (F(4, 40) = 0.80, p = .54, ηp² = .07) were not significant.
In this experiment, visual cueing reduced the participants' ability to detect a temporal gap, replicating Yeshurun and Levy's (2003) findings. In contrast, a co-occurring tone did not influence temporal sensitivity. The results of the RT analyses revealed that the auditory cue accelerated temporal judgments. Therefore, the auditory cue did not impair vision's temporal resolution but rather improved visual temporal judgment speed. Yeshurun and Levy (2003) have proposed that spatial visual cues improve spatial resolution at the cost of impaired temporal resolution.
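For readers unfamiliar with the signal detection measures used above, the following minimal sketch shows one common way to compute d' and c from response counts. It is not the authors' analysis code. It assumes that "two flashes" responses on double-flash trials count as hits and on single-flash trials as false alarms, and it applies a log-linear correction to avoid infinite z-scores; neither choice is specified in the text. The counts in the example are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def dprime_and_criterion(hits, n_signal, false_alarms, n_noise):
    """Return (d', c) for a yes/no discrimination.

    Assumption: a log-linear correction (add 0.5 to each count and 1 to
    each trial total) keeps rates away from 0 and 1.
    """
    hit_rate = (hits + 0.5) / (n_signal + 1.0)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1.0)
    z_hit, z_fa = _z(hit_rate), _z(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# Hypothetical counts for one participant in one Cueing x ISI cell
# (20 double-flash and 20 single-flash trials).
d, c = dprime_and_criterion(hits=15, n_signal=20, false_alarms=5, n_noise=20)
print(f"d' = {d:.2f}, c = {c:.2f}")
```

A criterion near zero, as reported for all three cueing conditions, indicates little bias toward either response.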
In Experiment 2, we investigated the difference in spatial gap detection sensitivity between visual and auditory cueing.

Experiment 2
In Experiment 1, a difference in temporal gap detection performance was observed between auditory and visual cue-driven attention. This difference may be induced by the trade-off between temporal and spatial resolution in attention. In Experiment 2, we attempted to confirm this hypothesis by examining differences in spatial resolution performance between visual and auditory cueing.

Participants
The participants were the same as in Experiment 1.

Materials
The visual target was a white circle (1.0 deg in diameter), with a 0.2 deg gap on the left or right side. The visual target was presented on the left or right side of the fixation. The distance between the fixation and target was 5.0 deg. The duration of the visual target was one of three types (67, 85, or 100 ms). The other stimuli and conditions were the same as in Experiment 1.

Procedure
Each trial began with the participant pressing the "0" key. The fixation was presented for 1000 ms followed by the attentional cue (Figure 3). The cueing conditions were the same as in Experiment 1. The target was presented for one of three durations, 100 ms after cue onset. All participants completed 20 trials in each cell of the 3 (Cueing; neutral, auditory cue, or visual cue) × 3 (Duration; 67, 85, or 100 ms) × 2 (Gap location; left or right) design, for a total of 360 trials. Participants were asked to discriminate the location of the gap (left or right).

Results and Discussion
Sensitivity (d') and criterion (c) scores for gap location discrimination were computed according to signal detection theory (Yeshurun & Levy, 2003). Additionally, the average RT was computed using correct response RT data. The results are shown in Figure 4. A two-way ANOVA with Cueing (3) × Duration (3) as factors was conducted for d' scores. The main effect of Cueing was significant (F(2, 20) = 16.46, p < .001, ηp² = .62). Multiple comparisons indicated that sensitivity was higher in the visual cue condition than in the neutral and auditory cue conditions (ps < .001). No significant difference in sensitivity was observed between the neutral and auditory cue conditions (p = .70). The main effect of Duration was also significant (F(2, 20) = 4.34, p < .05, ηp² = .30). Moreover, the interaction between Cueing and Duration was significant (F(4, 40) = 4.74, p < .005, ηp² = .32). The simple main effect of Cueing was significant at the 85 and 100 ms durations (F(2, 60) = 13.92, p < .001, ηp² = .32; F(2, 60) = 17.01, p < .001, ηp² = .36, respectively). Multiple comparisons indicated that performance was higher in the visual cue condition than in the neutral and auditory cue conditions (ps < .001). The simple main effect of Duration was also significant in the visual cue condition (F(2, 60) = 13.24, p < .001, ηp² = .31). Multiple comparisons indicated that sensitivity was higher for durations of 85 ms and 100 ms than for the 67 ms duration when a visual cue was presented (ps < .001). Differences in criterion scores between cueing conditions were small (neutral: c = 0.08; auditory cue: c = 0.16; visual cue: c = 0.10).
Finally, a two-way ANOVA with Cueing (3) × Duration (3) was conducted for the RT data. The main effects of Cueing (F(2, 20) = 1.66, p = .22, ηp² = .14) and Duration (F(2, 20) = 0.02, p = .99, ηp² = .00) were not significant. On the other hand, the interaction between Cueing and Duration was significant (F(4, 40) = 4.11, p < .01, ηp² = .29). The simple main effect of Cueing was significant for the 100 ms duration condition (F(2, 60) = 5.53, p < .01, ηp² = .15). Multiple comparisons indicated that the RT was longer in the auditory cue condition than in the other two cueing conditions (ps < .01).
The results of Experiment 2 showed that visual cueing increased spatial gap detection sensitivity, consistent with Yeshurun and Levy (2003). In contrast, a simultaneous tone did not affect spatial gap detection sensitivity and did not shorten RT. In Experiment 1, a reduction in temporal resolution was not observed in the auditory cue condition. Therefore, differences do emerge between auditory and visual cue-driven attention. Visual attention increased spatial resolution but decreased temporal resolution. In contrast, sound-driven attention did not modulate spatial or temporal resolution in the present experiments. Furthermore, response speed decreased in the auditory cue condition at the 100 ms duration. Thus, an auditory tone could interrupt spatial gap detection at a decisional level.
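The effect sizes reported in both experiments can be checked against the F statistics using the standard identity for partial eta squared, ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below (the function name is illustrative) applies this identity to a few of the reported tests; it reproduces the reported values to within .01.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect
    return ss_ratio / (ss_ratio + df_error)


# (label, F, df_effect, df_error) as reported in the text
reported = [
    ("Exp 1, d', Cueing", 10.97, 2, 20),
    ("Exp 1, d', ISI", 31.67, 2, 20),
    ("Exp 1, RT, Cueing", 4.78, 2, 20),
    ("Exp 2, d', Cueing", 16.46, 2, 20),
    ("Exp 2, d', Cueing x Duration", 4.74, 4, 40),
]

for label, f_value, df1, df2 in reported:
    print(f"{label}: {partial_eta_squared(f_value, df1, df2):.2f}")
```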

General Discussion
The current study examined differences in the mechanisms underlying auditory and visual cue-driven attention.
Visual cueing increased spatial gap detection sensitivity and decreased temporal gap detection sensitivity. In contrast, auditory cueing did not influence the sensitivity to detect either spatial or temporal gaps. Instead, auditory cueing partially increased processing efficiency (i.e., led to faster RTs). Therefore, we observed a clear difference in attention between the visual and auditory cueing conditions.
Facilitation due to visual cueing revealed a trade-off between spatial and temporal resolution. The same trade-off was found by Yeshurun and Levy (2003). Other studies have also shown that visual attention affects spatial resolution (Gobell & Carrasco, 2005), contrast sensitivity (Carrasco, Ling, & Read, 2004), and perceived size (Anton-Erxleben, Henrich, & Treue, 2007). Visual attention increases perceptual performance via two distinct mechanisms: stimulus enhancement and external-noise exclusion (Dosher & Lu, 2000). Additionally, a valid cue decreases discrimination uncertainty by selecting and restricting the target location, thereby reducing the decision load (Davis, Kramer, & Graham, 1983). Yeshurun and Levy (2003) have proposed that facilitation for a visual target can be attributed to a trade-off between the spatial and temporal channels of vision.
In other words, visual attention increases activation of the spatial channel and facilitates perceptual performance while decreasing activation of the temporal channel. This characteristic of visual attention was replicated in the present study. In addition, visual cueing hardly influenced the RT. In a previous study using a similar paradigm (Yeshurun & Levy, 2003), facilitation of discrimination speed was likewise not observed. In the present study, the visual cue controlled spatial attention, whereas the task involved temporal or spatial gap discrimination of a visual target. Therefore, controlling spatial attention with a visual cue may be unlikely to affect processing efficiency during a gap detection task. However, further research is needed to validate this possibility.
In the auditory modality, attentional capture is related to activation in a more dorsal network comprising the left precentral gyrus, right superior parietal gyrus, and right intraparietal sulcus (Watkins, Dalton, Lavie, & Rees, 2007). In the visual modality, attentional capture is associated with activation in the left prefrontal and bilateral superior parietal cortices (de Fockert, Rees, Frith, & Lavie, 2004). These brain regions are very close to each other, consistent with a common supra-modal network for stimulus-driven attentional shifts (Downar, Crawley, Mikulis, & Davis, 2002). In visual attention, signal enhancement is related to thalamic activation, and noise exclusion is associated with a network spanning the pulvinar to V4 (Posner & Raichle, 1994). Audio-visual integration correlates with activation in the superior colliculus and superior temporal sulcus (Fairhall & Macaluso, 2009). Therefore, the neural mechanisms of attentional capture are very similar in vision and audition.
However, there are different mechanisms underlying how auditory and visual cues guide attention and facilitate visual processing.
Previous studies have shown that a simultaneous auditory stimulus can affect the perceived number of visual stimuli, effects known as the fission and fusion illusions (e.g., Andersen, Tiippana, & Sams, 2004; Shams, Kamitani, & Shimojo, 2000). In the fusion illusion, only a single flash is perceived when two flashes are accompanied by a single beep. This phenomenon could have affected temporal gap detection sensitivity for the visual stimuli in the present study. However, the auditory cue did not affect temporal gap detection sensitivity in Experiment 1. Moreover, the fission and fusion illusions occurred when visual stimuli were presented at shorter intervals (e.g., 17 ms). In the present study, each visual stimulus lasted longer (50 ms). Thus, the effects of the fission and fusion illusions would be weak in this study.
The present study revealed differences between auditory and visual cue-driven attention. Both auditory and visual cue-driven attention facilitate visual processing, but in different fashions. Moreover, the underlying mechanisms of the attentional facilitation effects are clearly different. Therefore, the functions of auditory and visual cue-driven attention should be distinguished and discussed separately.

Figure 1. Schematic representation of the procedure in Experiment 1. The upper three and lower three streams indicate the double disk and single disk conditions, respectively.

Figure 2. Temporal gap detection sensitivity and response time in Experiment 1. (a) Sensitivity for each ISI and Cueing condition. (b) Response time for each ISI and Cueing condition. Error bars represent standard errors of the mean (n = 11).

Figure 3. Schematic representation of the procedure in Experiment 2. The left three and right three streams indicate the right-side gap and left-side gap conditions, respectively.

Figure 4. Spatial gap detection sensitivity and response time in Experiment 2. (a) Sensitivity for each duration and Cueing condition. (b) Response time for each duration and Cueing condition. Error bars represent standard errors of the mean (n = 11).