- The Time-Course of Categorization of Real-Life Scenes with Affective
Content
- V. Maljkovic, P. Martini, and H. Farid
- Vision Sciences (VSS), Sarasota, FL, 2004
Purpose: To establish the temporal dynamics of the human
ability to extract meaning from scenes.
Methods: EXP 1: 384 color images with emotional valence from
the IAPS set were presented (masked) once to each of 96 subjects, at
durations from one video frame (13 ms) to 1710 ms. Subjects rated each
image's valence on a 9-point scale. We calculated mean ratings per
exposure and derived hazard functions for different valence
categories. EXP 2: Three image classes were tested in a blocked
design: positive/negative images, landscapes/cityscapes and
animals/vehicles. Each image was presented (masked) for
13-50 ms. Subjects categorized the images in a 2AFC design and
accuracy of categorization was calculated per exposure.
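
A hazard function of the kind mentioned in the Methods can be sketched as follows. The abstract does not say how the hazard functions were derived, so this is only an illustration under stated assumptions: mean ratings are taken to have been rescaled to a cumulative proportion between 0 and 1, and the standard discrete form h_i = (F_i - F_{i-1}) / ((t_i - t_{i-1}) (1 - F_{i-1})) is used. The function name `discrete_hazard` and the numbers in the usage lines are hypothetical and are not data from the study.

```python
def discrete_hazard(durations_ms, cumulative):
    """Discrete hazard rate per millisecond at each exposure duration.

    durations_ms: increasing exposure durations (ms).
    cumulative:   assumed cumulative proportion (0..1) reached by each duration.
    Returns None for an interval where nothing remains to be "completed".
    """
    if len(durations_ms) != len(cumulative):
        raise ValueError("durations_ms and cumulative must have equal length")
    hazards = []
    prev_t, prev_f = 0.0, 0.0
    for t, f in zip(durations_ms, cumulative):
        dt = t - prev_t
        if dt <= 0:
            raise ValueError("durations_ms must be strictly increasing")
        remaining = 1.0 - prev_f  # proportion not yet accounted for
        if remaining <= 0:
            hazards.append(None)
        else:
            # Probability mass gained in this interval, conditional on
            # not having been gained earlier, per unit time.
            hazards.append((f - prev_f) / (dt * remaining))
        prev_t, prev_f = t, f
    return hazards


# Toy values only (hypothetical, not from the experiment).
toy_hazard = discrete_hazard([10, 20, 40], [0.5, 0.75, 0.875])
print(toy_hazard)
```

A transient peak followed by a decline, as reported in the Results, would appear in such a sequence as hazard values that rise at short durations and then fall.
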
Results: EXP 1: Valence was reliably discriminated after a
single video frame, and ratings asymptoted at ~1 s. The derived hazard functions
show that categorization rates for positive and negative images are
the same, with a transient peak at ~50ms, and a sharp decline by
200ms. EXP 2: Performance remained constant at ~95% for
landscapes/cityscapes and animals/vehicles at all exposures;
performance for emotional scenes improved from ~60% at a one-frame
exposure to ~75% at a 50 ms exposure. To determine whether low-level
features could be responsible for these results, we built a statistical
model consisting of 24 low-level measurements of luminance and spatial
frequency. A linear classifier was able to almost perfectly separate
the landscapes/cityscapes and animals/vehicles, but was unable to
separate the valence categories.
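
The low-level control analysis can be illustrated with a small linear classifier. The abstract does not name the classifier or list the 24 measurements, so this sketch swaps in a plain perceptron and two made-up features per image; `train_perceptron`, `accuracy`, and all feature values are hypothetical. It shows the two outcomes described above: one toy set that a linear boundary separates perfectly, and one (an XOR layout) that no linear boundary can separate.

```python
def train_perceptron(features, labels, epochs=100, lr=1.0):
    """Fit weights w and bias b so that sign(w.x + b) matches labels (+1/-1)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(features, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary)
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b


def accuracy(w, b, features, labels):
    """Proportion of examples falling on the correct side of the boundary."""
    correct = 0
    for x, y in zip(features, labels):
        activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * activation > 0:
            correct += 1
    return correct / len(labels)


# Toy set 1: two hypothetical low-level features that differ between classes.
separable_x = [(0.8, 0.2), (0.9, 0.3), (0.7, 0.1),
               (0.2, 0.8), (0.3, 0.9), (0.1, 0.7)]
separable_y = [1, 1, 1, -1, -1, -1]

# Toy set 2: XOR layout, which no single linear boundary can split.
xor_x = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
xor_y = [1, 1, -1, -1]

w1, b1 = train_perceptron(separable_x, separable_y)
w2, b2 = train_perceptron(xor_x, xor_y)
separable_acc = accuracy(w1, b1, separable_x, separable_y)
xor_acc = accuracy(w2, b2, xor_x, xor_y)
print(separable_acc, xor_acc)
```

In this toy setting the first set plays the role of classes that differ in low-level statistics, and the second plays the role of classes that do not differ in any linearly usable way.
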
Conclusions: Image meaning is available at exposures as brief
as one video frame. While rapid categorization of some image classes
could exploit differences in low-level image properties, no such
differences seem to be available for emotional scenes, and yet image
meaning can be extracted from them reliably and quickly. This suggests
a true act of object recognition, dependent on mechanisms functioning
on similarly fast time scales.