ISSN 1671-3710
CN 11-4766/R

Advances in Psychological Science, 2021, Vol. 29, Issue (12): 2147-2160. doi: 10.3724/SP.J.1042.2021.02147

• Regular Articles •

The neural mechanisms for human voice processing: Neural evidence from sighted and blind subjects

MING Lili1,2, HU Xueping1,2,3   

  1. School of Linguistic Science and Art, Jiangsu Normal University, Xuzhou 221000, China;
  2. Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou 221009, China;
  3. School of Chinese Language and Culture, Nanjing Normal University, Nanjing 210097, China
  • Received: 2021-02-23; Online: 2021-12-15; Published: 2021-10-26

Abstract: The human voice, as an important part of one’s auditory environment, carries a large amount of paralinguistic information that helps identify other individuals. For blind individuals in particular, the lack of visual face experience makes the voice the main source of information about another person’s individual characteristics. The present paper analyzes and summarizes the general mode of human voice processing and the specific voice processing mechanisms of blind individuals by integrating research on sighted and blind groups, focusing mainly on voice-selective processing and voice-identity processing.
The existing functional magnetic resonance imaging (fMRI) literature has shown that, compared with non-vocal sounds, the human voice elicits stronger neuronal activation in the bilateral superior temporal sulcus/gyrus (STS/G), indicating that the STS/G is voice-selective, with stronger responses in the right hemisphere. fMRI research in blind individuals has further confirmed that the right STS plays an essential role in voice-selective processing. However, the voice-sensitive response of the left STS has also been found to be higher in the blind group than in the sighted group, which may indicate reduced hemispheric lateralization in blind individuals. A similar pattern has been observed at different levels of voice information, such as voice identity, voice emotion, and speech processing. In addition, the reduced hemispheric lateralization of voice processing is not only positively correlated with the age of onset of blindness but may also be explained by a more coordinated auditory processing mechanism between the two cerebral hemispheres in blind individuals.
In addition to the core system of voice processing, the right anterior fusiform gyrus (aFG), a region generally associated with face-selective and face-identity processing, can also be involved in voice-selective and voice-identity processing in blind individuals. We therefore suggest that the right aFG in blind individuals may undergo cross-modal reorganization that allows it to participate in voice processing. Evidence that the "temporal voice area" of deaf individuals participates in face-selective and face-identity processing further supports this account of cross-modal reorganization in brain regions associated with individual identification. Furthermore, because the "temporal voice area" in deaf individuals shows enhanced connectivity with the visual cortex, the neural basis of cross-modal reorganization may arise from an "unmasking effect" after sensory deprivation; for example, the visual brain region (aFG) in blind individuals may recruit and enhance existing auditory or tactile inputs to process (nonvisual) information about a speaker's identity. However, this inference needs further testing. Notably, the fusiform region in sighted individuals is also involved in voice processing. More specifically, during voice-identity recognition/identification, the fusiform face area (FFA) supported cross-modal processing when no visual cues were available, or integrated and was facilitated by visual information when the speaker's face was known. Therefore, the voice processing mechanisms of the blind and sighted groups do not follow the same pattern: the visual region in blind individuals shows long-term cross-modal reorganization, whereas the visual region in sighted individuals performs short-term cross-modal processing or multimodal integration; both, however, point to a close relation between face-identity and voice-identity processing.
In summary, after systematically reviewing and analyzing fMRI research on voice processing in blind and sighted groups, the following questions need to be further explored and clarified: (1) How do the voice processing strategies (driven by top-down and bottom-up processing) of blind individuals differ from those of sighted individuals? (2) Are the functions of the fusiform gyrus modality-specific or modality-general representations? (3) How are different levels of voice information integrated to realize the dynamic perceptual process of multiple cues in the auditory speech stream? Answering these questions will improve our current understanding of the theoretical framework and neuroanatomical mechanisms of voice processing and, further, has important implications for auditory processing, speech cognition, and artificial intelligence.

Key words: voice identity processing, sighted subjects, blind subjects, fusiform gyrus, face processing
