D-036
Phonetic and Phonemic Segmentation: Neural Representations of Speech Attributes in Natural Dialogue
Juan Octavio Castro1,2, Joaquín E. González1, Jazmín Vidal Domínguez1, Agustín Gravano3,4, Pablo E. Riera1,5, Juan E. Kamienkowski1,5,6
  1. Laboratorio de Inteligencia Artificial Aplicada, Instituto de Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires - CONICET, Argentina
  2. Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
  3. Laboratorio de Inteligencia Artificial; Escuela de Negocios, Universidad Torcuato Di Tella, Argentina
  4. CONICET, Argentina
  5. Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
  6. Maestría de Explotación de Datos y Descubrimiento del Conocimiento, FCEyN-FI, UBA, Argentina.
Presenting Author:
Juan Octavio Castro
joctavio287@gmail.com
The study of speech in natural environments poses challenges for traditional electroencephalography (EEG) analysis approaches. In recent years, machine learning models—particularly regularized linear encoding models—have enabled a transition toward experimental designs that incorporate dynamic, naturalistic stimuli, such as speech during dialogue. This work aims to understand how different speech attributes are encoded in the brain in the context of unscripted natural dialogue. To this end, we extract low-level attributes (e.g., envelope, pitch, spectrogram), high-level attributes (e.g., phonemes, phonological features), and attributes derived from representations produced by deep neural networks (Wav2Vec2.0, Whisper). The results show that including high-level attributes significantly improves the prediction of brain signals across all frequency bands. In particular, predictions based on phonemes and phonological features suggest that neural sensitivity is consistent with the hypothesis of a hierarchical language-processing system.
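As an illustration of the general approach, the following is a minimal sketch of a regularized linear encoding model of the kind described above: a time-lagged stimulus attribute (here, a synthetic stand-in for the speech envelope) is regressed onto multichannel EEG with ridge regression, and prediction quality is scored by the correlation between predicted and actual signals on held-out data. All data, dimensions, and the regularization strength are illustrative assumptions, not values from the study.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_samples, n_lags, n_channels = 1000, 32, 8  # illustrative sizes, not from the study

# Synthetic stimulus attribute (stand-in for the speech envelope) at the EEG rate.
envelope = rng.standard_normal(n_samples)

# Time-lagged design matrix: each column is the attribute shifted by one sample,
# so the linear model learns a temporal response function per EEG channel.
X = np.column_stack([np.roll(envelope, lag) for lag in range(n_lags)])
X[:n_lags] = 0.0  # discard samples that wrapped around

# Simulate EEG that partially tracks the lagged stimulus, plus noise.
true_weights = rng.standard_normal((n_lags, n_channels))
eeg = X @ true_weights + 0.5 * rng.standard_normal((n_samples, n_channels))

# Regularized (ridge) linear encoding model: fit on the first half,
# evaluate on the held-out second half.
half = n_samples // 2
model = Ridge(alpha=1.0).fit(X[:half], eeg[:half])
pred = model.predict(X[half:])

# Score: Pearson correlation between predicted and actual EEG, per channel.
corr = [np.corrcoef(pred[:, c], eeg[half:, c])[0, 1] for c in range(n_channels)]
```

In practice such models concatenate many attribute streams (envelope, spectrogram bands, phoneme indicators, network embeddings) into the design matrix, so the gain from adding high-level attributes can be read off as the improvement in held-out correlation.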