Hey! I’m Puyuan Peng. I’m a second-year PhD student in computer science at UT Austin. I’m very fortunate to have David Harwath as my advisor, and I’m with the Speech, Audio, and Language Technologies (SALT) Lab. Before coming to Austin, I did my master’s in statistics at the University of Chicago, where I spent a wonderful summer working with Karen Livescu and Herman Kamper. I did my undergrad in Math and Applied Math at Beijing Normal University.

In my free time, I like to work out and sing.

contact: pyp [at] utexas [dot] edu


Visually Grounded Speech Processing and Understanding

May 2022 at Developmental Intelligence Laboratory, Department of Psychology, UT Austin, USA
Jan 2022 at Karen Livescu Group, Toyota Technological Institute at Chicago, USA
Jan 2022 at Cognitive Machine Learning Group, Département d’Études Cognitives, École Normale Supérieure, France


(* denotes equal contribution)

Textless Phrase Structure Induction From Visually-Grounded Speech
Cheng-I Jeff Lai*, Freda Shi*, Puyuan Peng*, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass
technical report

Zero-shot Video Moment Retrieval With Off-the-Shelf Models
Anuj Diwan*, Puyuan Peng*, Raymond J. Mooney
Workshop on Transfer Learning for Natural Language Processing, 2022
pdf code

Word Discovery in Visually Grounded, Self-Supervised Speech Models
Puyuan Peng, David Harwath
Interspeech, 2022
pdf code

MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
Alan Baade, Puyuan Peng, David Harwath
Interspeech, 2022
pdf code

Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng, David Harwath
The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing at AAAI, 2022
pdf code

Fast-Slow Transformer for Visually Grounding Speech
Puyuan Peng, David Harwath
ICASSP, 2022
pdf code

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings
Puyuan Peng, Herman Kamper, Karen Livescu
The 1st Workshop on Self-Supervised Learning for Speech and Audio Processing at NeurIPS, 2020