Andy M. Sarroff
This topic was investigated during my research assistantship for Dr. Ye Wang in the Sound and Music Computing Lab at the School of Computing in the National University of Singapore. In particular, I explored ways in which the health of elderly people may benefit from music computation, information retrieval, large-scale database searching and crawling, and human biometry.
I worked as a research scientist at Sourcetone LLC, which develops Music Emotion Recognition (MER) technology. This work presents an interesting challenge, as emotional experience is highly personal and can vary greatly from song to song. Unlike most studies, Sourcetone's research categorizes emotion induction rather than emotion attribution.
(See a recent research proposal.)
Modeling human perception of spaciousness in recorded music is one of my primary research interests. Spaciousness, the impression of size, environment, source placement, and other key characteristics of music, is largely left out of music analysis tasks. Spaciousness is a multidimensional perceptual feature that transmits much important information in music. Fluctuations in spaciousness give us emotional cues, and the virtual placement of sources through mixing techniques helps us imagine a soundstage, to name just two important information streams. Still, Music Information Retrieval researchers tend to ignore spatial cues in analysis. In fact, most of the time, only one channel of a stereophonic signal is used, or the signal is otherwise mono-summed. I feel that improved research into the perceived spatial information streams of music will yield rich rewards in overall performance for machine perception, music recommendation, and music control applications.
These were some of the motivating factors behind much of my graduate research and, ultimately, my Master's thesis, which was conducted in three stages:
A working model was developed, but there is still extensive research to be done. First, only three dimensions of spaciousness were examined:
Researchers have identified dozens of dimensions of perceived spaciousness. Predictive models would benefit from inclusion of additional dimensions. Second, the models can be strengthened by using larger song databases, more human subjects, and experimentation with different learning algorithms. I hope to continue this research further.
A logical conclusion to a robust model of spaciousness (or any other predictive model of human perception) is to give a user new parametric control over musical attributes. I would like to investigate new ways to allow music listeners and producers to control music in perceptually meaningful ways. Such controls could take the form of EQ-like knobs that manipulate dimensions of spaciousness, for example.
I've been working with machine learning algorithms in a variety of ways, including using Support Vector Regression to nonlinearly map audio to subjective perception, auto-encoders for reduction of dimensionality, and supervised selection of robust feature sets. Upcoming tasks bring further experimentation with unsupervised clustering to reveal hidden structures for personalization and/or contextualization.
| 2011-present |
Dartmouth College, Hanover, NH PhD Student, Computer Science Department Supervised by: Michael Casey. |
| 2010-2011 |
Dartmouth College, Hanover, NH Master's Student, Computer Science Department Supervised by: Tanzeem Choudhury and Andrew Campbell. |
| 2006-2009 |
New York University, New York City, Master of Music, Music Technology Program Thesis Title: "Spaciousness in recorded music: Human perception, objective measurement, and machine prediction" Thesis Advisor: Juan P. Bello |
| 1996-2000 | Wesleyan University, Middletown, CT, Bachelor of Arts in Music |
| 2011-present |
Google, Inc. Sponsored under Google Faculty Research Award. |
| 2011-present |
Neukom Institute for Computational Science Sponsored Graduate Fellowship. |
| May, 2009 |
Music Technology Student-of-the-Year Award Awarded once a year to one student by the New York University Music Technology program for "outstanding achievement and citizenship." |
| May, 2008 |
Dean's Grant to Support Graduate Student Research Competitive 1-year grant awarded by the Steinhardt School to outstanding students for sponsored research. |
I produced music for several years in New York City. I began by working for Greene Street Recording. I worked briefly at Mission Sound Recording and Loho Studios in 2001, before moving to RPM Electronic Sound Studios, where I soon became Chief Engineer. Toward the beginning of 2004, I opened a production facility. Working under the name Woodshop Sound, I recorded, mixed, and mastered albums until 2007. (Note: The web site is inactive and very old!)
My professional experience in music continues to inform my work as a researcher. To this day, I am very interested in modeling the perceived attributes of production in recorded music.
I have played drums in several bands, releasing two recordings. At Wesleyan University, I studied Samba percussion; South Indian Mridangam and vocal percussion (solkattu); Javanese gamelan; and vibraphone with Jay Hoggard.

In this research project we investigate automatic methods to identify rhythmic patterns embedded within the multiple layers of sound in dance music, popular music, and recordings from non-Western regions of the world. Of particular interest is finding recurring rhythmic patterns in a large database of recordings. Sponsored by Google Inc. (Faculty Research Award) and the Neukom Institute for Computational Science (2011-2012 Graduate Fellowship).

It is well known that reverberation negatively impacts the performance of many signal analysis and decomposition tasks, such as blind source separation, speech recognition, and speaker diarization. We investigate methods for compactly describing a reverberant environment that may offer discriminative advantages to these tasks. Investigated while at Gracenote, Inc.

Automatic Playlist Generation (APG) is an algorithmic means for grouping and ordering music. APG systems are an integral part of automatic music recommendation systems; they generate sequences of songs and aim to maximize the listening time of their subscribers. It is therefore important that adjacent songs preserve controlled novelty in APG systems. This project investigated ways to encode expert knowledge of song transitions to generate playlists.

In order to facilitate a predictive model for perceived spaciousness, data was collected from human subjects. The study was launched in two phases: on the internet and in a laboratory. It was programmed with HTML, JavaScript, PHP, Flash, and MySQL. Within approximately 2 weeks, over 1500 responses were collected. This data was used to confirm the consistency of perceived spaciousness across subjects and then used to train a predictive model. The web site, which continues to collect data, has attracted visitors from approximately 25 countries.

Wide-Volve is a graphical environment (programmed in Matlab) in which the lateral angle profile of the spectral components of an input file can be modified and/or matched to another song’s profile. This processor, using an azimuthal discrimination strategy, places spectral components of an input audio file into azimuthal bins. The absolute lateral angle is calculated per frequency bin using the sine-sine panning law. The user sets a power threshold above which signal spectra are plotted against absolute lateral angle. To boost perceptual relevancy, spectral components are plotted on a frequency axis warped to the Bark scale. A selector tool allows selection of frequency spectra. The selected spectra are scaled to the desired absolute lateral angle or, if a target file has been loaded, to the target file’s lateral angle profile. After previewing, a new audio file can be saved. A full description of the processor can be found in this paper.
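As a rough illustration only (this is not the Wide-Volve Matlab code), the per-bin analysis described above might be sketched in Python as follows. Several things here are assumptions rather than details from the processor: the sketch interprets the panning law as the stereophonic law of sines with an assumed 30-degree loudspeaker half-angle, it uses the Zwicker-Terhardt approximation for the Bark warping, and the function names `hz_to_bark`, `abs_lateral_angle`, and `lateral_profile` are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Warp a frequency in Hz to Bark (Zwicker-Terhardt approximation, an assumed choice)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def abs_lateral_angle(mag_left, mag_right, base_angle_deg=30.0):
    """Absolute lateral angle in degrees for one frequency bin.

    Assumes the stereophonic law of sines:
        sin(theta) / sin(theta0) = (gL - gR) / (gL + gR)
    where theta0 (base_angle_deg) is an assumed loudspeaker half-angle.
    """
    total = mag_left + mag_right
    if total == 0.0:
        return 0.0  # silent bin: treat as centered
    ratio = abs(mag_left - mag_right) / total
    return math.degrees(math.asin(ratio * math.sin(math.radians(base_angle_deg))))


def lateral_profile(freqs_hz, mags_left, mags_right, power_threshold):
    """Return (bark, angle) pairs for bins whose power exceeds the threshold."""
    profile = []
    for f, l, r in zip(freqs_hz, mags_left, mags_right):
        if l * l + r * r > power_threshold:
            profile.append((hz_to_bark(f), abs_lateral_angle(l, r)))
    return profile
```

Under these assumptions, a bin with equal left and right magnitudes maps to 0 degrees, and a bin present in only one channel maps to the full base angle.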

This Java application was built to easily browse a large database of collected human responses to audio. It automatically populates itself by querying a database and retrieving whichever tables and columns are desired. The bottom panel shows all of the songs in the database, their average ratings, and relative standard deviations in ratings. By clicking on a graphical representation of a song, a user can hear the audio file. The top panel allows the user to choose which responses to visualize; responses can be filtered by any table column and unique data field. While this browser was built to explore a very specific experiment database, it was designed to be easily generalized to other experiments and databases.

This Max patch provides a convenient interface for scaling the FFT coefficients of the mid or side signal of a stereophonic signal. Although direct scaling of coefficients can lead to ringing, this processor can be used to generate quite interesting sounds from ordinary stereophonic audio.
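The underlying idea can be sketched outside of Max in a few lines of Python. This is a minimal illustration for a single short frame, not a port of the patch: it uses a naive DFT from the standard library instead of a real FFT, omits windowing and overlap-add, and the function names (`dft`, `idft`, `scale_side_bins`) and the per-bin `gains` list are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for a short demo frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); keeps only the real part of each output sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def scale_side_bins(left, right, gains):
    """Scale the side-signal spectrum bin by bin, then rebuild left and right.

    gains holds one real gain per DFT bin. For a real-valued result the gains
    should be symmetric (gains[k] == gains[n - k]); otherwise the imaginary
    part is silently discarded by idft().
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    scaled = [g * c for g, c in zip(gains, dft(side))]
    side_out = idft(scaled)
    new_left = [m + s for m, s in zip(mid, side_out)]
    new_right = [m - s for m, s in zip(mid, side_out)]
    return new_left, new_right
```

With all gains set to 1 the frame is returned unchanged (up to rounding), and with all gains set to 0 the side signal vanishes, so both output channels collapse to the mid signal. Scaling the mid signal instead would follow the same pattern with the roles of `mid` and `side` swapped.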
| August-December, 2009 |
Sourcetone, LLC, New York Title: Music Analysis, Classification Research, and Product Development Description: Research in Music Emotion Recognition, including machine learning, classification, and signal analysis. |
| Summer, 2011 |
Gracenote, Emeryville, CA Title: Research Intern, Media Technology Lab Projects: Source/location invariant characterization of reverberation in audio. |
| Summer, 2008 |
Sennheiser Electronic Corporation, R&D USA, Palo Alto, CA Title: Audio DSP Engineer Project: Development and implementation of a methodology for the objective evaluation of a new microphone. |
| Summer, 2008 |
AuSIM, Inc., Palo Alto, CA Title: Engineer Projects: Calibration assistance for the AuSIM “Vectsonic” system for the External Effects Room at NASA Langley Research Laboratory in Hampton, VA. Testing and troubleshooting of a wearable communication system “3DVx” and its components, including WiFi radios, orientation trackers, GPS, touch displays, and auditory displays. |
| Summer, 2007 |
AuSIM, Inc., Palo Alto, CA Title: Engineer Projects: Development for audio and acoustics products delivered to the U. S. Army Research Lab (ARL) in Aberdeen, MD, and McGill University. These included system configuration and interface development for a new 5-room laboratory at ARL's Environment for Auditory Research (EAR) facility, and interface design for a robotic acoustic capture microphone array commissioned by McGill University. |
| January-July, 2010 |
Dr. Ye Wang, Computer Science Department, School of Computing, National University of Singapore Research topics included large-scale search and retrieval of music content; music information retrieval for health-related applications. |
| Academic Years, 2008-2009 2007-2008 |
Dr. Juan P. Bello, New York University Projects included Machine Listening on the Studio (funded by the Steinhardt Technology Award); experimentation and research in objective measurements for spaciousness and predictive modeling of music perception; and organization of a core NYU Music Information Retrieval research group and seminar series. |
| January-July, 2010 |
Teaching Assistant, Computer Science Department, School of Computing, National University of Singapore Designed and taught 3.5 lectures for a new module, “Sound and Music Computing.” Assisted Dr. Ye Wang in all other matters related to the class (grading, labs, mentoring, etc.). |
| Academic Year, 2008-2009 |
Music Department Tutor, New York University, Music Technology Program Trained and assisted students in Matlab programming; digital signal theory and processing; and Music Information Retrieval. |
Sarroff, A.M. and Bello, J.P. Toward a Computational Model of Perceived Spaciousness in Recorded Music. The Journal of the Audio Engineering Society, vol. 59, no. 7/8, pp. 498-513, 2011.
Miluzzo, E., Papandrea, M., Lane, N.D., Sarroff, A.M., Giordano, S., and Campbell, A.T. Tapping into the Vibe of the City using VibN, a Continuous Sensing Application for Smartphones. In Proc. of First International Symposium on Social and Community Intelligence (SCI'11), co-located with 13th International Conference on Ubiquitous Computing (Ubicomp 2011), Beijing, Sept. 2011. |
Zhao, Z., Wang, X., Xiang, Q., Sarroff, A.M., Li, Z. and Wang, Y. Large-scale music tag recommendation with explicit multiple attributes. In Proceedings of ACM Multimedia 2010, Firenze, Italy. October, 2010. |
Sarroff, A.M. and Bello, J.P. Predicting the Perceived Spaciousness of Stereophonic Music Recordings. In Proceedings of the 6th Sound and Music Computing Conference (SMC-09), Porto, Portugal. July, 2009. |
Sarroff, A.M. and Bello, J.P. Measurements of Spaciousness for Stereophonic Music. In Proceedings of the 125th Convention of the Audio Engineering Society, San Francisco, USA. October, 2008. |
Sarroff, A.M. Spaciousness in recorded music: Human perception, objective measurement, and machine prediction. M.M. Thesis, New York University. 2009. |
Kong, Q., Sarroff, A.M., Topel, S., and Casey, M., "Getting Into the Groove with Hierarchical Independent Component Analysis", Neural Information Processing Systems, Workshop on Music Processing, December, 2011. |