Human listeners seem to have an impressive ability to recognize a wide variety of natural sounds. However, there is surprisingly little quantitative evidence to characterize this fundamental ability. Here the speed and accuracy of musical-sound recognition were measured psychophysically with a rich but acoustically balanced stimulus set. The set comprised recordings of notes from musical instruments and sung vowels. In a first experiment, reaction times were collected for three target categories: voice, percussion, and strings. In a go/no-go task, listeners reacted as quickly as possible to members of a target category while withholding responses to distractors (a diverse set of musical instruments). Results showed near-perfect accuracy and fast reaction times, particularly for voices. In a second experiment, voices were recognized among strings and vice-versa. Again, reaction times to voices were faster. In a third experiment, auditory chimeras were created to retain only spectral or temporal features of the voice. Chimeras were recognized accurately, but not as quickly as natural voices. Altogether, the data suggest rapid and accurate neural mechanisms for musical-sound recognition based on selectivity to complex spectro-temporal signatures of sound sources.