Karaoke singing is a popular form of entertainment in several parts of the world. Because this genre of performance attracts amateurs, the singing often has artifacts related to scale, tempo, and synchrony. We have developed an approach that corrects these artifacts using cross-modal information from multimedia streams. We first perform adaptive sampling on the user's rendition, and then use the original singer's rendition together with the highlighting information in the video captions to correct pitch, tempo, and loudness. A method of analogies is employed to perform this correction. The basic idea is to manipulate the user's rendition so that it is as similar as possible to the original singing. A pre-processing step that removes noise caused by feedback and huffing also helps improve the quality of the user's audio. The results reported in the paper show the effectiveness of this multimedia approach.
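As a rough illustration of the idea of pulling the user's rendition toward the original, the following sketch shows one simple way the loudness part of such a correction could look. It is not the method of analogies used in the paper; it is a plain frame-wise RMS matching, and it assumes the two signals are already time-aligned. The function names, the frame length, and the gain cap are all hypothetical choices made for this example.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def match_loudness(user, reference, frame_len=1024, max_gain=4.0, eps=1e-9):
    """Scale each frame of `user` so its RMS approaches that of `reference`.

    Assumptions (not taken from the paper): both inputs are lists of float
    samples at the same sampling rate and are already time-aligned; the gain
    is capped at `max_gain` so near-silent frames are not amplified into noise.
    """
    n = min(len(user), len(reference))
    corrected = []
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        user_frame = user[start:end]
        ref_frame = reference[start:end]
        # Gain that would bring the user's frame level to the reference level.
        gain = min(rms(ref_frame) / max(rms(user_frame), eps), max_gain)
        corrected.extend(s * gain for s in user_frame)
    return corrected
```

A per-frame gain keeps the example short, but it changes abruptly at frame boundaries; a practical system would likely smooth the gain across frames, and would handle pitch and tempo with separate processing.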
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Graphics and Computer-Aided Design
- Information Systems
- Electrical and Electronic Engineering
- Theoretical Computer Science