Speech recognition systems increasingly require an accurate lexicon, consisting of the word pronunciations that actually occur within a given domain. Given the growing size of speech databases, data-driven approaches appear best suited to deriving such pronunciations. At present, however, such approaches often introduce implausible pronunciations, increasing confusability within the decoder. In this paper, we outline a novel data-driven approach that aims to improve the quality of extracted word pronunciations by removing co-articulation effects and acoustic model misclassifications from the speech data. A number of selection constraints are additionally employed to exclude improbable pronunciation alternatives. Initial experiments show that the approach does indeed provide plausible pronunciation alternatives without introducing improbable ones.