We investigate how patent classification influences the interpretation of patent statistics. Innovation researchers currently rely on a variety of patent classification schemas, whose underlying methodologies are hard to replicate. Using machine learning techniques, we construct a transparent, replicable patent taxonomy and a new automated methodology for classifying patents. We then contrast our new schema with existing ones using a long-run patent dataset. In a quantitative analysis of patent characteristics, we find strong evidence of classification bias: the interpretation of regression coefficients is schema-dependent. We suggest that much of the innovation literature needs to be re-examined in light of our findings.