Resource misallocation as a mediator of fitness costs in antibiotic resistance

Antimicrobial resistance poses a threat to global health and the economy. It is widely accepted that, in the absence of antibiotics, drug resistance mutations carry a fitness cost. In the case of rifampicin resistance in fast-growing bacteria, this cost stems from a reduced transcription rate of the RNA polymerase resulting in slower ribosome biosynthesis. However, this relationship does not apply in the slow-growing Mycobacterium tuberculosis, where the true mechanism of fitness cost of rifampicin resistance as well as the impact of compensatory evolution remain unknown. Here we show, using global transcriptomic and proteomic profiling of selected M. tuberculosis mutants and clinical strains, that the fitness cost of rifampicin resistance in M. tuberculosis is the result of the physiological burden caused by aberrant gene expression. We further show that the perceived burden can be increased, effectively suppressing the emergence of drug resistance.


19
Antimicrobial resistance poses a threat to global health and the economy. It is widely accepted 20 that, in the absence of antibiotics, drug resistance mutations carry a fitness cost. In the case of 21 rifampicin resistance in fast-growing bacteria, this cost stems from a reduced transcription rate of 22 the RNA polymerase resulting in slower ribosome biosynthesis. However, this relationship does 23 not apply in the slow-growing Mycobacterium tuberculosis, where the true mechanism of fitness cost 24 of rifampicin resistance as well as the impact of compensatory evolution remain unknown. Here 25 we show, using global transcriptomic and proteomic profiling of selected M. tuberculosis mutants 26 and clinical strains, that the fitness cost of rifampicin resistance in M. tuberculosis is the result of 27 the physiological burden caused by aberrant gene expression. We further show that the perceived 28 burden can be increased, effectively suppressing the emergence of drug resistance. 29 Keywords 30 Antimicrobial resistance, fitness cost, tuberculosis, proteomics, transcriptomics, epistasis, 31 compensatory evolution, resource allocation 32 33 Rifampicin targets the bacterial RNA polymerase (RNAP), and resistance to rifampicin is usually 48 mediated by mutations in the β subunit of RNAP (Campbell et al., 2001). Several studies point to 49 the rate of transcription, particularly as it pertains to the synthesis of ribosomal RNA and 50 ribosomal proteins, as an important mediator of growth rate (Gourse et al., 1996;Thiele et al., 51 2009). A slowing down of transcription is therefore the prime mechanistic candidate for the cost 52 of rifampicin resistance (Qi et al., 2014;Reynolds, 2000). The mechanism linking RNAP activity 53 to ribosome biosynthesis provides a compelling explanation for the cost of rifampicin resistance 54 in rapidly dividing bacteria such as Escherichia coli and Pseudomonas aeruginosa whose growth relies 55 on the rapid replenishment of biosynthetic machinery lost through cell division (Ehrenberg et al., 56 2013). Importantly, the fitness cost of rifampicin resistance can be mitigated or even reversed 57 through the acquisition of secondary, compensatory mutations in the α, β and β' subunits of 58 RNAP that seem to restore normal enzyme function (Qi et  for a shared expression signature of rifampicin-resistance across the tested strain pairs, we show a 86 correlation between the fitness cost of the rifampicin-resistance conferring mutation and the 87 extent to which its presence imparts a deviation from the proteome composition of the wild-type. 88 Finally, we show that this correlation could be exploited to suppress the emergence of rifampicin 89 resistance. 90 Strains comprised in the "Evolutionary trajectory of rifampicin resistance" set were derived from a single clinical isolate (DS, N0155) by isolation of a Ser450Leu mutant in the lab and the subsequent passage for 200 generations in the absence of rifampicin. These strains were used to identify expression changes that are reversed by compensation -signature of compensation. The generalizability of our finding was checked using the "Genetic diversity strain set" containing five independent clinical isolates and their rifampicin-resistant derivatives. All rifampicin resistant strains shared the same resistance mutation -RpoB Ser450Leu. B. Experimental outline for the sampling and analyses.

91
Compensatory mutations mitigate resistance-imposed expression changes We previously reported the result of a directed evolution experiment in which we identified a 98 mutation in the BBDP domain: RpoC Leu516Pro as a putative compensatory mechanism for the 99 fitness cost of the rifampicin-resistance conferring mutation RpoB Ser450Leu in a clinical 100 isolate (Comas et al., 2012). The strains generated by that study comprise the original drug-101 susceptible isolate (DS), its laboratory-derived rifampicin-resistant mutant (RpoB Ser450Leu, 102 RifR) and the resulting evolved strains obtained by serial passage in the absence of rifampicin for 103 200 generations (DS evo and RifR evo , respectively, see Figure 1A). Together these strains offer a 104 representative snapshot of the evolutionary process that passes through the initial emergence of 105 (costly) drug resistance and leads to the establishment of a mature drug-resistant strain whose 106 fitness is indistinguishable from its drug susceptible ancestor. We therefore hypothesised that 107 comparative transcriptomic and proteomic expression profiling of these strains will allow us to 108 determine the signature of the fitness cost associated with rifampicin resistance. 109 First, we determined the relative fitness of RifR. Using a mixed effect linear regression model to 110 analyse growth assays, we noted a 26.4% decrease (CI 95% : 21.5 -31.0%, p < 0.001) in the growth 111 rate of RifR when compared to DS. The comparison of their evolved counterparts -DS evo and 112 RifR evo -showed no significant differences (-1.2%, CI 95% : -10.8 -7.1%, p = 0.814), illustrating the 113 fact that RpoC Leu516Pro does indeed compensate the fitness cost of rifampicin resistance. 114 We aimed to identify differences in the baseline, unperturbed, gene expression as a proxy for 115 describing the biological basis for reduced fitness in RifR. We sampled actively growing bacterial 116 cultures of each of the four strains, extracting total RNA and protein to be profiled using RNA 117 sequencing (RNAseq) and sequential window acquisition of all theoretical mass spectra 118 (SWATH-MS), respectively (see Figure 1B). In total, we were able to obtain RNA transcript 119 counts for all present regions of the Mtb genome and reliably quantify 2,886 proteins across our 120 samples ( Figure S1). We used differential expression analysis to test our hypothesis that the 121 compensatory mutation RpoC Leu516Pro had the net effect of reversing, at least partially, the 122 expression changes brought about by the rifampicin resistance mutation RpoB Ser450Leu. We 123 named this trend a "signature of compensation" -see Figure 2A and we derived it by identifying 124 genes that are uniquely differentially expressed in RifR compared to the other three strains in our 125 dataset. To maximise the probability of identifying the signature of compensation, we chose an 126 inclusive definition of differential expression: a p-value of less than 0.05 after adjusting for 127 multiple testing (see Methods). In keeping with our inclusive approach, we also deliberately did 128 not use an effect size threshold (e.g. minimum log-fold change). 129 Figure 2: Signature of compensation. A. The relative fitness of drug resistant strains (DR) is expected to be lower than wild type (DS) at first, but then is expected to increase due to compensatory evolution. The phenotypic equivalent of this trend is illustrated as an increase/decrease in a measurable trait upon the emergence of resistance that is then returned to its previous level through compensation. We refer to this dynamic as the "Signature of Compensation". B. Plot of transcript counts per million bases (TPM) and label free quantifications (LFQ) of cellular proteins for genes whose expression is perturbed by the Ser450Leu mutation in RpoB and returned to wild type in the presence of the compensating Leu516Pro mutation. All results were standardized across measurements for a single gene to allow the comparison between strains. Grey traces show genes that are significantly more highly expressed in RifR, yellow traces show genes that were significantly less highly expressed in RifR. The red and blue bold lines show the median of the sample for more and less highly expressed proteins, respectively. Data of three independent biological replicates for each strain are shown.

130
Using these criteria, we identified 536 transcripts that could be involved in the cost of resistance. 131 289 transcripts were less abundant and 247 were more abundant in RifR compared to the other 132 samples. Similarly, 536 proteins showed a significant signature of compensation: 260 proteins 133 were more and 276 were less-abundant in RifR (see Figure 2B). Gene set enrichment analysis of 134 the transcriptomic and proteomic data pointed to iron homeostasis being significantly affected. 135 Specifically, it indicated a higher expression, in RifR, of genes that are repressed by the iron-136 dependent regulator (IdeR, Rv2711) in iron replete conditions. Among them, there was a 137 significant enrichment of genes involved in polyketide and non-ribosomal peptide synthesis, 138 which include the biosynthetic machinery for the sole Mtb siderophore: mycobactin (see Figure  139 S2-4). These changes suggested that RifR faced a shortage of iron in our experimental conditions. 140 The availability of iron is an essential requirement for Mtb growth, both in culture and during 141 infection, and iron acquisition systems are therefore key virulence factors (Jones et al., 2014;142 Reddy et al., 2013;Wells et al., 2013). Hence, an increased requirement for iron could manifest 143 itself as a loss of relative fitness. The fact that RpoB Ser450Leu led to a modification of the 144 expression of genes involved in iron homeostasis and that RpoC Leu516Pro reversed the effect 145 provides a compelling alternative mechanism underpinning the apparent fitness cost of 146 rifampicin resistance. If the disruption of iron homeostasis drives fitness cost, we would expect 147 that iron supplementation should mitigate the relative cost of RpoB Ser450Leu. Furthermore, 148 based on the expression profile, we expected that RifR should produce more mycobactin at 149 baseline than DS, potentially influencing the overall growth rate of the mutant. 150 We addressed the first hypothesis by comparing growth rates of RifR and DS in the presence or 151 absence of 10 µM hemin -an additional source of iron that is by itself sufficient to support the 152 growth of a mutant defective in mycobactin biosynthesis. Importantly, hemin and mycobactin 153 provide two separate routes of iron uptake, which allows us to side-step issues that might emerge 154 from deficient iron transport (Jones et al., 2014). The presence of hemin did not change the cost 155 of RifR, which we calculated to be 18.6% in the absence and 20.9% in the presence of hemin for 156 this experiment (Mixed effect linear model, p = 0.737). Similarly, hemin did not impact the 157 growth rate of DS (-4.7%, CI 95% : -16.3 -2.3%, p = 0.128). In summary, iron did not appear to 158 limit the growth of RifR under normal conditions. 159 IdeR-regulated genes that are either induced (black inner circle) or repressed (white inner circle) in low iron conditions. Hexagons represent IdeR-independent iron responsive genes that are induced (white inner hexagons) or repressed (black inner hexagons) in low iron conditions. We used blue and red to indicate significantly lower or higher RNA expression in RifR, respectively -(n=12, see Methods for further details). Diamonds represent transcriptional modules as defined by Petersen et al, black diamonds indicate modules that contain at least 3 IdeR-responsive genes. Edges connect gene nodes with the module nodes they belong to. Labels 1-7 refer to Module 502 (1), Module 525 (2), Module 267 (3), Module 446 (4), Module 231 (5), Module 086 (6) and Module 295 (7) from the original publication. B. Relative mycobactin levels expressed as maximum peak heights for DS and RifR in normal medium (grey dots) and iron-supplemented medium (10µM hemin, red dots). Each filled circle represents the quantification of an independent biological replicate. Unfilled circles represent the mean of the observations.

160
Next, we addressed the production of mycobactin. We prepared whole cell extracts from DS and 161 RifR grown in both, normal medium and medium supplemented with 10 µM hemin. We found 162 that on average RifR produced more mycobactin than DS, corroborating the physiological 163 relevance of the increased baseline expression of mycobactin biosynthesis genes. We also 164 observed a slight decrease in the production of mycobactin in bacteria grown in the hemin-165 supplemented medium, pointing to a modification of the expression of mycobactin biosynthesis 166 cluster in response to iron (See Figure 3). Given that the growth rate was not affected by the 167 presence of hemin, these findings suggest that mycobactin itself does not modulate the growth 168 rate of the mutant. It is therefore possible that the higher expression of the biosynthetic cluster 169 itself might impart a fitness cost. 170 Interestingly, while significantly enriched, only half of the genes reported to be repressed by IdeR 171 (Rodriguez et al., 2002) in iron-replete conditions were part of the signature of compensation (22 172 out of 40 genes). This prompted us to take a closer look at the IdeR regulon and its regulation. modules included IdeR-independent iron-responsive genes. All the genes that we identified as 180 candidates for compensation belonged to Modules 1-4, while none of the genes included in the 181 other modules were found to be differentially expressed in RifR. A key difference among 182 modules was that IdeR-regulated genes represented more than half of all the genes in modules 183 affected by compensation but fewer than half in those that were not part of the "signature of 184 compensation". Mapping proteomic data onto the same expression network produced similar 185 results (see Figure S5). Interestingly, few of the IdeR-independent iron-responsive genes were 186 part of the signature of compensation. This pattern implies a modulation of the canonical 187 function of IdeR, either through regulatory inputs from other transcription factors, or some 188 other mechanism. 189 These results supported our hypothesis that mutations in rpoB impart changes to the baseline 190 expression profile of Mtb that could be reversed in the presence of a compensatory mutation in 191 rpoC. Combining the expression data with our findings that iron supplementation and mycobactin 192 levels did not affect RifR growth rates, we concluded that the transcriptional changes were not 193 driven by the demand for iron. Instead, these changes might be a reflection of a dysfunction of 194 RNAP -e.g. differences in promoter specificity or modified interaction with IdeR, whose 195 downstream consequences may impose a fitness effect. For example, as the mycobactin 196 biosynthesis cluster comprises several large proteins, their excessive production could represent a 197 drain on the cell's resources. If true, we would expect such effects to be universal across all Mtb 198 strains carrying this rpoB mutation. 199 The impact of RpoB Ser450Leu is shaped by epistasis 200 We wanted to test the hypothesis that higher expression of the mycobactin biosynthetic cluster is 201 a general feature of rifampicin resistance in Mtb and therefore the underlying cause of its fitness 202 cost. To do so, we generated RpoB Ser450Leu mutants in five genetically diverse clinical isolates 203 belonging to two different Mtb lineages and profiled them. Globally, Mtb can be grouped into 204 seven distinct genetic lineages each with a specific geographic distribution (Gagneux, 2018). Mtb 205 lineages can differ in their interaction with the human host, the dynamics of disease progression, 206 and also in their apparent propensity to acquire drug resistance (Coscolla and Gagneux, 2014;207 Ford et al., 2013). We chose strains belonging to Lineage 1 and 2, because of their large 208 phylogenetic separation (see Figure S6) and more importantly, because drug resistance is often 209 associated with Lineage 2 and relatively rare in Lineage 1 (Borrell and Gagneux, 2009). We 210 expected that the comparison of the transcriptome and proteome between the Ser450Leu 211 mutants and their cognate wild type ancestor would allow us to identify general patterns of 212 fitness cost linked to this mutation. 213 It is important to note that this comparison did not include any compensated strains, i.e. strains 214 carrying mutations in the BBDP domain. We were therefore unable to focus our analysis 215 exclusively on genes whose expression was corrected by the presence of an rpoC mutation. 216 Nonetheless, direct comparison of RifR and DS is virtually indistinguishable from the signature 217 of compensation when considering IdeR-regulated genes and therefore serves as a reasonable 218 proxy for our analyses (see Figure S5). 219 We started by measuring the growth characteristics of the wild type isolates and the relative cost RpoB Ser450Leu mutation differed as well, from a modest 2 % (mixed effect linear regression, p 223 = 0.71) to a pronounced 27 % (mixed effect linear regression, p = 5.6 × 10 -6 ). 224 We obtained the expression profiles for each strain to check whether the pattern we identified for 225 IdeR-repressed genes was a universal phenotype for RpoB Ser450Leu mutants. Analysing the 226 transcriptomic data by performing a single comparison across the five strain pairs, we found that 227 only 17.5% (7/40 genes) of the IdeR-repressed genes were significantly differentially expressed. A 228 single gene belonging to the mycobactin biosynthesis cluster was included in that number. 229 Proteomic analysis revealed a similar result -17.1% (6/35 detected proteins) were found to be 230 significantly differentially expressed across all strains, none of which belonged to the mycobactin 231 biosynthesis cluster. None of the iron-homeostasis gene sets highlighted in the "signature of 232 compensation" were significantly differentially expressed across all strains. Since these findings 233 were contrary to our expectations, we stratified the analysis and mapped the differential 234 expression results for each strain onto the IdeR-and iron-responsive gene network we collated 235 earlier. These results echoed our combined analysis: the signature of compensation was not 236 universal across the tested strains. N0155, which corresponds to "DS", is the only strain to show 237 a transcriptional profile consistent with the signature of compensation (see Figure 4A). Proteomic 238 data corroborated this finding (see Figure S7). It is important to note that these data represent an 239 independent replication of the experiments, from which we derived the signature of 240 compensation, showing that our original results are robust and reproducible. However, the 241 absence of a coherent IdeR-responsive phenotype was clear evidence of epistasis and raised a 242 broader question: are there any commonalities in the phenotypic manifestation of the RpoB 243 Ser450Leu mutation among our set of strains? 244  Figure 3, coloured based on transcriptional differential expression data from pairwise comparison of genetically distinct rifampicin-susceptible clinical isolates and their cognate RpoB Ser450Leu mutants. RifR and N0155 refer to an independent sampling of the same strain pairs. See Figure S7 for the proteome counterpart of this plot. For RifR, N0072 and N0157 the plot is based on the comparison of three drug susceptible and three rifampicin resistant samples. For N0155, N0145 and N0052 we used two samples of each. B. Representation of the enrichment of significantly differentially expressed genes within individual transcriptional modules, as defined elsewhere (Peterson et al., 2014). The columns alternate proteomic (P) and transcriptomic data (R). "ALL" refers to the global differential expression analysis of all rifampicin-susceptible against all rifampicin-resistant strains. The remaining column annotations refer to individual pair-wise comparisons in different genetic backgrounds. Black squares represent no significant enrichment, mauve squares and yellow squares show enrichment at 0.01<p<0.05 and p<0.01 using a Fisher's exact test. These p-values are not adjusted for multiple testing. Modules covering the DosR-regulon and IdeR-iron repressed regulon are highlighted separately.

245
To address this question we sought to identify expression modules (Peterson et al., 2014) whose 246 membership was well represented among significantly differentially expressed genes in at least 247 one pair-wise comparison between a rifampicin-resistant strain and its cognate drug-susceptible 248 ancestor (see Methods for details). Using transcriptomic and proteomic data, we identified 33 249 expression modules that fitted our criterion (see Figure 4B). There was virtually no consensus 250 across the strains in the transcriptional or translational response to the rpoB mutation. The only 251 case where we observed partial agreement across genetic backgrounds concerned some of the 252 modules controlled by the hypoxia-responsive regulator DosR (Park et al., 2003). As with 253 modules containing IdeR iron-repressed genes, we observed only partial regulon induction for 254 DosR. Specific modules were clearly involved in the expression changes (either protein or 255 transcript) in each background, but the impact of these was strain-specific. Importantly, the impact of resistance on the expression profile of any two strains was found to 275 be independent of the genetic distance between them (see Figure S11). 276 So far, we showed that the RpoB Ser450Leu causes a considerable re-organization of baseline 277 gene expression, that this perturbation can be reversed by a compensatory mutation in RpoC and 278 that the specific phenotypic manifestation was dependent on mutations that occurred more 279 recently than those defining individual lineages. These findings were consistent with our 280 observation that the same mutation imposed a different fitness cost to different strains. We 281 therefore sought to find correlates of the varying fitness costs. 282 Deviation from baseline expression correlates with the cost of rifampicin resistance 283 Pleiotropic phenotypes of the kind described above are not normally addressed, however we 284 wanted to explore whether the extent of the expression perturbations correlated with the varying 285 fitness costs of Ser450Leu we observed in different genetic backgrounds. We reasoned that the 286 cumulative impact on expression disruption, rather than the dysregulation of individual genes, 287 would provide a conduit for a loss of fitness. 288 In the first instance, we considered the correlation between the fitness cost of the rpoB mutation 289 and the overall expression distance between the mutant and its cognate wild type strain (See 290 Figure S12). Through this approach, we were able to detect a relationship between cost and 291 expression differences for the expressed proteins (R 2 = 0.83, p = 0.031, ordinary least squares 292 linear regression) but not RNA (R 2 = 0.39, p = 0.258, ordinary least squares linear regression). 293 Given that the correlation was stronger in the proteome compartment, and that the proteome 294 compartment seemed more affected by resistance, we elaborated on our observation by 295 incorporating a measure of physiological cost for each protein. We used two different metrics for 296 cost. In the simpler case we used the molecular weight of amino acids as proxy for the resource 297 investment necessary to generate each protein (Seligmann, 2003). We also used estimates of ATP 298 cost for each amino acid in E. coli as a way to approximate the level of energy investment a 299 bacterial cell makes when synthesising its proteome (Akashi and Gojobori, 2002). Both metrics 300 showed that drug resistance imposes an additional physiological cost to the baseline proteome 301 (Molecular Weight: Mann-Whitney U-test, p = 8.26 × 10 -4 , ATP equivalents: Mann-Whitney U-302 test, p = 4.50 × 10 -4 , see Figure S13). Furthermore, this cost was negatively correlated with the 303 relative fitness of the RpoB Ser450Leu mutation in a given strain background (ρ s = -0.90, p = 304 0.04) -the greater the deviation from the resource investment of the ancestral proteome, the 305 larger the cost of the mutation (see Figure 5A). Growth rate and gene expression are not 306 independent from each other. To test the possibility that the observed correlation may be an 307 artefact of our analysis, we took advantage of the natural variation in growth rates of different 308 drug-susceptible clinical isolates in our medium and compared them to the relative costs of 309 expression (See Figure S14). We performed a pairwise comparison across all the tested strains 310 and observed no statistically significant correlation between the differences in the investment into 311 the proteome and the difference in growth rates (ρ s = 0.34, p = 0.33). The differences in the 312 allocation of resources into the protein compartment of different bacterial strains were therefore 313 not the main determinant of variation in their respective generation times. 314 Taken together, our results seemed to suggest that the ultimate manifestation of the disruption of 315 wild type baseline gene expression by RpoB Ser450Leu was a net increase in the biosynthetic 316 input required to maintain the steady state proteome: the greater the cost of the disruption, the 317 greater the slowing down of growth in a given strain background. We propose this as the 318 "Burden of Expression" hypothesis of the fitness cost of rifampicin resistance. 319 The fitness cost of RpoB Ser450Leu correlates with increased resource requirements. A. The relative fitness of Ser450Leu RpoB mutants estimated from growth rate data is negatively correlated with the magnitude of the deviation from the resources allocated to the wild type proteome. ϱs -Spearman correlation. Error bars indicate the 95% confidence interval for the data.

320
Carbon allocation rather than ATP availability modulates cost of resistance 321 An implication of the "Burden of expression" hypothesis is the possibility of suppressing the 322 emergence of rifampicin-resistance in mycobacteria by maximising the additional biosynthetic 323 cost imposed by the deviation from the baseline expression. We tested two types of conditions 324 that may impose such a stress: inhibition of ATP synthesis and variation of carbon-source quality. 325 The first would disrupt the ability to generate energy through catabolic processes, while the 326 second would place more emphasis on the anabolic aspects of bacterial growth. In the first 327 instance, we tested the susceptibility to bedaquiline, an ATP synthase inhibitor that leads to a 328 decrease in intracellular ATP levels in Mtb (Andries et al., 2005). Given the higher baseline cost of 329 their proteome, we expected that RpoB Ser450Leu mutants should show an increased 330 susceptibility to bedaquiline commensurate with their relative loss of fitness. We did not observe 331 any correlation between bedaquiline susceptibility and the cost of the RpoB Ser450Leu mutation 332 (see Figure 5B). 333 Next, we explored varying carbon source quality, expecting substrates that force the bacterial cell 334 to rely more heavily on anabolic processes to serve as amplifiers for the perceived cost of 335 rifampicin resistance. A related phenotype has been reported before for RpoB Ser450Leu (Song et  336 al., 2014). We chose the Luria-Delbrück fluctuation assay as an unbiased readout for the overall 337 increase in the cost of rifampicin-resistance, because its frequency of resistance estimate contains 338 a signal for the ability of drug resistant bacteria to propagate within the population prior to 339 antibiotic exposure (Ycart, 2013). The global increase in the cost of RpoB mutations would 340 therefore manifest itself as an apparent decrease in the frequency of resistance, as the population 341 size of pre-existing RpoB mutants would be smaller due to limited expansion post-emergence. 342 We chose glycerol, citrate and acetate to test our hypothesis in the soil organism Mycobacterium 343 smegmatis, whose patterns of rifampicin resistance mirror those of Mtb (Borrell et al., 2013). As 344 expected, these three carbon sources supported different growth rates with measured generation 345 times of the wild type being 3.24 h ( 95% CI: 3.23 -3.25 h), 6.17 h ( 95% CI: 6.09 -6.25 h) and 17.62 346 h ( 95% CI: 17.61 -17.62 h), respectively. We then determined the frequency of rifampicin 347 resistance for bacteria grown on each carbon source using the Luria-Delbrück fluctuation assay. 348 We found a striking correlation between carbon source and the calculated frequency of 349 resistance, with bacteria grown in glycerol giving rise to rifampicin-resistant bacteria at a rate of 350 1.3 × 10 -8 ( 95% CI: 1.2 × 10 -8 -1.5 × 10 -8 ), those grown in citrate at a rate of 3.4 × 10 -9 ( 95% CI: 2.9 351 × 10 -9 -4.0 × 10 -9 ) and acetate-cultured bacteria at a rate of 4.5 × 10 -10 ( 95% CI: 3.4 × 10 -10 -5.6 352 × 10 -10 ) -see Figure 5C. This trend was remarkable, because it showed that changing only the 353 carbon source, keeping all other variables constant, could lead to a 28-fold change in the 354 frequency of resistance. 355 The disparity in outcomes between the two experimental approaches suggests that the availability 356 of catabolic energy does not disproportionately influence the ability of RpoB mutants to survive. RpoC Leu516Pro reduced both, the apparent fitness cost of rifampicin resistance and the 366 magnitude of the expression changes arising from it. However, we also showed that the nature of 367 the perturbation was not consistent across different genetic backgrounds. Instead, we observed a 368 strain-specific response to the RpoB mutation, both in terms of the relative impact on growth 369 and the rearrangement of gene expression. We further observed that the magnitude of the fitness 370 cost that RpoB Ser450Leu imposes on a strain was related to the overall increase in the resources 371 allocated to the proteome. Based on these observations, we proposed the "Burden of expression" 372 hypothesis, with which we posited that in Mtb, the cost of rifampicin resistance was mediated by 373 the metabolic burden imposed by the modified baseline protein expression of resistant strains. 374 Elaborating on this hypothesis we demonstrated that interfering with anabolic processes could 375 suppress the emergence of rifampicin resistance in the related organism M. smegmatis. 376 The "Burden of expression" hypothesis stems from experimental data with clear caveats. First, 377 we started our analyses assuming that ribosomal biosynthesis is unlikely to play a key role in the 378 cost of rifampicin resistance in Mtb and that therefore expression data were a better window into 379 the modified physiology. Our data seem to support the validity of this assumption: ribosomal 380 proteins represented only 5.5%, on average, of the total protein biomass in our experiments. This 381 proportion was marginally higher in RpoB mutants, and it seemed to increase with increasing 382 generation time (see Figure S15). These trends were more consistent with a cost imposed by the 383 metabolic burden of making ribosomes. Second, some of our key conclusions are based on a 384 relatively small number of strains. Nonetheless, to the best of our knowledge, this sample set 385 represents the most comprehensive and best curated account of rifampicin resistance-induced 386 global expression changes in Mtb to date, covering both: evolutionary dynamics and phylogenetic 387 diversity. We were also able to show that patterns of expression detected in the DS-RifR 388 comparison were robust when the same strain pair was sampled again (see Figure 4 and Figure  389 S7). Importantly, key inferences that led us to propose the hypothesis came from SWATH-MS 390 proteomic data drawn from the five different strain backgrounds. These data showed a clear 391 clustering of biological replicates (see Figure S16), with the exception of N0145 for which we 392 were also unable to detect a significant cost for the Ser450Leu mutation or any significant 393 changes to the expression. Third, we assumed that label free quantification (LFQ) using the "best 394 flyer peptide" or TopN approach, which reflects the proportional abundance of individual 395 proteins within our samples (Schubert et al., 2015), can be used to draw conclusions about the 396 resource investment of the cell and can be extended to the growth rate of bacteria. It is possible 397 that the roles are reversed and the growth rate of bacteria in fact determines the protein 398 complement being expressed (Beste et al., 2007). We addressed this possibility by performing a 399 comparison of proteome investment and growth rate for wild type strains only. If the growth rate 400 of Mtb did indeed determine the protein complement of cells across genetic distances on an 401 evolutionary timescale, we would expect a strong correlation between differences in proteome 402 and differences in growth rates between any two strains. This was however not the case (see 403 Figure S13). Finally, we also assumed that the proteome plays a central role in imposing a limit to The strain-specific nature of resistance-related expression perturbations can be used to provide a 441 credible link to disparate growth rate modulation. Our suggestion that proteome composition 442 influences growth rate is not without precedent. This connection has been made before (Scott et 443 al., 2010), and resulted in the formulation of a collection of "growth laws" that linked growth 444 rates to the partitioning of the limited proteome between ribosomes and other proteins carrying 445 out the rest of the cellular functions. Growth on different carbon sources impacted this balance, 446 with "poorer" ones requiring a greater investment into the functional proteome, presumably 447 because of the need for anabolic reactions increased the reliance on biosynthetic enzymes. A 448 similar relationship has been observed in a wide range of microbial species (Karpinets et al., 449 2006). An elaboration of these growth relationships also led to the conclusion that the efficiency 450 of proteome allocation can impact growth rates and cell physiology (Basan et al., 2015). Our 451 finding that the increase in the relative cost of the proteome brought about by the gain of a 452 mutation correlates with the relative fitness of that mutation is consistent with these reports, as is 453 our observation that anabolic processes may play a mechanistic role in setting the cost of a 454 mutation. 455 The observed differential cost of rifampicin resistance across Mtb strains, provides a lens through 456 Bacteria were cultured in 1l bottles containing large glass beads to avoid clumping and 100 ml of 480 media incubated at 37°C rotated continuously on a roller. Unless otherwise stated we used a 481 modified 7H9 medium supplemented with 0.5% w/v pyruvate, 0.05% v/v tyloxapol, 0.2% w/v 482 glucose, 0.5% bovine serum albumin (Fraction V, Roche) and 14.5 mM NaCl. Compared to the 483 usual composition of 7H9 we omitted glycerol, tween 80, oleic acid and catalase from the 484 medium. We added 10 μM Hemin (Sigma) when supplementing growth medium with iron. We 485 followed growth by measuring optical density at 600 nm (OD 600 ). 486 Fluctuation assay experiments were performed using Mycobacterium smegmatis, mc 2 155. M. 487 smegmatis was grown either in 10 ml cultures within 50 ml Falcon conical tubes in a shaker 488 incubator (37°C, 200 rpm), or as 200 μl aliquots within flat-bottomed 96-well plates at 37°C and 489 shaken at 200 rpm. We followed growth by measuring OD 600 . We used unmodified 7H9 medium 490 or medium where glycerol was replaced with citrate or acetate added at concentrations that 491 matched the molarity of carbon. 492

Fitness determination 493
Mtb fitness was determined by comparative growth rate estimation. We grew bacteria as 494 described and followed their growth by measuring OD 600 with a Ultrospec 10 (GE Lifesciences). 495 We transformed the optical density measurements using logarithm base 2 and trimmed all early 496 and late data points that deviated from the linear correlation expected for exponential growth. 497 Next, we fitted a linear mixed effect regression model to the data. Fitness cost was calculated as 498 the resistance imposed deviation from wild type growth dynamics. 499 For M. smegmatis, we determined the growth rates by culturing bacteria as described above. We 500 monitored the increase in OD 600 using a Tecan M200 Pro Nanoquant at 20 min intervals. The 501 data were log2-transformed, trimmed to retain only the portion of data pertinent to exponential 502 growth and used for fitting a mixed effect linear regression model to estimate growth parameters. 503

Transcriptional analysis with RNAseq 504
We transferred a 40 ml aliquot of bacterial culture in mid-log phase (OD 600 = 0.5 ± 0.1) into a 505 50ml Falcon conical tube containing 10 ml ice. We harvested the cells by centrifugation (3,000×g, 506 7 min, 4°C), re-suspended the pellet in 1 ml of RNApro solution (MP Biomedicals) and 507 transferred the suspension to a Lysing matrix B tube (MP Biomedicals). We disrupted the 508 bacterial cells using a FastPrep24 homogeniser (40s, intensity setting 6.0, MP Biomedicals). We 509 clarified the lysate by centrifugation (12,000×g, 5 min, 4°C), transferred the supernatant to a clean 510 tube and added chloroform. We separated the phases by centrifugation (12,000×g, 5 min, 4°C) 511 and precipitated the nucleic acids from the aqueous phase by adding ethanol and incubating at -512 20C overnight. We performed a second acid phenol extraction to enrich for RNA. We treated 513 our samples with DNAse I Turbo (Ambion), and removed stable RNAs by using the RiboZero 514 Gram Positive ribosomal RNA depletion kit (Epicentre). We prepared the sequencing libraries 515 using the TruSeq stranded Total RNA kit (Illumina) and sequenced on a HiSeq2500 high output 516 run (50 cycles, single end). 517 Illumina short reads were mapped to the Mtb H37Rv reference genome using BWA (Li and 518 Durbin, 2010) (ver 0.7.13); the resulting mapping files were processed with samtools (Li et al.,519 2009) (ver 1.3.1). Per-feature read counts were performed using the Python module htseq-520 count (Anders et al., 2015) (ver 0.6.1p1) and Python (ver 2.7.11). We performed differential 521 expression analysis using the R package DESeq2 ( The overrepresentation analysis was based on Fisher's exact as the discriminating test. 529 In addition we transformed per-feature counts into transcript counts per million bases (TPM). 530 TPM for each feature for each sample were calculated using the following formula: 531

= ∑
Where counts i refers to the number of reads that map to a feature i, and size i refers to the length (in 532 bp) of feature i. This ratio was normalized by dividing by the sum of all the ratios across all the 533 features. 534

Proteomic analysis with SWATH-MS 535
We harvested 20 OD 600 equivalents from mid-log phase (OD 600 = 0.5 ± 0.1) bacterial cultures by 536 centrifugation (3,000×g, 7 min, 4°C). We washed the bacterial pellet twice with phosphate 537 buffered saline (PBS) to remove residues of tyloxapol. We re-suspended the bacterial pellet in 500 538 μl of protein lysis buffer (8M Urea, 0.1 M Ammonium bicarbonate, 0.1% RapiGest [Waters]) and 539 transferred the suspension to a Lysing matrix B tube (MP Biomedicals). We disrupted the 540 bacterial cells using a FastPrep24 homogeniser (40s, intensity setting 6.0, MP Biomedicals). We 541 clarified the lysate by centrifugation (12,000×g, 5 min, 4°C), and sterilised the supernatant by 542 passing it twice through a 0.22 μm syringe filters (Milipore). 543 Following protein extraction for each sample, we used trypsin to digest proteins into peptides 544 and then desalted them using C 18 columns (The Nest Group). The cleaned up peptides were re-545 suspended in MS buffer (2% v/v acentonitrile, 0.1% v/v formic acid). Finally, the RT-kit 546 (Biognosis) containing 11 iRT retention time normalization peptides was spiked in to every 547 sample. 548 We measured every sample in sequential window acquisition of all theoretical mass spectra 549 (SWATH) mode, a data independent acquisition implementation, on a tripleTOF 5600 mass 550 spectrometer (AB Sciex) coupled to a nano flow HPLC system with the gradient of one 551 hour (Banaei-Esfahani et al., 2017). The raw files acquired through a 64 variable width window 552 precursor isolation scheme were centroid normalized using Proteowizard msconvert. We used 553 the Mtb spectral library described earlier (Schubert et al., 2013) to extract data using the 554 Transcriptional module analysis. 582 The iron-responsive sub-graph of the global gene regulation network published by Peterson et 583 al. (Peterson et al., 2014), was generated by using all expression modules and all iron-responsive 584 genes as nodes, with edges connecting them representing module membership. All other gene 585 nodes were discarded, keeping only the information pertinent to the number of genes present in 586 each module (its degree). We focused explicitly on modules with at least 3 IdeR-dependent iron-587 responsive genes within them. Finally we marked significant differential expression of the gene 588 nodes in every comparison. 589 For the purposes of contextualising the expressional profiling of RpoB Ser450Leu we selected a 590 subset of expression modules as follows: first we collated all the genes that were differentially 591 expressed in at least one genetic background as determined by pairwise comparisons. We then 592 scored each expression module for enrichment of membership by differentially expressed genes 593 using a binomial test. We retained all modules for which the test pointed to an excess of 594 differentially regulated genes (p < 0.05). We constructed a new sub-graph of the global regulatory 595 network using all enriched modules and their constituent genes irrespective of whether or not 596 individual genes were significantly differentially expressed. Edges reflected module membership. 597 We added expression information in the form of log-fold changes of abundance to each 598 subgraph based on pairwise analyses. 599 Calculation of genetic distance between clinical isolates 600 Genetic distance between strains was defined as the number of single nucleotide variants (SNV) 601 that separate two strains. The numeric value of this parameter was extracted from the phylogeny 602 published elsewhere (Borrell et al., 2018). 603 Quantification of the relative impact of the rpoB mutation on gene expression in 604 different clinical isolates 605 We define the dissimilarity in the expressional response to the presence of the rpoB mutation 606 using three metrics: absolute number of shared significantly differentially expressed genes, the 607 fraction of both the shared significantly differentially expressed genes and shared non-affected 608 genes (hamming distance) and the Euclidean distance between ratios of TPM. The first is simply 609 the number of shared genes that were found to be significantly affected by the presence of the 610 rpoB mutation in two different genetic backgrounds. For the second we use the same input to 611 calculate the hamming distance between the patterns of genes significantly affected by the 612 mutation in rpoB in two different genetic backgrounds. In the third case we first calculate the 613 TPM. We then calculate the mean TPM for each gene across the biological replicates as well as 614 the ratio of mutant to wild type mean TPM for every gene. This gives us a vector containing 4000 615 ratios for each mutant-wild type pair. Finally we calculate the Euclidean distance between these 616 vectors for the different genetic backgrounds. We plotted each of these metrics against genetic 617 distance and calculated the spearman correlation and the coefficient of variance: standard 618 deviation over mean multiplied by 100 (σ / μ × 100%). 619 Quantification of the absolute impact of the rpoB mutation on gene expression of a 620 clinical isolate 621 We used transcript counts per million bases (TPM) and label free quantification (LFQ) to 622 generate an RNA vector and a protein vector containing all the available information for each 623 measured sample. We then calculated all the possible DS -RifR pairwise Euclidean distances for 624 the RNA and protein vectors within each genetic background. We used the mean and standard 625 deviation for the dissimilarity estimates. We evaluated the correlation between the fitness cost of 626 RpoB mutations and the expression distance using the R 2 -coefficient derived from ordinary least 627 squares linear regression as well as the Spearman correlation. Arbitrary units expressing the 628 dissimilarity were obtained by dividing the calculated distances by 500,000 or 10,000,000 for TPM 629 and LFQ, respectively. 630 Estimation of the biosynthetic cost of protein production 631 The calculation of biosynthetic cost was based on the molecular weight of amino acids 632 (MW) (Seligmann, 2003) or on the estimate of E. coli ATP investment into individual amino acids 633 derived by Akashi et al. (Akashi and Gojobori, 2002) using the following formulae: 634 Where the cost of protein i (p i ) was calculated as the sum of the cost for each constituent amino 635 acid (α j MW/ATP ) based either on its molecular weight (MW) or ATP investment (ATP) and adjusted 636 by the proportional contribution of protein i to the total proteome of sample X (LFQ i,X ). The 637 overall cost of the proteome P for a sample X (P X ) is expressed as the sum of the costs of 638 individual proteins (p). The difference between the biosynthetic investments in the proteome of 639 sample X when compared to sample Y was simply: P X -P Y . We estimated the biosynthetic 640 perturbation of RpoB Ser450Leu within a genetic background, by resampling sample-specific 641 proteome costs for DS and RifR with replacement 100-times, and using the median as well as the 642 3 rd and 98 th quantiles to provide the 95% confidence interval. Finally, we quantify the correlation 643 with the relative fitness of RpoB Ser450Leu by calculating the Spearman coefficient. 644

Minimum inhibitory concentration determination 645
We used the microplate alamar blue assay (Franzblau et al., 1998) to determine the minimum 646 inhibitory concentrations of bedaquiline in all drug susceptible and drug resistant strains used in 647 our study. We tested bedaquiline using a two-fold dilution series spanning a concentration of 4 648 ng/ml -1 µg/ml. 649 Fluctuation Assay for determining the frequency of rifampicin resistance 650 We used the Luria-Delbrück fluctuation assay (Luria and Delbruck, 1943) Figure S6. 675 A record of data analysis pertinent to this paper will be made available at 676 http://www.github.com/swissTPH/TBRU_cost_of_resistance/. 677 678