Modeling pKas of unfolded proteins to probe structural models of unfolded state

Modeling unfolded states of proteins has implications for protein folding and stability. Since in unfolded state proteins adopt multiple conformations, any experimentally measured quantity is ensemble averaged, therefore the computed quantity should be ensemble averaged as well. Here, we investigate the possibility that one can model an unfolded state ensemble with the coil model approach, algorithm such as \°exible-meccano" [Ozenne V et al., Flexible-meccano: A tool for the generation of explicit ensemle descriptions of intrinsically disordered proteins and their associated experimental observables, Bioinformatics 28:1463–1470, 2012], developed to generate structures for intrinsically disordered proteins. We probe such a possibility by using generated structures to calculate pKas of titratable groups and compare with experimental data. It is demonstrated that even with a small number of representative structures of unfolded state, the average calculated pKas are in very good agreement with experimentally measured pKas. Also, predictions are made for titratable groups for which there is no experimental data available. This suggests that the coil model approach is suitable for generating 3D structures of unfolded state of proteins. To make the approach suitable for large-scale modeling, which requires limited number of structures, we ranked the structures according to their solvent accessible surface area (SASA). It is shown that in the majority of cases, the top structures with smallest SASA are enough to represent unfolded state.


Introduction
The unfolded state and intrinsically disordered proteins (IDPs) or regions (IDRs) adopt a conformational landscape which is still not very well understood. Generating the \right" conformational ensemble of an unfolded state is not an easy task and needs knowledge of how the protein was unfolded, including what denaturants were *Corresponding author. This is an Open Access article published by World Scienti¯c Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC-BY) License. Further distribution of this work is permitted, provided the original work is properly cited. used and what the conditions were. 1 In some cases, the unfolded state ensembles contain hydrophobic clusters and/or well-de¯ned secondary structure elements. [2][3][4] The importance of knowledge of unfolded state is demonstrated in modeling folding free energy, the di®erence of the energy of folded and unfolded states. Consequently, the structural information of the unfolded state is valuable for mechanistic and thermodynamic studies of folding.
Ionizable side chains represent about 30% of the proteins' residues and are crucial for stability and solubility of proteins. [5][6][7] The pKa values of ionizable residues determine the pH-dependence of protein folding and binding energies. [8][9][10][11][12][13] In some cases, modeling the pH-dependence of stability requires knowledge of pKas in unfolded state. [14][15][16][17] While it is frequently assumed that the pKas in an unfolded state are unperturbed (equal to intrinsic pKas), there are also studies that show the role of electrostatic interactions between charged residues in the denatured state. 18,19 Additionally, some studies described the impact of changes in the protonation state of ionizable residues in the formation of hydrophobic clusters in denatured state. 20 This indicates that some titratable groups may have perturbed pKas in their unfolded state.
Experimentally, the pKa values of titratable residues in the unfolded state were measured for some proteins using NMR spectroscopy, site-directed mutagenesis and CD temperature-denaturation measurements. [21][22][23][24] Most of these experiments have indicated that the pKa values of acidic residues are lowered in unfolded state compared with intrinsic pKas. 21,22,25 These studies are used in this work to benchmark DelPhiPKa along with 3D models of unfolded states. Previous attempts to model pKas of unfolded states were applied to either molecular dynamics (MD) simulation to generate structures of unfolded state 14 or Gaussian-chain model. [15][16][17] MD with arti¯cially increased van der Walls atomic radii was used to \unfold" the proteins and obtained structures to calculate pKas of unfolded state. 14 Alternatively, the electrostatic interactions in unfolded state were estimated with Gaussian-chain model and then were used to compute the corresponding pKas. [15][16][17] In our work, we took a di®erent approach, taking advantage of advances made in the¯eld of IDPs.
A great amount of e®ort has been invested to develop methods to predict the conformational ensembles of IDPs. A straightforward approach to model the conformational ensemble of unfolded state is the MD simulation. 26 Another approach, the coil models, alternatively describes the conformational ensemble of IDPs by sampling di®erent conformational states of individual residues using the coil libraries based on the experimental information of residue-speci¯c f'; g angles. [27][28][29][30] Several studies showed that the results from coil modeling accurately reproduce experimental data such as J-coupling, 27 residual dipolar couplings (RDCs) 28,30 and smallangle X-ray scatterings (SAXS) curves for both IDPs 28,30 and unfolded proteins. 31 Thus, in our work the unfolded proteins are generated via coil models and we use the conformational ensembles generated by a particular software, the°exible-meccano software. 30 Our interest in modeling unfolded state stems from the importance of unfolded state in calculating folding free energy changes due to amino acid mutations, since this is related to understanding e®ects of disease-causing nonsynonymous single nucleotide polymorphisms (nsSNPs). [32][33][34][35] All available methods based on energy di®erence between two states, folded and unfolded, use an approximation and consider only a few residues as representative of unfolded state. [36][37][38][39] This is clearly a severe assumption that omits residual interactions in unfolded state. With this work, we would like to suggest that one can explicitly generate representative full length structures of unfolded state using the tools for modeling IDPs, the°exible-meccano software. 30 In this work, we use the°exible-meccano software 30 to generate ensemble of structures for unfolded state of selected proteins for which there is experimental data of pKas in unfolded conditions. These structures are used to compute pKas of titratable residues with DelPhiPKa 40,41 and results compared with experimental data. To provide some guidance how to select minimal number of representative structures (to be applicable for large-scale calculations), we investigate the possibility of the generated structures ranked with respect to solvent accessibility surface area (SASA). Furthermore, we make predictions for pKas of unfolded state for titratable groups that are not reported in the literature.

Methods
The main purpose of this investigation is to probe if structures generated with°e xible-meccano can be used as representative structure for modeling unfolded state. Such a possibility is tested via comparing experimental pKas of unfolded state and predicted pKas with DelPhiPKa using structures generated with°exible-meccano. However, one can use this approach to predict pKas delivered via unfolding experiments and to infer the pH-dependence of folding as well.
Dataset: We searched for proteins with experimentally measured pK a values of titratable residues in the unfolded state. This resulted in the following cases: Turkey Ovomucoid Third Domain (OMTKY3), 25 Barnase, 21 chymotrypsin inhibitor 2 (CI2), 22 protein G B1 domain (PGB1) 23 and Hen Egg white Lysozyme (HEWL). 42 Unfolded state: The°exible-meccano algorithm 30 was used to generate ensemble of structures for the unfolded state. This method builds multiple di®erent copies of the polypeptide chain based on the random sampling backbone dihedral angle potential wells assuring that these conformations do not self-overlap (Fig. S1). This method was shown to successfully reproduce the experimental data such as RDC and SAXS by averaging over the built-ensembles. 30 Thus, we generated 1000 structures for each protein in our data set using default parameters of°exible-meccano software.
pKas calculations: The pKas calculations of titratable residues were performed by DelPhiPKa which is a Poisson-Boltzmann-based approach. 41 43 A dielectric constant of 4 and 8 was considered for each protein to test the sensitivity of results. The charge and radius parameters were assigned using PARSE. 44 The salt concentration was set to a value that matches the experimental conditions. Therefore, the salt concentrations for OMTKY3, Barnase, CI2, PGB1 and HEWL were 0.2, 0.05, 0.05, 0.001 and 0.144 M, respectively. Solvent accessible surface area: SASA was calculated by VMD 45 for each structure in structural ensembles with the solvent probe radius of 1.4 # A. It was used to rank the structures according min/max SASA.

Results and Discussions
Here, we report DelPhiPKa calculated pKas of ionizable groups using°exible-meccano generated structures. For each of the proteins, we generated 1000 structures and carried pKas calculations. At the same time, we address the possibility to reduce the number of needed structures by selecting the top 15, either with maximum (very extended structures) or minimum (relatively compact structures) SASA. We also investigate the e®ect of di®erent values of internal and external dielectric constants. Lysines and arginines are excluded from our investigation, because there is no experimental data.

OMTKY3
A study of the pH dependence of OMTKY3 stability showed that among all the titratable residues, the pKas of Asp7 and Asp27 are perturbed and therefore they contribute to the pH dependence of OMTKY3 stability. 25 The pKa of Asp7 and Asp27 in unfolded state were reported as 3.9 and 3.6.
We calculated the pKa values of titratable residues using all 1000 structures and considering ionic concentration of 200 mM and solvent and solute dielectric constant of 80 and 8, respectively (Table 1). It can be seen that predicted values for Asp7 and Asp27 are 3.87 and 3.56, which are almost the same as the ones found in the experimental work. 25 Additionally, we predicted the pKas of all other acidic groups  45 To consider the e®ect of D 2 O on the pKas of unfolded state, we calculated the pKas of titratable residues using solvent dielectric constant of 74, 75 and 78.5 (Table S1). Moreover, to investigate the e®ect of protein (solute) dielectric constant on the pKa values in unfolded state, we performed calculation using protein dielectric constant of 4 as well (Table S2). Our results show that the e®ect of D 2 O on pKa values of titratable residues in unfolded state is negligible (Table S1). Moreover, the results using dielectric constant of protein of 4 and 8 are practically the same. Therefore, the calculations of pKas in unfolded state are not sensitive to these parameters (Table S2).

Barnase
The thermal unfolding experiments of barnase suggested that pKa values of all acidic residues, Asp and Glu, on average are 0.4 lower than of the model component values. 21 Thus, there is no experimental data for individual pKas, but just a tendency that acid pKas are perturbed on average by 0.4 pH units. To see if our modeling can reproduce this trend, we calculated the individual pKa values for all Asp, Glu and His residues for each structure in unfolded ensemble (1000 structures) and then averaging over all pKas of Asp and Glu separately ( Table 2) Similarly, as above, we repeated the modeling using only 15 structures that have minimum or maximum of SASA out of 1000 snapshots ( Table 2). The result of the structures with the minimum SASA shows that the average ÁpKa values of Asp and Glu are 0.47 and 0.57, which are better predictions compared to the ones with the maximum SASA, 0.19 and 0.29. This speaks in favor of models that are more compact, allowing for residual interactions in unfolded state.
To test the sensitivity of results, with respect to the value of internal and external dielectric constants, we carried calculations with in solvent dielectric constant of 74, 75, 78.5 as well (Table S3) (Table S4).

CI2
Thermal denaturation experiments of CI2 at ionic concentration of 50 mM demonstrated that the pKa values of acidic residues, Asp and Glu, in the unfolded state are on average 0.3 unit lower than their pKa values of the model compound. 22 Thus, we calculated the pKa values of all titratable residues, no histidine residue present in CI2, using 1000 snapshots of unfolded CI2. The DelPhiPka calculations were done at ionic concentration of 0.05 M, solvent dielectric constant of 80 and protein (solute) dielectric constant of 8. Additionally, we carried out pKas calculations using only for 15 snapshots with maximum and minimum SASA (Table 3).
We observe that all the ÁpKa shifts are positive, which means that the predicted pKa values are lower than the model pKas. Moreover, the average of ÁpKa value for Asp is 0.308 and for Glu is 0.22, which are close to the experimental values. Furthermore, we observe that the average ÁpKa of Asp and Glu are 0.329 and 0.37, respectively, for snapshots with minimum SASA. On the other hand, the predicted values for snapshots with maximum SASA are in less agreement with the experiment (Table 3). This again indicates that compact structures are better representatives for unfolded state.
As above, we do not observe signi¯cant e®ect of the internal or external dielectric constant values (Tables S5 and S6).

PGB1
The pH-dependent stability of PGB1 with the mutations T2Q, N8D, and N37D (PGB1-QDD) was determined by measuring the pKa values for unfolded state under native conditions. 23 The pKas values of acidic groups were measured in the presence of H 2 O/D 2 O at 25 C with the counterion concentration about 1 mM (Table 4).
We calculated the pKa values of the corresponding titratable residues (note that no histidine residue is present in PGB1) in the unfolded structures for solvent dielectric constant of 80 and protein dielectric constants of 8 (

HEWL
The pKa values of several acidic residues in thermally unfolded state of HEWL for di®erent pH and 50 mM Britton-Robinson bu®er were reported. 42 For some of these residues, signi¯cant shifts from their model pKas were observed. 42 We calculated the pKa of those corresponding acidic residues using salt concentration of 144 mM for 1000 snapshots of unfolded structure of HEWL with solvent and protein dielectric constants of 80 and 8, respectively (Table 5). There is good correlation between the experimental and predicted pKa values for Asp18, Asp66, Asp101, Asp119, Asp35, Glu35 except Asp48 (Fig. S2). Moreover, we calculated the pKa values only for 15 snapshots with maximum and minimum SASA. The better correlation between the experimental estimated pKa values with the calculated ones is seen for the snapshots with the minimum SASA.
Moreover, the predicted results for solvent dielectric constants of 74, 75 and 78.5 and the protein (solute) dielectric constants of 4 are shown in Tables S9 and S10. Again, there is no signi¯cant di®erence observed for changes in these parameters.

Conclusion
The investigation showed that unfolded state can be modeled with the coil model, algorithm such as \°exible-meccano", if one is concerned about pKa values of acidic groups. Even more, the number of snapshots representing are not many and one can be successful using only 15 structures with minimum SASA. This reduces the computational cost of modeling.
Apart from testing the possibility to model unfolded state to compute pKas of titratable groups, we also delivered predictions for pKas that are not experimentally measured yet.
While we attempted to justify the usage of \°exible-meccano" for modeling pKas in unfolded state, it is tempting to generalize the observation and to consider that generated ensemble can be used for any other type of modeling. Thus, we speculate that snapshots can be used to model unfolded state in cases of computing folding free energy and folding free energy caused by mutations.