An Alignment-Independent 3D-QSAR Study of FGFR2 Tyrosine Kinase Inhibitors

Purpose: Receptor tyrosine kinase (RTK) inhibitors are widely used pharmaceuticals in cancer therapy. Fibroblast growth factor receptors (FGFRs) are members of RTK superfamily which are highly expressed on the surface of carcinoma associate fibroblasts (CAFs). The involvement of FGFRs in different types of cancer makes them promising target in cancer therapy and hence, the identification of novel FGFR inhibitors is of great interest. In the current study we aimed to develop an alignment independent three dimensional quantitative structure-activity relationship (3D-QSAR) model for a set of 26 FGFR2 kinase inhibitors allowing the prediction of activity and identification of important structural features for these inhibitors. Methods: Pentacle software was used to calculate grid independent descriptors (GRIND) for the active conformers generated by docking followed by the selection of significant variables using fractional factorial design (FFD). The partial least squares (PLS) model generated based on the remaining descriptors was assessed by internal and external validation methods. Results: Six variables were identified as the most important probes-interacting descriptors with high impact on the biological activity of the compounds. Internal and external validations were lead to good statistical parameters (r2 values of 0.93 and 0.665, respectively). Conclusion: The results showed that the model has good predictive power and may be used for designing novel FGFR2 inhibitors.


Introduction
It is well known that the interaction between different components of tumor microenvironment play crucial role in progression and malignancy of the tumor. 1 Among the cells present in the turmeric area, fibroblasts were gained much attention due to having distinguished characteristics compared with fibroblasts in normal tissues. Such fibroblasts in turmeric area are termed carcinoma associate fibroblasts (CAFs) and are detectable in various tumors including breast, prostate, lung, colon and pancreas cancers. 2 Fibroblast growth factor receptors (FGFRs) presented on the surface of CAFs are belong to the transmembrane receptors known as receptor tyrosine kinases (RTKs) comprised of three immunoglobulin-like domains at the extracellular region connected via a single transmembrane region to the intracellular tyrosine kinase domain. 3 FGFR family consist of four closely related receptors called FGFR1 to FGFR4. Ligand-activated FGFRs activate signaling pathways in the cell which lead to cell proliferation, growth, differentiation, migration, and survival. 4 Similar to other RTKs, deregulation of these receptors can trigger numerous diseases including cancer. FGFR2 as one of the important factors on the surface of CAFs is overexpressed in some human cancers including stomach, pancreas, and breast. Moreover, mutations of this receptor can lead to intrinsically active form of FGFR2 reported in endometrial and lung cancers. 5 Inhibition of RTKs as a promising target in treatment of different kinds of cancers has been led to development of remarkable therapeutic agents. 6 Most of these tyrosine kinase inhibitors (TKI) at different clinical phases are the small molecules targeting ATP-binding site of the kinase domain of RTKs. 7 In the context of developing new therapeutics, highthroughput studies combined with computational analyses are effective tools for lead compound discovery. 8 Quantitative structure-activity relationship (QSAR) is one of the most commonly in silico methods for the prediction of biological activity of compounds by transforming their chemical and structural properties into numerical values which can then be linked to their potencies using mathematical models. 9 There are different types of QSAR from dimensionality point of view of which 3D-QSAR method is extensively used in drug design and discovery process. In this methodology, 3D-descriptors which are representative of atomic arrangement in 3D space are employed to be used in alignment-dependent or alignment free analyses. In alignment-dependent analysis the studied compounds are required to be aligned with each other whereas in alignment free method there is no need for superpositioning of molecules prior to development of 3D models, which can be considered as an advantage. GRid-Independent Descriptors (GRIND) is one of the alignment-independent methods 10,11 in which molecular interaction fields (MIF) are used to describe the interaction energy between ligands and different types of probes. 12 Then data mining are performed on the pool of calculated descriptors according to their impact on the biological activity followed by calculating favorable and unfavorable interactions. 13 In the current study we aimed to develop a 3D-QSAR model using GRIND algorithm for a set of FGFR2 kinase inhibitors to identify the needed structural requirements.
The results of the current study may be used for designing novel FGFR2 inhibitors.

Data set preparation
A set of 26 small molecules with inhibitory activity on FGFR2 were collected from the literature. [14][15][16] Inhibitory activities of the studied compounds reported in Kd (nM) were converted to pK d values. The transformed data would be used as dependent variable in 3D-QSAR study. Table 1 presents the structures and corresponding pK d values of FGFR2 kinase inhibitors. The 3D structures of the molecules were generated using the Built Optimum option of Hyperchem software (version 8.0.8) followed by energy minimization using MM+ force field based on Polack-Ribiere algorithm. 17 Then, the structures were fully optimized based on the semiemperical method at AM1 level of theory. 18

Calculation of GRIND descriptors and model building
The docking solutions for each compound were filtered based on the similarity to the reference structure (i.e. dovitinib) using Shape-it™ software. 20 The selected conformers were introduced to Pentacle program to generate GRIND-based descriptors. To do this, first MIFs were generated using GRID-based fields in which the interaction energies between atoms of molecules and different probes including hydrophobic (DRY), hydrogen bond donor, HBD (O), hydrogen bond acceptor, HBA (N1), and shape (TIP) probes at the given cutoff distance are calculated. The interaction energies (E xyz ) at each grid point called node were the sum of Lennard-Jones energy (E lj ), hydrogen bond (E hb ), and electrostatic interactions (E el ). Based on the defined cutoff, the nodes having energies lower than the cutoff were discarded. To this end, ALMOND algorithm was employed to extract the most relevant regions from MIFs according to the field intensity at a node and the mutual node-node distances between the selected nodes. Finally, MIFs were encoded by maximum autocorrelation and cross-correlation algorithm for generating correlograms in which the product of node-node energies were plotted vs the distances between the nodes.

Modeling and statistical analyses
The entire dataset was randomly divided into training and test sets containing 21 and 5, compounds, respectively. For generating 3D-QSAR model, fractional factorial design (FFD), was applied on training subset of data for obtaining the descriptors explaining the important interactions with defined probes. FFD method was carried out until no significant change in the model statistical parameters such as r 2 and q 2 was observed. The remaining descriptors were subjected to partial least squares (PLS) regression where the descriptors internally crossvalidated using leave-one-out (LOO), leave-two-out (LTO), and random-group-out (RGO). The PLS model was also externally evaluated with 5 randomly selected test set compounds. To further evaluate the robustness of the generated model, y-scrambling test was carried out by ten times randomly scrambling the activity data for the train set and generating PLS models as outlined above. The generated PLS models were utilized to predict the activity of test set compounds.

Results and Discussion
Targeting carcinoma associate fibroblasts (CAFs) as one of the important components presented in the microenvironment of turmeric area was the focus of many research activities recently. 1,2,21,22 FGFR proteins as the cell surface elements of CAFs interact with their endogenous ligands and interfere with these interactions by small molecule inhibitors is one of the promising strategies which may lead to the development of new anticancer agents. 4 In general, the drug development processes require extensive experimental studies to find and improve the potency and pharmacokinetics of drug candidates via optimizing their 3D structures and physicochemical properties. Identification of novel TKIs are no exception and variety of technologies such as high-throughput screening are being employed extensively to develop druggable compounds acting as tyrosine kinase inhibitor. 23 The highly expensive and time-consuming procedure of drug development requires the utilization of complementary in silico methods to cut-down the cost and time and increase the success rate. 11 In the current study we have used in silico QSAR approach based on alignment independent method to generate a 3D model for activity prediction of a set of FGFR2 inhibitors. One of the important issues in alignment independent 3D-QSAR calculations is the use of active conformation of the studied compounds. To consider an appropriate measure regarding this criterion, the best fitting conformers of the compounds to the reference structure dovitinib were selected after docking procedure. The experimental data showing the receptor bound form of reference compound (FGFR2-dovitinib complex) is not available, however, its crystal structure in complex with TK domain of closely homologous receptor FGFR1 has been reported previously. 24 By superpositioning the crystal structures of TK domains of FGFR1 and 2, the bound conformation of dovitinib at the TK domain of FGFR2 was identified and subsequently was used as the reference structure to filter the docking results. The filtered conformations for the TK inhibitors were submitted to Pentacle software for alignment independent 3D-QSAR analyses. FFD feature selection was applied to select important independent variables calculated by the software in relation to the dependent variable pK d . Based on the remained variables PLS model was generated. Figure 1 represents PLS coefficients of variables which were selected by applying FFD on the GRIND descriptors using Pentacle software. The important descriptors based on their corresponding PLS coefficients were listed in Table 2. According to Table 2 two descriptors have positive impact on the biological activity of the compounds while four of them have negative impact.    Considering that the study was carried out using 26 inhibitors, only 5 compounds (20% of data set) were randomly selected as the test set and the remaining 21 compounds were used to develop the model. Using bigger test set would have adversely affected the predictivity of the model due to loss of information as the result of developing the model by fewer training set compounds. Table 3 shows the statistics for the PLS model generated using training set FGFR2 inhibitors for three latent variables (3LV). Internal validations with leave-one-out (LOO), leave-two-out (LTO), and five random-group-out (RGO) methods were performed to assess robustness of the model. The results suggested a significant correlation between PLS components and FGFR2 inhibitory activities of the compounds. In order to assess the predictive performance of the model, a subset of 5 randomly selected molecules as the test set (shown in Table 4) were used for the prediction. By monitoring the changes in the statistical indices (shown in Table 3), two latent variables (2LVs) were selected as the optimum number of PLS components for the model interpretation. 0.35 (equal to 15% of training set activity range) which is greater than 10% of the activity range for the training set compounds. But, it is less than 20% of the range, which collectively considering the results of all validation assessments seems reasonable. 28 The applicability domain was defined according to the method developed by Roy et al, termed applicability domain using standardization approach. 29 For this purpose PLS latent variables obtained from 3D-QSAR model and pK d values were used as X and Y variables, respectively. The result showed that there are no outliers among the dataset compounds. Hence the model can be applied for the prediction of FGFR2 tyrosine kinase inhibitory activity of compounds having structures similar to those used in this study. Taken together, working with small data set has some challenging issues concerning external validation. However, this limitation can be overcome by providing more data obtained from experimental procedures.

Conclusion
In summary, an alignment independent 3D-QSAR model was generated for a set of tyrosine kinase inhibitors (TKIs) evaluated on FGFR2. The model was verified with internal and external validation methods as well as yscrambling technique. The results showed that the model has good predictive power with acceptable statistics. According to the selected correlograms, indole and quinolone rings as well as their bioisosteres are important moieties in the structures. These structural features are involved in hydrogen bond and hydrophobic interactions. Moreover, in some compounds substitution of these moieties with small heterocyclic groups helps to retain the functionality of the compounds. The result of the current investigation can be used in designing the novel FGFR2 tyrosine kinase inhibitors.