Data processing in the healthy and FL cohort
Physical examination data, including high-throughput 16S rDNA sequencing results, were collected from 1,463 participants of the KSH cohort. After removing missing and poorly detected values, 1,250 participants were included in the analysis. Subsequently, participants whose ASV number was < 5,000 were filtered out, leaving 777 participants for the study (Fig. 1). Of these 777 participants, 290 had FL and the remaining 487 did not. We regarded participants as insulin resistant if their homeostatic model assessment of IR (HOMA-IR) value exceeded a sex-specific cutoff (1.8 for men and 2.2 for women) and as insulin sensitive otherwise; the cutoffs were calculated as the critical threshold for T2DM development in an internal investigation of the KSH cohort (data not shown). By these criteria, 611 of the 777 participants were classified into the insulin-sensitive (IS) group and 166 into the IR group. The biological and physical characteristics of these groups are described in Table 1.
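The grouping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: it assumes the standard HOMA-IR formula (fasting glucose in mg/dL × fasting insulin in µU/mL / 405), which the text does not state explicitly, and the function names and participant values are hypothetical.

```python
# Sex-specific HOMA-IR cutoffs quoted in the text.
HOMA_IR_CUTOFF = {"M": 1.8, "F": 2.2}


def homa_ir(glucose_mg_dl: float, insulin_uU_ml: float) -> float:
    """Standard HOMA-IR (assumed here): glucose (mg/dL) x insulin (uU/mL) / 405."""
    return glucose_mg_dl * insulin_uU_ml / 405.0


def assign_subgroup(sex: str, homa: float, has_fl: bool) -> str:
    """Return ISNF, ISFL, IRNF, or IRFL for one participant."""
    ir_status = "IR" if homa > HOMA_IR_CUTOFF[sex] else "IS"
    return ir_status + ("FL" if has_fl else "NF")


# Hypothetical participants: (sex, glucose mg/dL, insulin uU/mL, fatty liver?)
participants = [
    ("M", 90.0, 5.0, False),
    ("M", 110.0, 12.0, True),
    ("F", 95.0, 9.0, True),
    ("F", 100.0, 10.0, False),
]
labels = [assign_subgroup(s, homa_ir(g, i), fl) for s, g, i, fl in participants]
print(labels)
```

Crossing the IR/IS status with the presence of FL in this way yields the four subgroups (ISNF, ISFL, IRNF, and IRFL) used in the later analyses.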

Schematic of the analysis pipeline. Participants (n = 777) from the KSH cohort were included in the final analysis and were divided into four subgroups, ISNF (n = 449), ISFL (n = 162), IRNF (n = 38), and IRFL (n = 128). ASV: amplicon sequence variants; HOMA-IR: homeostatic model assessment of insulin resistance index; IR: insulin resistant; KSH cohort: Kangbuk Samsung Hospital cohort.
Subject demographics
Among the 777 participants in the study, the IS and IR groups included 611 and 166 individuals, respectively. Men accounted for 55.97% of the IS group and 84.94% of the IR group. The IS group had significantly lower values than the IR group for age, body mass index (BMI), waist circumference, heart rate, HOMA-IR, glucose, insulin, HbA1c, albumin, aspartate aminotransferase, alanine transaminase, triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), and both diastolic and systolic blood pressure (BP). The IS group had lower total cholesterol than the IR group, but the difference was not statistically significant. Moreover, participants in the IS group had significantly higher high-density lipoprotein cholesterol (HDL-C) levels than those in the IR group (Table 1).
Microbiome comparison and classification between the groups with or without FL
Alpha diversity, particularly Shannon’s entropy, was used to compare the diversity of the gut microbiome between the non-fatty liver control (NF) and FL groups. For Shannon’s entropy, which integrates community richness and evenness, NF (median = 6.677; interquartile range [IQR] = 6.179–7.103) had a significantly higher value than FL (median = 6.475; IQR = 5.961–6.941; p = 0.0017; Fig. 2a). Consistent with previous reports, FL (median = 15.090, IQR = 12.859–18.057) had significantly lower Faith’s phylogenetic diversity (PD: biodiversity based on phylogeny) than NF (median = 15.848, IQR = 13.648–18.594) (p = 0.004; Fig. 2b). Additionally, FL (median = 0.910; IQR = 0.885–0.928) had lower Pielou’s evenness (a measure of how evenly abundances are distributed among species) than NF (median = 0.918; IQR = 0.899–0.931; p = 0.00045; Fig. 2c). Subsequently, we performed principal coordinate analysis (PCoA) to visualize the relationship between the NF and FL groups. However, the two groups did not form observably distinct clusters (Fig. 2d).
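For readers less familiar with these indices, the following minimal sketch shows how Shannon’s entropy and Pielou’s evenness can be computed from the feature counts of a single sample. The logarithm base of 2 for Shannon’s entropy is an assumption (the text does not state the base), and the two count vectors are hypothetical. Faith’s PD is omitted because it additionally requires a phylogenetic tree.

```python
import math


def shannon_entropy(counts, base=2.0):
    """Shannon entropy H = -sum(p_i * log(p_i)) over features with nonzero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def pielou_evenness(counts):
    """Pielou's J = H / ln(S), where S is the number of observed features."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is undefined for a single feature; 0 by convention here
    return shannon_entropy(counts, base=math.e) / math.log(observed)


# Hypothetical samples: four features, evenly vs. unevenly distributed.
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, shannon_entropy(sample), pielou_evenness(sample))
```

Both samples have the same richness (four features), yet the skewed sample scores lower on both indices, which illustrates why evenness is reported separately from richness-based measures.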

Comparison of the alpha and beta diversity of the gut microbiome between the fatty liver disease (FL) and nonfatty liver control (NF) groups. (A–C) Alpha diversity of NF and FL was measured using Shannon’s entropy (A), Faith’s phylogenetic diversity (PD) (B), and Pielou’s evenness (C). Boxes represent the IQR, whiskers extend from the minimum (lower quartile − 1.5 × IQR) to the maximum (upper quartile + 1.5 × IQR), and black dots represent outliers outside this range. (D) Beta diversity among participants in NF and FL was assessed using principal coordinate analysis (PCoA). (E–F) The predictive power (AUROC) of the RF prediction model featuring different discriminative gut microbial genera in the training dataset (E) and test dataset (F). Statistical significance was analyzed using the Kruskal–Wallis test. *p < 0.05, **p < 0.01. FL: participants with fatty liver disease; IQR: interquartile range; ML: machine learning; NF: nonfatty liver control group; RF: random forest.
As NF and FL differed in alpha diversity but not in beta diversity, we generated a classification model from informative gut microbial features. Core informative features were selected by Gini importance using the random forest (RF) algorithm. The predictive power of the models from the training dataset with two, four, eight, 12, 16, 24, and 32 features were 0.53, 0.59, 0.60, 0.59, 0.62, and 0.64 of AUROC, respectively (Fig. 2e). FL prediction using the model with 32 gut microbial features had an AUROC of 0.65 (0.56–0.73) in the test dataset (Fig. 2f). This indicates that even the set of most informative features classified NF and FL poorly.
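To make the AUROC values above easier to interpret, the sketch below computes AUROC directly from its rank-based definition: the probability that a randomly chosen positive (FL) sample receives a higher score than a randomly chosen negative (NF) sample, with ties counted as one half. The labels and scores are hypothetical, and this plain-Python helper is an illustration rather than the software used in the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly.

    Ties count as one half, which matches the Mann-Whitney U formulation.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = FL, 0 = NF) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
print(auroc(y_true, y_score))
```

Under this definition an uninformative classifier scores 0.5, so values of 0.53 to 0.65 represent only a modest improvement over chance.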
Classification between NF and FL using reported gut microbial features
Recent reports have proposed a novel microbiome-based diagnostic tool for liver cirrhosis, the most advanced stage of FL [13]. We built an RF classifier using the reported gut microbial markers to test whether they could distinguish between FL and NF in our data. The abundances of the reported gut microbial genera, including Acidaminococcus spp., Alistipes spp., Bacteroides spp., Dorea spp., Enterobacter spp., Escherichia-Shigella spp., Eubacterium spp., Faecalibacterium spp., Klebsiella spp., Ruminococcus (gnavus group) spp., Streptococcus spp., and Veillonella spp., were compared between FL and NF (Supplementary Fig. 1a). Among these features, Acidaminococcus spp. (p = 8.9e−05), Alistipes spp. (p = 7.5e−07), Faecalibacterium spp. (p = 0.011), and Ruminococcus spp. (gnavus group; p = 0.0018) had significantly different abundances in NF and FL, consistent with previous studies. Furthermore, we evaluated the sensitivity and specificity of sets of these features using the AUROC. However, the predictive power of each model, regardless of the number of features, was insufficient to distinguish between NF and FL (0.60 AUROC for a 12-feature model, 0.60 AUROC for an 8-feature model, 0.56 AUROC for a 6-feature model, and 0.56 AUROC for a 4-feature model; Supplementary Fig. 11).
Microbiome comparison and classification between IRNF and IRFL
NAFLD is considered the hepatic component of IR. Therefore, it is critical to distinguish FL from NF in participants with IR. To find the most informative microbial features differentiating FL from NF in these participants, we divided the IR group by the presence of FL into NF featuring IR (IRNF) and FL featuring IR (IRFL). We then compared the biodiversity of the two microbiomes. IRNF (median = 6.808, IQR = 6.289–7.183) had significantly higher Shannon’s entropy than IRFL (median = 6.403, IQR = 5.988–6.805; p = 0.032; Fig. 3a). Additionally, IRFL (median = 14.906, IQR = 12.876–17.804) had significantly lower Faith’s PD than IRNF (median = 16.293, IQR = 14.311–20.455; p = 0.032; Fig. 3b). IRFL (median = 0.902, IQR = 0.878–0.921) also had lower Pielou’s evenness than IRNF (median = 0.915, IQR = 0.905–0.930; p = 0.021; Fig. 3c). However, PCoA and uniform manifold approximation and projection (UMAP) of the total microbiome showed no distinct clusters separating IRFL and IRNF (Fig. 3d, e, Supplementary Fig. 2a and b).

Comparison of the alpha and beta diversity of the gut microbiome between IRNF and IRFL. (A–C) Alpha diversity of IRNF and IRFL was measured using Shannon’s entropy (A), Faith’s phylogenetic diversity (PD) (B), and Pielou’s evenness (C). Boxes represent the IQR, whiskers extend from the minimum (lower quartile − 1.5 × IQR) to the maximum (upper quartile + 1.5 × IQR), and black dots represent outliers outside this range. (D–E) Beta diversity among participants in IRNF and IRFL was assessed using PCoA (D) and UMAP analyses (E). (F) The predictive power in the test dataset of models featuring different numbers of gut microbial genera, constructed using the RF, GBM, and XGB algorithms. Statistical significance was analyzed using Wilcoxon’s test. *p < 0.05, **p < 0.01. IRFL: fatty liver participants featuring insulin resistance; IQR: interquartile range; IRNF: nonfatty liver control group featuring insulin resistance.
To classify IRFL and IRNF from their gut microbiome, we constructed ML models using three classification algorithms: RF, gradient boosting machine (GBM), and extreme gradient boosting (XGB). Among the three ML models featuring different numbers of gut microbial genera, the RF model gave the most reliable prediction in the test dataset (AUROC 0.77), whereas the GBM and XGB models had AUROCs of 0.62 and 0.63, respectively (Fig. 3f). Next, we built the models in the same manner but separately for each gender to see whether either gender yielded better predictions. In the training dataset, the RF models for females had AUROCs of 0.81, 0.96, 0.88, and 0.73 when using six features, eight features, twelve features, and the entire gut microbiome, respectively (Supplementary Fig. 2c). The predictive power of the eight-feature model was 0.67 AUROC. Surprisingly, these training results were superior to those of the RF models for males, which had AUROCs of 0.63 (six-feature model), 0.76 (eight-feature model), 0.69 (twelve-feature model), and 0.58 (entire gut microbiome-based model) (Supplementary Fig. 2d). During model validation using the male test dataset, the RF, GBM, and XGB models had AUROCs of 0.76, 0.62, and 0.77, respectively (Supplementary Fig. 2e). Taken together, these results indicated that models using the RF algorithm were appropriate for further analysis.
We then built RF models and assessed their efficacy in predicting FL in the IR group after applying the synthetic minority oversampling technique (SMOTE) to the dataset to reduce the class imbalance (30% IRNF: 70% IRFL). The model had an AUROC of 0.87 in the training dataset but only 0.72 in the test dataset (Supplementary Table S1).
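The core idea of SMOTE is to create synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified plain-Python illustration of that idea only; the function name `smote`, the parameter values, and the two-feature IRNF samples are hypothetical, and the study presumably used an established implementation.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating toward nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class (IRNF) samples with two features each.
irnf = [(0.10, 0.70), (0.12, 0.65), (0.08, 0.80), (0.15, 0.60)]
new_points = smote(irnf, n_new=6)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling in this way balances the classes without duplicating observations exactly.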
Classification between IRNF and IRFL by using GA-optimized classifier (IRFL-GARF classifier)
The genetic algorithm (GA) is a heuristic algorithm that searches for the global optimum based on the principle of natural selection [30,31,32]. It can be used to select the features with which a model gives the best prediction. We used GA to select features for an RF classifier that distinguishes IRFL from IRNF with higher accuracy. The GA-optimized RF classifier, together with its potential gut microbial biomarkers, was termed the “IRFL-GARF classifier” [33]. Using the fitness score, the GA repeatedly searches for the best solution for classifying IRFL and IRNF in every generation.
In developing the IRFL-GARF classifier, we first generated 300 individuals as the initial population to be evolved (Fig. 4). The fittest individual was then selected by evaluating the fitness score of every individual, and the next generation was produced from it by crossover and mutation (Supplementary Fig. 3).
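The evolutionary loop described above can be sketched as follows. This is a toy illustration under several assumptions: each individual is a binary mask over 20 hypothetical features, the fitness function is a simple stand-in for the cross-validated AUROC of an RF model, and the selection scheme (keeping the fitter half as parents plus carrying the best individual forward) is one common choice rather than the study's documented configuration. Only the population size of 300 comes from the text.

```python
import random

N_FEATURES = 20
INFORMATIVE = {2, 5, 11}  # hypothetical indices of truly discriminative genera


def fitness(mask):
    """Toy stand-in for cross-validated AUROC: reward markers, penalize subset size."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


def crossover(a, b, rng):
    """Single-point crossover of two parent masks."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(mask, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def run_ga(pop_size=300, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Keep the fitter half as parents, then refill with mutated offspring.
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [best] + children  # elitism: the best individual survives unchanged
        best = max(pop, key=fitness)
    return best


best_mask = run_ga()
selected = [i for i, bit in enumerate(best_mask) if bit]
print(selected, fitness(best_mask))
```

In a real application, the fitness call would train and cross-validate a classifier on the features selected by the mask, which is by far the most expensive step of each generation.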

Overview of biomarker genera mining using GA. Starting from 300 randomly generated initial individuals, each consisting of a set of genera, the classification model was optimized using GA operations, including crossover and mutation. The model with the highest fitness score was selected in each generation and further optimized in the next generation. The final model was evaluated using the average AUROC of tenfold cross-validation (CV). Further model validation was conducted on the test data for the corresponding biomarker subset using accuracy, F1-score, kappa, and AUROC.
Consequently, the GA reported ten optimal features (genera) for an optimal RF model: Christensenellaceae (R-7 group) spp., Lachnospiraceae (UCG-004) spp., Fusicatenibacter spp., Butyricimonas spp., Weissella spp., Ruminococcaceae (UCG-004) spp., Erysipelatoclostridium spp., UBA1819 spp., Allisonella spp., and Collinsella spp. The classifier had an AUROC of 0.93 in the test dataset (95% confidence interval: 0.83–1.00; Fig. 5a). Among the gut microbial features in the classifier, Butyricimonas spp. (mean of IRNF: 0.111%; IRFL: 0.070%), Christensenellaceae (R-7 group) spp. (IRNF: 0.736%; IRFL: 0.202%), Collinsella spp. (IRNF: 0.052%; IRFL: 0.021%), Erysipelatoclostridium spp. (IRNF: 0.153%; IRFL: 0.020%), and UBA1819 spp. (IRNF: 0.104%; IRFL: 0.014%) displayed higher relative abundances in IRNF than in IRFL. In contrast, Allisonella spp. (IRNF: 0.008%; IRFL: 0.049%), Fusicatenibacter spp. (IRNF: 0.216%; IRFL: 0.305%), Lachnospiraceae (UCG-004) spp. (IRNF: 0.219%; IRFL: 0.358%), Ruminococcaceae (UCG-004) spp. (IRNF: 0.020%; IRFL: 0.026%), and Weissella spp. (IRNF: 0.089%; IRFL: 0.117%) were more abundant in IRFL than in IRNF. Notably, Butyricimonas spp. (p = 0.0094), Christensenellaceae (R-7 group) spp. (p = 0.00056), and Ruminococcaceae (UCG-004) spp. (p = 0.026) had significantly different relative abundances between the two groups (Fig. 5b). Visualization of the fold changes of the ten GA-selected features, expressed as percentages, showed that Christensenellaceae (R-7 group) spp. (log2 fold change [log2FC]: −1.006), Weissella spp. (log2FC: −0.168), UBA1819 spp. (log2FC: −1.967), Collinsella spp. (log2FC: −0.185), and Erysipelatoclostridium spp. (log2FC: −0.032) had lower relative abundances in IRFL. In contrast, Lachnospiraceae (UCG-004) spp. (log2FC: 0.536), Fusicatenibacter spp. (log2FC: 0.823), Butyricimonas spp. (log2FC: 0.070), Allisonella spp. (log2FC: 0.408), and Ruminococcaceae (UCG-004) spp. (log2FC: 1.229) were more abundant in IRFL (Fig. 5c). We then performed UMAP projection to reduce the dimensionality of the dataset, which revealed a cluster of IRNF participants. Christensenellaceae (R-7 group) spp. were highly distributed in the green circle, where most IRNF participants were located, whereas Lachnospiraceae (UCG-004) spp. were highly distributed in the purple circle, where most of the dots represent IRFL (Fig. 5d–f).
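As a reading aid for the log2FC values, the sketch below shows one common way to compute a log2 fold change from two group-level abundance summaries, with IRFL in the numerator so that negative values indicate lower abundance in IRFL. The exact summary statistic and any pseudocount used in the study are not stated, so the definition, the pseudocount, and the example values here are all assumptions for illustration.

```python
import math


def log2_fold_change(case_value, control_value, pseudocount=1e-6):
    """log2(case / control); the small pseudocount guards against division by zero."""
    return math.log2((case_value + pseudocount) / (control_value + pseudocount))


# Hypothetical relative abundances (%) for two genera: (IRFL, IRNF).
example = {"genus_A": (0.25, 0.50), "genus_B": (0.40, 0.10)}
for genus, (irfl, irnf) in example.items():
    print(genus, round(log2_fold_change(irfl, irnf), 3))
```

On this scale, a value of −1 corresponds to a halving and +1 to a doubling of abundance in IRFL relative to IRNF.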

Prediction of FL in the presence of insulin resistance using the GA-optimized classifier. (A) The predictive power (AUROC) derived from the test dataset using the GA-optimized RF classifier with ten features. (B) Violin plots displaying relative abundances of core informative features in IRNF and IRFL. (C) Average relative abundances of discriminative features in the 10-feature prediction model in IRNF and IRFL. (D–F) UMAP analysis (D) and heatmaps of Christensenellaceae (R-7 group) spp. (E) and Lachnospiraceae (UCG-004) spp. (F) projected onto the UMAP. *p < 0.05, **p < 0.01, and ***p < 0.001 (Wilcoxon’s test). IRFL: fatty liver participants featuring insulin resistance; IRNF: nonfatty liver control group featuring insulin resistance; UMAP: uniform manifold approximation and projection.
We also developed a GA-optimized classifier for the IS group, that is, a classifier distinguishing IS participants without FL (ISNF) from IS participants with FL (ISFL). The model featured eight gut microbial genera, namely, Eubacterium spp. (coprostanoligenes group), Alistipes spp., Bifidobacterium spp., Erysipelotrichaceae spp. (UCG-003), Lachnoclostridium spp., Parabacteroides spp., Ruminococcus spp. (torques group), and Subdoligranulum spp. However, the model’s predictive power was insufficient to classify ISNF and ISFL (AUROC of 0.52; Supplementary Fig. 4a and b).
Model evaluation
To assess the GA-optimized model’s performance in the IR group, we compared its predictive power with that of widely used non-invasive indices calculated from clinical data for predicting FL: the FL index (FLI) [34], NAFLD liver fat score (NAFLD-LFS) [35], hepatic steatosis index (HSI) [36], and Framingham steatosis index (FSI) [37]. For comparison, each score was calculated for each IR participant analyzed in the study and used for FL prediction on the partitioned test dataset. Our classifier had an AUROC of 0.93, whereas the FLI, NAFLD-LFS, HSI, and FSI had AUROCs of 0.82, 0.62, 0.80, and 0.82, respectively (Fig. 6a). The prediction accuracies of the GA-optimized classifier, FLI, NAFLD-LFS, HSI, and FSI were 0.83, 0.57, 0.60, 0.67, and 0.84, respectively (Fig. 6b). Additionally, FL prediction by our classifier had a kappa of 0.50, while the kappa values of FLI, NAFLD-LFS, HSI, and FSI were 0.24, 0.17, 0.33, and 0.53, respectively (Fig. 6c). Finally, our classifier had an F1-score of 0.89, similar to that of the FSI (0.90), whereas FLI, NAFLD-LFS, and HSI had F1-scores of 0.63, 0.63, and 0.72, respectively (Fig. 6d). Thus, our classifier had the highest AUROC of all predictors and was comparable to the FSI, and superior to the other indices, in accuracy, kappa, and F1-score. This result implies that our gut microbiome-based classifier could be used alongside the abovementioned established predictors.
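The three threshold-based metrics compared above (accuracy, Cohen’s kappa, and F1-score) can all be derived from a binary confusion matrix. The following minimal sketch uses a hypothetical, deliberately imbalanced test set of ten participants; the helper name `binary_metrics` and the labels are illustrative and do not reproduce the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, Cohen's kappa, and F1 (positive class = 1) from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal label frequencies.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": accuracy, "kappa": kappa, "f1": f1}


# Hypothetical imbalanced test set: 8 IRFL (1) and 2 IRNF (0).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy example, accuracy (0.80) and F1 (0.875) are high while kappa (0.375) is modest, because kappa discounts the agreement expected by chance when one class dominates. The same effect may explain why kappa values in imbalanced cohorts are lower than the corresponding accuracies and F1-scores.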

Evaluating prediction using GA-optimized classifier differentiating IRFL from IRNF. Bar plots comparing the predictive power derived from the test dataset using the GA-optimized classifier with other predictors by (A) AUROC, (B) accuracy, (C) kappa, and (D) F1 score. IRFL: fatty liver participants featuring insulin resistance; IRNF: nonfatty liver control group featuring insulin resistance.