Decision Tree Classification of Y-STR Haplogroups in the Iraq FTDNA Project

Ahmed Hamid Elias
Azhar Hamid Elias
Sajjad Mohammed Hasan

Abstract

In this study, we use a public Y-DNA dataset from the Iraq FamilyTreeDNA project to explore how well Y-STRs can predict paternal haplogroups in the Iraqi population using data-mining techniques. The source table was cleaned to remove summary rows that do not correspond to individuals, leaving 188 male samples with 89 numeric Y-STR loci and haplogroups confirmed up to October 2023. The detailed haplogroup labels were collapsed into the macro-lineages B, C, D and E, with haplogroup E clearly dominant. Several Gini-based decision tree models were then built: a binary classifier (E versus non-E), a classifier for E-M35 versus all other haplogroups, and a model restricted to haplogroup E that separates E-M35 from all other E subclades. The E versus non-E tree yielded very high accuracy on the test data, which reflects the strong and distinct Y-STR signature of haplogroup E within this sample. The models targeting E-M35 showed moderate yet informative performance and identified a handful of loci (DYS632, DYS442, DYS439, DYS456, DYS534 and DYS438) as most predictive of the difference between E-M35 and other lineages. These decision trees yield simple, interpretable rules that summarize the principal paternal structure in Iraq, and they demonstrate both the potential and the weaknesses of decision tree classification of defined haplogroups under high class imbalance and small sample sizes.
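
To illustrate what a Gini-based split means in this setting, the following minimal Python sketch computes Gini impurity and searches for the best threshold on a single locus. It is not the authors' implementation: the function names `gini` and `best_split`, the repeat counts, and the labels are hypothetical toy values chosen only to show the mechanics of an E versus non-E split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the 'value <= threshold' split with the lowest weighted Gini impurity.

    Returns (threshold, weighted_impurity); threshold is None if no split
    improves on the impurity of the unsplit node.
    """
    distinct = sorted(set(values))
    best = (None, gini(labels))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical toy data: repeat counts at one Y-STR locus and binary labels.
repeats = [10, 10, 11, 11, 12, 13, 13, 14]
labels = ["E", "E", "E", "E", "non-E", "non-E", "non-E", "non-E"]

threshold, impurity = best_split(repeats, labels)
print(f"parent impurity: {gini(labels):.3f}")
print(f"best threshold: {threshold}, weighted impurity after split: {impurity:.3f}")
```

A full decision tree repeats this search over all loci at every node and recurses on the resulting subsets. With strongly imbalanced classes, as in the dataset described above, a low weighted impurity can be driven mostly by the majority class, which is one reason accuracy alone can overstate performance.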

Article Details

How to Cite
[1]
A. H. Elias, A. H. Elias, and S. M. Hasan, “Decision Tree Classification of Y-STR Haplogroups in the Iraq FTDNA Project”, SHIFAA, vol. 2025, pp. 69–75, Nov. 2025, doi: 10.70470/SHIFAA/2025/009.