The classical Human Leucocyte Antigen (HLA) loci are the most polymorphic loci in the human genome, with over 18,000 known alleles. The extent of HLA polymorphism, and its uneven distribution among human populations, make HLA alleles highly informative and convenient markers for studies of individual ancestry, population genetics, the history of the peopling of the world, natural selection and protein evolution.
Class I HLA proteins are found on the surfaces of nearly all nucleated cells, and class II HLA proteins on antigen-presenting cells. Both present endogenous and exogenous peptides to T cells, allowing the immune system to distinguish self from non-self tissues and to initiate specific immune responses. HLA interactions with T cells and NK cells are central to understanding immune-related disease; in many cases (e.g., type 1 diabetes and ankylosing spondylitis), the HLA region is the major genetic determinant of disease.
Our goals in studying HLA are to understand the mechanisms by which HLA polymorphism results in susceptibility and resistance to cancer and to infectious and autoimmune diseases; to determine how HLA polymorphism is distributed on a global scale; to understand how natural selection has maintained our species’ high level of HLA diversity; and to make inferences about human history by relating the distribution of HLA polymorphism to global patterns of disease prevalence.
Historically, HLA data have been generated with a wide variety of methods that offer very little insight into whether data are equivalent across methods, laboratories and time periods. As a result, an existing HLA genotype cannot easily be re-evaluated in light of new sequence variation and new technologies. To address these issues, we are developing novel, specialized tools and methods for exchanging, reporting and analyzing HLA data in particular, and highly polymorphic genomic data in general.
We want to make our work, and the work of anyone interested in HLA, easier, shareable and reproducible. With these goals in mind, we have developed tools such as BIGDAWG, an R package that performs automated case-control analyses of highly polymorphic genetic data at the allele, multi-locus haplotype and amino acid levels; and POULD, an R package that calculates multiple linkage disequilibrium (LD) measures for chromosomally phased or unphased genotype data. Gene Feature Enumeration (GFE) is a novel approach we have developed to describe the sequence of an HLA allele; it addresses data equivalency across the methods used for HLA typing, the laboratories that performed the typing, and the time periods in which the typing occurred. The GFE approach facilitates the standardized, community-accessible interpretation of next-generation sequencing (NGS) HLA data, and allows us to dissect and investigate distinct sequence elements of an HLA (or KIR, ABO or Rh) gene, providing new insight into sequence variation and disease association, and potentially identifying new targets for study or intervention.
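POULD itself is an R package, and its interface is not reproduced here. As a rough illustration of what a multi-allelic LD measure summarizes, the Python sketch below computes two commonly used statistics from chromosomally phased two-locus haplotypes: Hedrick's global D′ (the frequency-weighted mean of |D′ij| over all allele pairs) and Wn (Cramér's V applied to the two-locus haplotype table). The function name `ld_measures` and the input format are hypothetical, and the sketch assumes phased haplotypes only; it does not estimate haplotype frequencies from unphased genotypes, and it is not POULD's implementation.

```python
from collections import Counter
from math import sqrt


def ld_measures(haplotypes):
    """Hypothetical helper: return (global D', Wn) for phased two-locus haplotypes.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
    one tuple per chromosome (assumed already phased).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Observed haplotype frequencies h_ij and allele frequencies p_i, q_j.
    hap_freq = {h: c / n for h, c in Counter(haplotypes).items()}
    p = {a: c / n for a, c in Counter(a for a, _ in haplotypes).items()}
    q = {b: c / n for b, c in Counter(b for _, b in haplotypes).items()}

    d_prime_global = 0.0
    chi_like = 0.0  # sum over allele pairs of D_ij^2 / (p_i q_j)
    for a, pa in p.items():
        for b, qb in q.items():
            # Disequilibrium coefficient D_ij = h_ij - p_i q_j.
            d = hap_freq.get((a, b), 0.0) - pa * qb
            # Lewontin's normalizing bound D_max depends on the sign of D_ij.
            if d < 0:
                d_max = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                d_max = min(pa * (1 - qb), (1 - pa) * qb)
            # Skip pairs where D' is undefined (monomorphic locus).
            if d_max > 0:
                d_prime_global += pa * qb * abs(d / d_max)
            chi_like += d * d / (pa * qb)

    # Wn normalizes by (min(number of alleles at either locus) - 1).
    k = min(len(p), len(q))
    wn = sqrt(chi_like / (k - 1)) if k > 1 else 0.0
    return d_prime_global, wn


if __name__ == "__main__":
    # Two alleles per locus that always travel together: complete LD.
    linked = [("A*01:01", "B*08:01")] * 50 + [("A*02:01", "B*07:02")] * 50
    print(ld_measures(linked))  # (1.0, 1.0)
```

Both statistics range from 0 (no association between loci) to 1 (complete association). They are shown together because global D′ is sensitive to rare allele pairs, while Wn weights pairs more like a chi-square statistic, which is one reason a tool might report several LD measures side by side.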
April 29, 2020 3:56 PM