Automatic identification of optimal marker genes for phenotypic and taxonomic groups of microorganisms | Plant Pathology and Microbiology

Citation:

Segev, E.; Pasternak, Z.; Ben Sasson, T.; Jurkevitch, E.; Gonen, M. Automatic Identification of Optimal Marker Genes for Phenotypic and Taxonomic Groups of Microorganisms. PLOS ONE 2018, 13, e0195537.

Date Published:

2018/05/02

Abstract:

Finding optimal markers for microorganisms relevant to the medical, agricultural, environmental or ecological fields is of great importance. The thousands of complete microbial genomes now available allow us, for the first time, to exhaustively identify marker proteins for groups of microbial organisms. In this work, we model the biological task as the well-known mathematical “hitting set” problem and solve it with both greedy and randomized approximation algorithms. We identify unique markers for 17 phenotypic and taxonomic microbial groups, including proteins related to the nitrite reductase enzyme as markers for the non-anammox nitrifying bacteria group, and two transcription regulation proteins, nusG and yhiF, as markers for the Archaea and Escherichia/Shigella taxonomic groups, respectively. Additionally, we identify marker proteins for three subtypes of pathogenic E. coli, which previously had no known optimal markers. In practice, depending on the completeness of the database, this algorithm can be used to identify marker genes for any microbial group. These marker genes may be prime candidates for understanding the genetic basis of the group's phenotype, or may help discover novel functions that are uniquely shared among a group of microbes. We show that our method is both theoretically and practically efficient, while establishing an upper bound on its time complexity and approximation ratio; thus, it promises to remain efficient and to permit the identification of marker proteins that are specific to phenotypic or taxonomic groups, even as more and more bacterial genomes are sequenced.
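
To make the “hitting set” framing concrete, the sketch below shows the textbook greedy heuristic on toy data. It is not the authors' implementation. It assumes one plausible encoding, in which each target genome is represented by the set of candidate proteins it carries and the goal is a small set of proteins that "hits" every genome at least once. The function name `greedy_hitting_set` and the genome and protein labels are hypothetical.

```python
from collections import defaultdict


def greedy_hitting_set(sets):
    """Greedy heuristic: return elements so that every input set contains one.

    `sets` is an iterable of collections. In the assumed encoding, each
    collection holds the candidate marker proteins found in one target genome.
    """
    remaining = [set(s) for s in sets]
    if any(not s for s in remaining):
        raise ValueError("an empty set can never be hit")

    chosen = []
    while remaining:
        # Count how many still-unhit sets each element would hit.
        counts = defaultdict(int)
        for s in remaining:
            for element in s:
                counts[element] += 1
        # Pick the element hitting the most sets; iterating in sorted order
        # makes ties resolve to the smallest label, so runs are reproducible.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        # Drop every set that the chosen element now hits.
        remaining = [s for s in remaining if best not in s]
    return chosen


# Hypothetical toy input: three genomes and the candidate proteins in each.
toy_genomes = [
    {"p1", "p2"},  # genome A
    {"p2", "p3"},  # genome B
    {"p3", "p4"},  # genome C
]
markers = greedy_hitting_set(toy_genomes)
print(markers)  # ['p2', 'p3']
```

On this toy input no single protein occurs in all three genomes, so two markers are needed, and the greedy choice finds such a pair. In general the greedy heuristic is only an approximation and may return more elements than the true minimum.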

Publisher's Version

Last updated on 07/11/2019