Personal Information

  • Name: Dr. sc. Armin Töpfer
  • Date of birth: 24 May 1986
  • Nationality: Citizen of Germany
  • Address: Gelsenkirchen, Germany
  • Languages: German and English
  • GitHub: armintoepfer
  • Email: armin.toepfer@gmail.com

Professional Profile

Senior Director Instrument Analysis and Senior Principal Engineer with a Doctor of Sciences ETH Zürich in computational biology, specialized in statistical modeling and analysis of long-read sequencing technologies. I combine academic curiosity and industry standards to develop bleeding-edge, yet robust and carefully tested, software for everyday use. Expert in end-to-end genomic data analysis, from biological sample to variant call. Proven ability to design and maintain large and complex software, including productization of deep learning and GPU acceleration. Highly experienced in scientific computing and real-time processing of large data sets on bare-metal Linux servers, high-performance computing clusters, and cloud infrastructure. Ten years of full-time remote working experience. Seven years of managing teams, with nine direct reports, reporting to the VP of software and advising executive leadership.

Research Interest

Algorithm Design — C++ Development — High-Performance Computing — Statistical Modeling — Population Structure Reconstruction — NGS / Long-Read Sequence Analysis — Bare-metal Code Optimization — Data Visualization

Education

October 2011 - August 2014

Swiss Federal Institute of Technology Zürich (ETH)

Dr. sc. ETH Zürich
Thesis topic: Studies in viral quasispecies reconstruction

October 2009 - July 2011

University of Bielefeld

M.Sc. in Bioinformatics and Genome Research
Thesis topic: Prediction of Group I Introns under structure variation

October 2006 - March 2009

University of Bielefeld

B.Sc. in Bioinformatics and Genome Research
Thesis topic: Ideas for and implementation of an automated statistical data analysis

Work Experience

March 2024 - now

Senior Director, Instrument Analysis

Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany

Lead the long-read analysis team of nine direct reports, which develops all on-instrument analysis software.
  • Architect data processing from photon counts to highly accurate long reads.
  • Successful long-read platform launches, scaling up existing software to process hundreds of millions to billions of HiFi reads at once.
  • Added CI/CD to build 100% reproducible and well-tested software.
  • Integrated and productized deep learning solutions using ONNX runtime.
  • Optimized CPU and GPU code to increase throughput and reduce COGS.
  • Spearheaded open-source and community software distribution pbbioconda.
  • Plan, orchestrate, and execute cross-team interdisciplinary projects.
  • Point of contact for major collaborations with Nvidia and Google.
  • Clear communication to executive management.
  • Drive architectural decisions and select hardware for on-instrument compute.

February 2023 - February 2024

Director, Platform Bioinformatics

Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany

Lead an expanded team responsible for developing all platform bioinformatics analysis software.
  • Develop all bioinformatics production software solutions, build them 100% reproducibly, and deploy on- and off-instrument.
  • Develop novel algorithms on GPUs using CUDA.
  • Integrate, productize, and optimize deep learning solutions.
  • Close collaborations with external partners to shape company objectives.
  • Scale up existing software to process hundreds of millions to billions of HiFi reads at once.

July 2021 - January 2023

Principal Engineer & Associate Director, Bioinformatics

Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany

Lead a team responsible for developing on-instrument bioinformatics analysis software. Continue as an individual contributor. Our focus is the generation of long and accurate HiFi reads in near real-time.
  • Develop bioinformatics production software solutions, build them 100% reproducibly, and deploy on- and off-instrument.
  • Port existing and develop new algorithms on GPUs using CUDA.
  • Integrate and productize deep learning solutions.
  • Close collaborations with external partners to shape company objectives.
  • Drive architectural decisions and select hardware for on-instrument compute solutions.

September 2020 - June 2021

Principal Engineer, Bioinformatics

Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany

Shape company objectives by developing next-gen products.
  • Evaluate and develop for next-gen hardware architectures, including ARM, RISC-V, and GPGPU.
  • Design and implement on- and off-instrument software.
  • Continue enhancing customer-facing tools:
    • CCS: Generate long and accurate HiFi reads
    • lima: Demultiplex barcodes and remove primers
    • Iso-Seq: Scalable de novo isoform discovery
  • Lead and mentor individual members of the team.

August 2018 - August 2020

Senior Staff Engineer, Bioinformatics

Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany

Technical leader and mentor to individual members of the team. Work independently in cross-functional teams to lead development of new products or execution of time-critical analyses. Shape the product roadmap from a technical perspective by delivering software architectures, design specifications, and implementations. Consultant to senior management in long-range company planning. Plan Agile scrum sprints and epics for my team and products. Identify and fix performance bottlenecks in existing products to handle the ever-increasing throughput of sequencing platforms. Enable savvy bioinformaticians to use command-line tools in the cloud or locally by maintaining bioconda packages on pbbioconda. Carefully maintain extensive, yet visually appealing, customer-facing documentation.
Additionally responsible for the following new products:
  • Iso-Seq: Reference-free clustering of full-length transcriptional sequencing data to annotate de novo genome assemblies with a focus on scalability, reducing runtime by a factor of up to 100 while increasing accuracy over existing solutions.
  • Structural Variation: Fast, accurate, and reliable discovery of structural variations from low-coverage cohort sequencing data. Best-in-class algorithm with highest recall and precision.
  • Consensus: Lead initiatives to massively reduce time to result for our CCS algorithm that generates HiFi reads.

October 2015 - July 2018

Staff Engineer, Bioinformatics

Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany

Individual contributor, developing bleeding-edge statistical algorithms in C++14.
Core projects:
  • Polyploid Consensus: Enhance quality of individual single-molecule reads and genomic regions by polyploid-enabled polishing.
  • Minor Variants: Full-stack product development to reconstruct co-occurring minor variants in heterogeneous samples from single-molecule sequencing data, tailored to personalized medicine applications.
  • Demultiplexing: Reliable demultiplexing of hi-plex barcoded samples with a focus on UX and quality control. Establish internal end-of-line QC pipeline to automatically test purity of barcode oligos before distribution.
  • Mapping and Alignment: A frontend for minimap2, a state-of-the-art mapper and aligner, for PacBio native data formats.

September 2014 - September 2015

Senior Engineer, Bioinformatics

Pacific Biosciences, Menlo Park, CA, USA; 100% Remote, Germany

Low-level C++11 development on x86_64 and MIC architectures for the Sequel instrument to enhance base-call accuracy, plus hotspot parallelization and vectorization to enable real-time base calling. Design and implementation of custom binary formats to store high-throughput real-time data. Post-processing of raw base-call data to provide customer-friendly BAM files, including SMRTbell adapter removal, demultiplexing, and spike-in control filtering.

July 2011 - August 2014

Graduate research assistant

Computational Biology Group, ETH Zürich, Basel

Development of statistical methods and machine learning approaches for viral quasispecies assembly from next-generation and single-molecule sequencing data. Application to intra-host samples of HBV, HCV, CSFV, and HIV-1 infected individuals. Collaboration with teams across the globe, from Switzerland to Australia.

October 2009 - May 2011

Research assistant

Bielefeld University Bioinformatics Service, University of Bielefeld, Germany

Continuation as Java developer at the BiBiServ2 project. Migrating architecture from JavaServer Faces (JSF) 1.2 to JSF 2. Introduction of PrimeFaces as the main component suite.

September 2009 - May 2011

Programmer

High performance computing laboratory, Bergen Center for Computational Science, UNIFOB AS, Bergen, Norway

Development of a parsing library for Web Services Description Language and XML Schema files, resolving complete XML Schema structures. Project is funded by the EMBRACE Network of Excellence coordinated by EBI. Further development of a Business Process Execution Language (BPEL) editor to construct complex workflows using the NetBeans Platform.

November 2009

Professional Trainer

International Institute (Training, Assessment, Certification), Cognitive Core UG, Germany

Binary auditing trainer for developers at Symantec India. Fully practical online workshop on the analysis and exploitation of stack-based buffer overflows and reverse code engineering of copy protections.

April 2009 - August 2009

Bioinformatics internship

Computational Biology Unit, BCCS, UNIFOB AS, Bergen, Norway

Implementation of a fully functional web-based BPEL editor to construct and execute simple linear workflows. Working as a team member on the eSysbio project, funded by the Research Council of Norway through its e-science program eVita.

October 2008 - May 2009

Research assistant

Bielefeld University Bioinformatics Service, University of Bielefeld, Germany

Development of automatically generated web interfaces with JSF for the new BiBiServ2 project.

August 2008 - September 2008

Freelance JSF developer

Teamkollegen.de, Bielefeld, Germany

Responsible for the web development of an AJAX-based JSF frontend, i.e., a messaging system and user-friendly search interface. A project supported by the Heinz Nixdorf Foundation and the Foundation of the German Economy (sdw).

Talks

SMRT Leiden - pbsv - Detecting structural variants with confidence - Leiden, Netherlands, 2019
Long-Read Sequencing Meeting - Long Accurate Reads – Call All Variants with Confidence - Uppsala, Sweden, 2019
SMRT Leiden - Many-to-One-to-Many: Pooling and Demultiplexing - Leiden, Netherlands, 2018
SMRT Leiden - Calling all variants: fast, accurate, population-scale structural variant analysis - Leiden, Netherlands, 2018
SMRT Leiden - Juliet - One Click Minor Variant Calling - Leiden, Netherlands, 2017
4th ICCABS (IEEE) - Viral quasispecies assembly from paired-end reads - Miami, USA, 2014
RECOMB 2014 - Viral quasispecies assembly via maximal clique enumeration - Pittsburgh, USA, 2014
SMIDDY - Global haplotype prediction of HIV-1 - Zürich, Switzerland, 2013
3rd ICCABS (IEEE) - Probing of viral diversity by global haplotype prediction - New Orleans, USA, 2013
2nd CHAIN NGS Meeting - Computational and Statistical Challenges of Ultradeep Sequencing of Viral Quasispecies - Rome, Italy, 2013
Virus goes Bioinformatics - Estimating viral genetic diversity from next-generation sequencing data - Jena, Germany, 2012
RECOMB 2012 - Probabilistic inference of viral quasispecies subject to recombination - Barcelona, Spain, 2012
RUBIES - Amsterdam, Netherlands, 2011
Nile University - Cairo, Egypt, 2010
EMBRACE - Amsterdam, Netherlands, 2009

Posters

HitSeq 2019 - Reliable, precise, and fast detection of structural variants from long reads - Basel, Switzerland, 2019
CROI 2015 - A Comprehensive Analysis of PrimerIDs to Study Heterogenous HIV-1 Populations - Seattle, USA, 2015
CROI 2014 - Full-length HIV-1 Haplotype Reconstruction from Heterogeneous Virus Populations - Boston, USA, 2014
Statistical Genomics and Data Integration for Personalized Medicine Ascona - Probing of viral diversity by global haplotype prediction - Switzerland, 2013
SIB Days 2013 - Visualization of viral populations - Biel, Switzerland, 2013
ECCB 2012 - QuasiRecomb: prediction of recombinant viral quasispecies - Basel, Switzerland, 2012
SIB Days 2012 - Probabilistic inference of viral quasispecies subject to recombination - Biel, Switzerland, 2012

Publications

2021

DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer

Gunjan Baid, Daniel E. Cook, Kishwar Shafin, Taedong Yun, Felipe Llinares-López, Quentin Berthet, Aaron M. Wenger, William J. Rowell, Maria Nattestad, Howard Yang, Alexey Kolesnikov, Armin Töpfer, Waleed Ammar, Jean-Philippe Vert, Ashish Vaswani, Cory Y. McLean, Pi-Chuan Chang, Andrew Carroll
10.1038/s41587-022-01435-7
Nature Biotechnology.

2019

Highly-accurate long-read sequencing improves variant detection and assembly of a human genome.

Aaron M. Wenger, Paul Peluso, William J. Rowell, Pi-Chuan Chang, Richard J. Hall, Gregory T. Concepcion, Jana Ebler, Arkarachai Fungtammasan, Alexey Kolesnikov, Nathan D. Olson, Armin Töpfer, Chen-Shan Chin, Michael Alonge, Medhat Mahmoud, Yufeng Qian, Adam M. Phillippy, Michael C. Schatz, Gene Myers, Mark A. DePristo, Jue Ruan, Tobias Marschall, Fritz J. Sedlazeck, Justin M. Zook, Heng Li, Sergey Koren, Andrew Carroll, David R. Rank, Michael W. Hunkapiller
10.1038/s41587-019-0217-9
Nature Biotechnology.

2016

Single-molecule sequencing reveals complex genomic variation of hepatitis B virus during 15 years of chronic infection following liver transplantation.

Brigid Betz-Stablein, Armin Töpfer, M Littlejohn, L Yuen, D Colledge, V Sozzi, P Angus, A Thompson, P Revill, Niko Beerenwinkel, N Warner, Fabio Luciani
10.1186/s12864-016-2575-8
BMC Genomics.

2016

A method for near full-length amplification and sequencing for six hepatitis C virus genotypes.

Rowena A Bull, Auda A Eltahla, Chaturaka Rodrigo, Sylvie M Koekkoek, Melanie Walker, Mehdi R Pirozyan, Brigid Betz-Stablein, Armin Töpfer, Melissa Laird, Steve Oh, Cheryl Heiner, Lisa Maher, Janke Schinkel, Andrew R Lloyd, Fabio Luciani
10.1128/JVI.00243-16
Journal of Virology.

2015

A Comprehensive Analysis of Primer IDs to Study Heterogeneous HIV-1 Populations.

David Seifert, Francesca Di Giallonardo, Armin Töpfer, Jochen Singer, Stefan Schmutz, Huldrych F. Günthard, Niko Beerenwinkel, Karin J. Metzner
10.1016/j.jmb.2015.12.012
Journal of Molecular Biology.

2014

Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations.

Francesca Di Giallonardo*, Armin Töpfer*, Melanie Rey, Sandhya Prabhakaran, Yannick Duport, Christine Leemann, Stefan Schmutz, Nottania K. Campbell, Beda Joos, Maria Rita Lecca, Andrea Patrignani, Martin Däumer, Christian Beisel, Peter Rusert, Alexandra Trkola, Huldrych F. Günthard, Volker Roth, Niko Beerenwinkel, and Karin J. Metzner.
10.1093/nar/gku537
Nucleic Acids Research.

2014

Viral Quasispecies Assembly via Maximal Clique Enumeration.

Armin Töpfer, Tobias Marschall, Rowena A. Bull, Fabio Luciani, Alexander Schönhuth, and Niko Beerenwinkel.
10.1371/journal.pcbi.1003515

PLOS Computational Biology.

Abstract appears in R. Sharan, RECOMB 2014 - Research in Computational Molecular Biology, volume 8394 of Lecture Notes in Bioinformatics, pages 309–310. Springer, 2014.
10.1007/978-3-319-05269-4_25

2013

Challenges in RNA Virus Bioinformatics.

Manja Marz, Niko Beerenwinkel, Christian Drosten, Markus Fricke, Dmitrij Frishman, Ivo Hofacker, Dieter Hoffmann, Thomas Rattei, Peter Stadler, and Armin Töpfer.
10.1093/bioinformatics/btu105
Bioinformatics.

2013

Sequencing approach to analyze the role of quasispecies for classical swine fever.

Armin Töpfer, Dirk Höper, Sandra Blome, Martin Beer, Niko Beerenwinkel, Nicolas Ruggli, and Immanuel Leifer.
10.1016/j.virol.2012.11.020
Virology.

2013

Probabilistic inference of viral quasispecies subject to recombination.

Armin Töpfer, Osvaldo Zagordi, Sandhya Prabhakaran, Volker Roth, Eran Halperin, and Niko Beerenwinkel.
10.1089/cmb.2012.0232

Journal of Computational Biology.

Extended abstract appeared in B. Chor, editor, RECOMB 2012 – Research in Computational Molecular Biology, volume 7262 of Lecture Notes in Bioinformatics, pages 342–354. Springer, 2012.
10.1007/978-3-642-29627-7_36

2010

BioXSD: the common data-exchange format for everyday bioinformatics web services.

Kalas M., Puntervoll P., Joseph A., Bartaseviciute E., Töpfer A., Venkataraman P., Pettifer S., Bryne J.C., Ison J., Blanchet C., Rapacki K., and Jonassen I.
10.1093/bioinformatics/btq391
Bioinformatics.

Software

pbmm2 - A minimap2 frontend for PacBio native data formats.
pbsv - Fast, accurate, population-scale structural variant analysis from single-molecule data.
Iso-Seq - Scalable de novo isoform discovery from single-molecule data.
CCS - Generate accurate consensus sequences from single molecules.
Juliet - Reference-guided phasing of low-frequency, de novo discovered variants in heterogeneous samples.
lima - Demultiplex pooled barcoded single-molecule data.
HaploClique - Viral quasispecies assembly from paired-end data.
QuasiRecomb - Reconstruction of recombinant viral quasispecies structures.
InDelFixer - Iterative and very sensitive NGS sequence alignment software.
ConsensusFixer - Consensus sequence caller with ambiguous bases and in-frame insertions.

Awards

  • Innovation Award – SMRT Masking, PacBio, 2024
  • Best method – GPU compute advances, PacBio, 2023
  • Young investigator scholarship, CROI, 2014
  • Best poster award, SIB Days, 2013
  • Conference fellowship, RECOMB, 2012
  • Studentship for foreign internships, ERASMUS, 2009

Master's students advised

  • Monica-Andreea Drăgan, Research assistant, 2014
    Minimal path cover with paired-end constraints.
  • Kee Pang Soh, Lab-rotation, 2014
    Error correction of Pacific Biosciences data.
  • Veronika Boskova, Lab-rotation, 2013
    Visualization of HIV quasispecies data.
  • David Seifert, Master thesis, 2012
    Computational studies in HIV diversity.

Technical skills

C++20 • CUDA • Boost • GTest

meson • CMake • ninja

R • Data Visualization

Deep Learning • ONNX Runtime

Bash • SQL • Assembly • Web

Bamboo • Docker • Bootstrapping

C++ toolchains • Multiplatform releases

Agile • Jira • CI • CD

Illustrator • VTune • IDA Pro • LaTeX