Artificial Intelligence (AI) in Precision Cancer Diagnostics: Myth or Magic?

Experts from ACT Genomics walk us through current AI developments and applications in precision cancer diagnostics, from pathology slide reading and variant calling to the interpretation of biological impact and clinical significance in clinical report generation, and in doing so shed light on the future of AI in cancer diagnostics.

by Dr Ying-Ja Chen, Jen-Hao Cheng, and Dr Allen Lai

Artificial Intelligence (AI) is a technology poised to disrupt current healthcare practices long governed by medical professionals.

At each step along the healthcare continuum, AI brings innovation and far-reaching breakthroughs, in the hope of providing more certainty in clinical decision-making. In other words, it helps medical professionals to be “smarter” in order to improve treatment outcomes.

Research has found that, within cancer care, AI has outperformed medical specialists in diagnosing metastatic breast cancer1 and melanoma2. In liquid biopsies and pharmacogenomics, AI is applied to cancer screening and monitoring, and improves the prediction of adverse events and patient outcomes.3,4


Current AI Development – Deep Learning and Natural Language Processing

In 2011, the IBM Watson computer system defeated human champions on the American quiz show ‘Jeopardy!’.5 Six years later, Google DeepMind’s computer program AlphaGo beat the world champion at Go.6 The former showcases natural language processing (NLP), which processes human language so that a computer system can interpret it, whereas the latter highlights the use of deep neural networks, also known as deep learning, which is a method of machine learning.

Deep learning is loosely modeled on how the human brain makes inferences through layers of neurons. Once trained on large amounts of data, a deep learning model extracts features, some of which are unknown even to human experts, and uses them to predict results and outcomes.

Deep learning did not develop in a vacuum. It is a specialized form of machine learning, which can be broadly divided into two types:

  1. Supervised learning refers to a situation in which we provide a training dataset with every data point labeled, for example as positive or negative. Through repeated training, the model becomes capable of predicting whether a new data entry is positive or negative.
  2. Unsupervised learning, on the other hand, takes place when a dataset is provided without labels. The data points are instead clustered into subgroups, each represented by a set of distinct features. Through a series of rigorous training processes, the model learns to associate these features and predict which subgroup a new data entry belongs to.
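
The difference between the two types can be illustrated with a minimal Python sketch. It deliberately uses two very simple methods (a nearest-centroid classifier and k-means clustering) rather than deep learning, and all function names and data points are made up for illustration.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def fit_supervised(points, labels):
    """Supervised: learn one centroid per label from labeled examples."""
    classes = sorted(set(labels))
    cents = [centroid([p for p, y in zip(points, labels) if y == c]) for c in classes]
    return classes, cents

def predict_supervised(model, point):
    """Predict the label whose centroid is closest to the new point."""
    classes, cents = model
    return classes[nearest(point, cents)]

def fit_unsupervised(points, k, iterations=10):
    """Unsupervised: k-means clustering. No labels are used."""
    step = len(points) // k
    cents = [points[i * step] for i in range(k)]  # naive, evenly spaced initialization
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, cents)].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        cents = [centroid(g) if g else cents[i] for i, g in enumerate(groups)]
    return cents

# Illustrative two-feature data: three "negative" and three "positive" samples.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.1, 4.8), (4.9, 5.0)]
labels = ["negative"] * 3 + ["positive"] * 3

model = fit_supervised(points, labels)
print(predict_supervised(model, (4.5, 4.7)))   # label learned from the training labels

clusters = fit_unsupervised(points, k=2)
print(nearest((4.5, 4.7), clusters))           # index of an unnamed subgroup
```

The supervised model returns a named class because it was told the names during training, while the unsupervised model can only report which subgroup a new entry falls into.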

These technological advancements show that computers can exceed the expectations of their creators in a growing number of fields. This is particularly relevant to molecular diagnostics in clinical care.


AI Applications in Precision Cancer Diagnostics

The workflow of precision cancer diagnostics is composed of several critical steps to attain the highest quality and accuracy of testing results. For this article, we will focus on three key steps that may benefit the most from the introduction of AI:

  1. pathology slide reading during sample preparation,
  2. variant calling after next-generation sequencing (NGS) is conducted, and
  3. literature mining to link genomics to therapeutics.


1. Pathology Slide Reading

Mutation-guided therapeutic insight is derived almost directly from cancer patients’ tumor tissue and/or blood samples. Turning tumor samples into clinical insight involves a series of complex and intricate steps. Identifying pathogenic mutations with precision cancer diagnostics such as NGS requires laboratory technicians and pathologists to meticulously curate pathology slides.

When DNA is extracted from pathology slides, normal cells are intermixed with tumor cells, making it difficult to detect copy number variants, microsatellite instability, and cancer variants with a low mutant allele frequency. Determining which region on a slide contains mostly tumor cells, and the proportion of tumor cells within that region, therefore becomes a critical step. Deep learning can be applied to the processing of pathology slide images,7 using images with annotated tumor cells to train an AI model that determines the tumor cell region and tumor cell proportion (Figure 1). Indeed, AI-enabled pathology slide reading is already available in clinical practice. For instance, a whole slide imaging system developed by Philips Medical Systems has recently been approved by the U.S. Food and Drug Administration.8
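
To make the role of tumor cell proportion concrete, the following Python sketch assumes a hypothetical model has already counted tumor cells and total cells in each tile of a slide. The tile counts, the 0.5 threshold, and the function names are illustrative assumptions, not part of any particular product. The last function uses the standard simplification that a heterozygous mutation in a diploid region without copy number change is expected at an allele frequency of about half the tumor proportion, which is why low tumor content makes variants harder to detect.

```python
def select_tumor_region(tiles, min_fraction=0.5):
    """Return (row, col) positions of tiles whose tumor cell fraction
    is at least `min_fraction`. Each tile is (tumor_cells, total_cells)."""
    return [
        (r, c)
        for r, row in enumerate(tiles)
        for c, (tumor, total) in enumerate(row)
        if total > 0 and tumor / total >= min_fraction
    ]

def tumor_proportion(tiles, region):
    """Fraction of tumor cells among all cells in the selected tiles."""
    tumor = sum(tiles[r][c][0] for r, c in region)
    total = sum(tiles[r][c][1] for r, c in region)
    return tumor / total if total else 0.0

def expected_het_vaf(purity):
    """Expected allele frequency of a heterozygous somatic mutation in a
    diploid region with no copy number change (simplifying assumption)."""
    return purity / 2

# Hypothetical per-tile counts: (tumor cells, total cells); (0, 0) is empty glass.
tiles = [
    [(10, 100), (60, 100)],
    [(80, 100), (0, 0)],
]

region = select_tumor_region(tiles)
purity = tumor_proportion(tiles, region)
print(region, purity, expected_het_vaf(purity))
```

In this toy slide, two tiles pass the threshold and together contain 140 tumor cells out of 200, so a heterozygous mutation would be expected in roughly 35% of reads.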


2. Variant Calling

The traditional workflow for NGS data analysis starts with base calling, followed by read alignment and then variant calling.

To obtain the maximum information from one NGS test, it would be best to sequence as many genes as possible (e.g. the whole genome or whole exome). However, the more genes selected for sequencing, the more data is generated and the longer the computation takes. In a clinical setting, whole genome or exome sequencing is, unfortunately, not feasible as a routine test because of the substantial amount of tumor sample required and the unrealistically long turnaround time. AI is therefore poised to revamp the traditional workflow of NGS data analysis, especially variant calling, in two ways.

First, AI-enabled computer hardware built on Field Programmable Gate Arrays (FPGAs) can accelerate NGS data analysis.9 An FPGA makes the computer hardware itself programmable. Once it is customized to perform sequence mapping, this time-consuming computation can be done up to 250 times faster than with traditional software-based methods.

Second, deep learning assists variant calling by transforming NGS data into images, which deep learning is adept at analyzing. At each variant position, the sequencing reads aligned to the reference genome can be displayed as an image. Deep learning models can recognize these images and categorize each variant as a single nucleotide polymorphism, insertion, deletion, or sequencing error (Figure 2). Google DeepVariant is a good example of this approach.10
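
The idea of displaying aligned reads as an image can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the encoding used by DeepVariant or any other tool: reads are assumed to be already aligned without insertions, each base is mapped to an arbitrary integer "pixel" value, and the function names are hypothetical. Real tools typically encode further information, such as base quality and strand, as additional image channels.

```python
# Arbitrary pixel values chosen for illustration; 0 means "no coverage".
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4, "-": 5}

def pileup_image(reference, reads, center, width=5):
    """Build a small 2D grid around `center`.
    Row 0 is the reference; each further row is one aligned read.
    `reads` is a list of (start_position, sequence) pairs."""
    half = width // 2
    columns = range(center - half, center + half + 1)

    def row(start, seq):
        return [
            BASE_CODES.get(seq[p - start], 0) if start <= p < start + len(seq) else 0
            for p in columns
        ]

    return [row(0, reference)] + [row(start, seq) for start, seq in reads]

def alt_fraction(image):
    """Fraction of covering reads that differ from the reference
    at the center column of the image."""
    mid = len(image[0]) // 2
    ref = image[0][mid]
    covering = [r[mid] for r in image[1:] if r[mid] != 0]
    return sum(b != ref for b in covering) / len(covering) if covering else 0.0

# Made-up reference and reads; two of three reads carry a T instead of A at position 4.
reference = "ACGTACGT"
reads = [(0, "ACGTTCGT"), (2, "GTTCG"), (1, "CGTACG")]

image = pileup_image(reference, reads, center=4)
for line in image:
    print(line)
print(alt_fraction(image))
```

A deep learning model would take grids like this one as input and learn to distinguish true variants from sequencing errors, instead of relying on a hand-written rule such as the simple fraction computed here.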


3. Literature Mining to Link Genomics to Therapeutics

Once identified in a cancer sample, genomic alterations must be linked to therapeutic implications. Whether or not an alteration is actionable depends on detailed analysis of the most up-to-date clinical trial studies and literature. Linking genomics to therapeutics therefore requires one to peruse massive amounts of scientific publications in order to integrate updates into data analysis. This process is critical but laborious. This is where NLP plays its part.

NLP scans through large volumes of biomedical literature and analyzes word statistics and sentence structures. More importantly, it extracts useful and relevant therapeutic information, such as mutation-drug interactions (Figure 3). IBM Watson for Oncology and Genomics is an example that integrates information extracted by NLP with other structured databases to provide comprehensive suggestions for clinical decisions.11
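
As a rough illustration of what extracting a mutation-drug interaction means, the Python sketch below pairs mutations and drugs that are mentioned in the same sentence. It is a toy under stated assumptions: the regular expression, the two-drug lexicon, the example sentences, and the function name are all invented for illustration, and production systems such as the one mentioned above rely on far richer language models than simple co-occurrence.

```python
import re

# Matches a gene symbol followed by a protein change, e.g. "EGFR L858R".
# This pattern is a simplification and will miss many real notations.
MUTATION = re.compile(r"\b[A-Z][A-Z0-9]{1,6} [A-Z]\d+[A-Z]\b")

def extract_pairs(text, drug_lexicon):
    """Return the set of (mutation, drug) pairs that co-occur in a sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mutations = MUTATION.findall(sentence)
        drugs = [
            d for d in drug_lexicon
            if re.search(rf"\b{re.escape(d)}\b", sentence, re.IGNORECASE)
        ]
        pairs.update((m, d) for m in mutations for d in drugs)
    return pairs

# Hypothetical lexicon and sentences written for this example only.
drug_lexicon = ["gefitinib", "vemurafenib"]
text = (
    "Patients with EGFR L858R responded to gefitinib. "
    "BRAF V600E tumors were treated with vemurafenib. "
    "No mutation was reported in the third cohort."
)

for mutation, drug in sorted(extract_pairs(text, drug_lexicon)):
    print(mutation, "->", drug)
```

Co-occurrence alone cannot tell whether a sentence reports sensitivity or resistance, which is one reason the sentence structure analysis described above matters.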


Challenges and Opportunities of AI Adoption in Healthcare

The introduction of AI applications carries immense potential to revolutionize current clinical practices. In places where healthcare specialists are lacking, AI may serve the function of disease triage or act as a surrogate at the time of administration. That is, AI can make a preliminary diagnosis to prompt follow-on tests before medical experts carry out detailed analyses, saving patients’ time in disease management. Where an initial tentative diagnosis has already been made, AI can provide a second opinion and potentially improve consensus in medical decisions.

Human performance can be hindered by the mental state and physical condition of the operator. AI, on the other hand, is free of mental or physical issues, and can offer consistent, high-quality decision support even when the operator’s performance varies. Incorporating AI into current clinical practices may greatly improve efficiency, fill in gaps when experts are not available, and provide quality assurance.


While AI Holds Great Promise, It Has Considerable Limitations

  1. A single AI model can only accomplish one task. For example, a digital pathology AI may be able to accurately detect lung cancer cells in a slide most of the time, but when provided with a liver cancer slide, the same model designed for lung cancer slide reading will not apply. A different model would have to be trained from scratch to recognize liver cancer cells.
  2. AI for variant calling also needs re-training for different sequencing platforms, and distinct literature mining models are required for different types of biomarker-disease relationships.
  3. The current healthcare environment contains a myriad of unstructured data, which hampers the consolidation of multiple tests and model results. Laborious manual curation is required before data can be fed into AI models. It is also difficult to find enough patients who have undergone the desired set of tests. Ultimately, decision-making still needs to be led by humans, who associate the results from the available tests.
  4. AI models also fall short in rare instances, such as exceptions or outliers. AI models can perform exceptionally well for routine processes, but they fail when encountering something new for which they have no prior experience.

These limitations of AI in real-world situations have created opportunities for “man-made” intelligence to remain at the center of the era of AI-enabled precision cancer diagnostics. That is, to support timely and accurate decision-making, computational experts must construct numerous AI models. At the same time, human professionals must still conduct hypothesis testing and make decisions, especially in scenarios that the model, or medical practice itself, has not seen before.

Precision medicine requires a wide range of high-resolution image scanning and high-throughput testing, all of which generate massive multi-scale data from a single patient that require complex analysis. For medical professionals, integrating and interpreting all of these test results unaided is barely possible. This is where AI can step in and assert its potential in complementing human intelligence to make informed decisions.

Thanks to breakthroughs in computational algorithms, deep learning and NLP have transformed the way medical professionals interpret genomic test results, as well as the recommendations they subsequently give to their patients, based on patterns learned from vast amounts of public and proprietary data. [APBN]

About the Authors

Dr. Ying-Ja Chen is the vice director of the Bioinformatics and Artificial Intelligence division at ACT Genomics.



Jen-Hao Cheng is a senior engineer and manager of the Artificial Intelligence division at ACT Genomics.



Dr. Allen Lai is the regional managing director of ACT Genomics.