<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
  <title>Smart Lab News</title>
  <link href="https://hkustsmartlab.github.io/"/>
  <link type="application/atom+xml" rel="self" href="https://hkustsmartlab.github.io//news.xml"/>
  <updated>2026-04-09T22:59:06+08:00</updated>
  <id>https://hkustsmartlab.github.io/</id>
  <author>
    <name>Smart Lab</name>
    <!-- <email>you@example.com</email> -->
  </author>

  
  <entry>
    <id>https://hkustsmartlab.github.io//2026/04/01/dgr</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2026/04/01/dgr/"/>
    <title>[Nature Communications] Generative AI for misalignment-resistant virtual staining to accelerate histopathology workflows</title>
    <published>2026-04-01T00:00:00+08:00</published>
    <updated>2026-04-01T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2026/04/01/dgr/</uri>
    </author>
    <content type="html">&lt;p&gt;Recently, Prof. Hao Chen’s team at HKUST, in collaboration with The Chinese University of Hong Kong (CUHK) and Nanfang Hospital of Southern Medical University, published a new paper in &lt;strong&gt;&lt;font color=&quot;red&quot;&gt;Nature Communications&lt;/font&gt;&lt;/strong&gt; (IF = 15.7): &lt;strong&gt;“Generative AI for misalignment-resistant virtual staining to accelerate histopathology workflows”&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The study presents &lt;strong&gt;DGR (Decoupled Generation and Registration)&lt;/strong&gt;, a misalignment-resistant virtual staining framework that addresses spatial distortions caused by tissue deformation during chemical staining, and provides a scalable solution for practical pathology workflows.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_dgr_fig1.png&quot; alt=&quot;DGR Fig 1&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Histopathology is a cornerstone of clinical diagnosis, relying on chemically stained tissue slides to reveal disease-related morphology and molecular signals. In practice, however, conventional staining is labor-intensive, time-consuming, tissue-consuming, and environmentally costly.&lt;/p&gt;

&lt;p&gt;Virtual staining has emerged as a promising alternative by translating one imaging modality into another with deep learning. Yet most existing methods require perfectly aligned paired data for pixel-level supervision. This assumption is difficult to satisfy in real-world workflows, because tissue deformation during processing introduces unavoidable spatial misalignment, and repeated staining on the same section often damages tissue integrity.&lt;/p&gt;

&lt;p&gt;To address this bottleneck, we propose DGR, a robust framework with a cascaded registration strategy that decouples image generation from alignment correction. This design enables high-fidelity virtual staining under imperfectly paired data, reducing data curation burden and improving practicality for clinical deployment.&lt;/p&gt;

&lt;p&gt;Across five datasets and four staining tasks, DGR demonstrates clear improvements: average gains of &lt;strong&gt;3.2%&lt;/strong&gt; on internal datasets and &lt;strong&gt;10.1%&lt;/strong&gt; on external datasets. Under severe misalignment, DGR improves PSNR by up to &lt;strong&gt;3.4 dB&lt;/strong&gt; (&lt;strong&gt;23.8%&lt;/strong&gt;) over baseline methods. In blinded pathologist evaluation, experienced pathologists achieved only around &lt;strong&gt;52%&lt;/strong&gt; accuracy in distinguishing virtual from chemical staining, indicating no statistically significant diagnostic difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fig. 1&lt;/strong&gt; summarizes the end-to-end virtual staining workflow and key evaluations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;(a) pipeline from tissue sampling to virtual stain generation,&lt;/li&gt;
  &lt;li&gt;(b) dataset composition,&lt;/li&gt;
  &lt;li&gt;(c–e) quantitative comparison on PSNR/SSIM/LPIPS,&lt;/li&gt;
  &lt;li&gt;(f) robustness under severe synthetic misalignment,&lt;/li&gt;
  &lt;li&gt;(g–h) blinded pathology evaluation on H&amp;amp;E and PAS-AB.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;method&quot;&gt;Method&lt;/h2&gt;

&lt;p&gt;DGR contains two key modules:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Registration for Noise Reduction (R1):&lt;/strong&gt; aligns generated images with roughly paired ground-truth targets, reducing the impact of noisy misregistration in reconstruction supervision.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Position-Consistency Generation (R2):&lt;/strong&gt; enforces spatial consistency between generated outputs and input images through adversarial training, ensuring the generator focuses on stain translation rather than geometric warping.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Unlike prior coupled approaches (e.g., RegGAN), DGR explicitly separates generation and registration. This avoids a common failure mode where generators hide structural misalignment and rely on downstream registration to compensate. Importantly, DGR can be integrated into existing virtual staining pipelines without modifying their backbone architectures.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_dgr_fig2.png&quot; alt=&quot;DGR Fig 2&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In short, DGR imposes dual constraints: one for reliable reconstruction supervision under noisy pairing (R1), and one for preserving spatial consistency between input and generated images (R2). This enables robust learning from roughly paired data while maintaining anatomical faithfulness.&lt;/p&gt;
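
&lt;p&gt;As a concrete illustration, the dual constraints can be sketched as a single generator objective. This is a minimal sketch with placeholder networks; the function names, the L1 reconstruction term, and the log-likelihood adversarial term are illustrative assumptions, not the paper's exact implementation:&lt;/p&gt;

```python
import numpy as np

def dgr_generator_loss(x, y, generate, register, position_disc, lam=1.0):
    """Illustrative sketch of DGR's dual-constraint generator objective.

    `generate`, `register`, and `position_disc` are placeholder callables
    standing in for the paper's networks; the L1 reconstruction and
    log-likelihood adversarial terms are common choices, not necessarily
    the losses used in the paper.
    """
    g = generate(x)                      # stain translation only
    g_aligned = register(g, y)           # R1: align output to the rough pair
    rec = np.abs(g_aligned - y).mean()   # supervise on the aligned pair
    # R2: the position-consistency discriminator scores spatial agreement
    # between the input x and the generated image g (1.0 = consistent).
    pos = -np.log(position_disc(x, g) + 1e-8).mean()
    return rec + lam * pos
```

&lt;p&gt;The key point the sketch captures is the decoupling: reconstruction supervision is computed only after registration (R1), so the generator is never pushed to warp geometry itself, while the R2 term ties its output spatially to the input.&lt;/p&gt;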

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;

&lt;p&gt;We evaluated DGR across five datasets and four staining translation tasks:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Autofluorescence to H&amp;amp;E:&lt;/strong&gt; DGR achieved the best quantitative and perceptual performance, including PSNR 22.914 dB (+4.4%) and SSIM 0.766 (+4.8%), while reducing LPIPS and FID by 15.9% and 12.0% compared with the next-best method.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;H&amp;amp;E to PAS-AB:&lt;/strong&gt; On internal testing, DGR improved PSNR and SSIM by 1.2% and 1.0%; on external testing, gains reached 10.1% (PSNR) and 8.0% (SSIM), showing strong generalization.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;H&amp;amp;E to mIHC:&lt;/strong&gt; DGR delivered the best image quality and improved downstream classification performance (UniToPatho +2.4%, GCHTID +1.2%), while achieving the highest nucleus segmentation Dice score (0.422).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;H&amp;amp;E stain normalization:&lt;/strong&gt; DGR outperformed all baselines in both fidelity and distribution-level metrics (PSNR 23.823, SSIM 0.734, FID 10.253, KID 0.007).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a blinded pathologist study, experts distinguished virtual from chemical staining at near-chance level (about 52% accuracy), with no statistically significant difference. Under severe simulated misalignment, DGR also maintained clear robustness, improving PSNR by up to 3.4 dB over baseline methods.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_dgr_fig3.png&quot; alt=&quot;DGR Fig 3&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task 1: Autofluorescence → H&amp;amp;E&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Best PSNR: &lt;strong&gt;22.914 dB&lt;/strong&gt; (+4.4%)&lt;/li&gt;
  &lt;li&gt;Best SSIM: &lt;strong&gt;0.766&lt;/strong&gt; (+4.8%)&lt;/li&gt;
  &lt;li&gt;Best LPIPS/FID: LPIPS 0.159 and FID 20.264 (15.9% and 12.0% lower than the next best method)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_dgr_fig4.png&quot; alt=&quot;DGR Fig 4&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task 2: H&amp;amp;E → PAS-AB&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Internal test: PSNR/SSIM gains of +1.2%/+1.0%&lt;/li&gt;
  &lt;li&gt;External test: stronger gains of +10.1%/+8.0%&lt;/li&gt;
  &lt;li&gt;LPIPS reduction on both internal and external sets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_dgr_fig5.png&quot; alt=&quot;DGR Fig 5&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task 3: H&amp;amp;E → mIHC&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Best image-level metrics across compared methods&lt;/li&gt;
  &lt;li&gt;Best downstream task gains (+2.4% on UniToPatho, +1.2% on GCHTID)&lt;/li&gt;
  &lt;li&gt;Highest nucleus segmentation Dice (0.422), indicating superior structure consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Task 4: H&amp;amp;E stain normalization&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Best fidelity and distribution scores simultaneously: PSNR 23.823, SSIM 0.734, FID 10.253, KID 0.007&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_dgr_fig6.png&quot; alt=&quot;DGR Fig 6&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blinded human evaluation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;250 virtual + 250 chemical H&amp;amp;E images, and 250 virtual + 250 chemical PAS-AB images&lt;/li&gt;
  &lt;li&gt;Accuracy around 52.4% in both settings, with no statistically significant difference from chance&lt;/li&gt;
&lt;/ul&gt;
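
&lt;p&gt;For intuition on why roughly 52% accuracy over 500 images is statistically indistinguishable from chance, a two-sided normal-approximation test can be run as below. This is an illustrative check only; the statistical procedure actually used in the paper may differ:&lt;/p&gt;

```python
import math

def two_sided_binomial_ztest(correct, total, p0=0.5):
    """Normal-approximation z-test of observed accuracy against chance p0.

    Returns (z, p): with 262/500 correct (52.4%), p is about 0.28,
    far above the usual 0.05 significance threshold.
    """
    phat = correct / total
    se = math.sqrt(p0 * (1.0 - p0) / total)   # standard error under H0
    z = (phat - p0) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p-value
    return z, p
```

&lt;p&gt;Here 250 virtual plus 250 chemical images give total = 500, and 52.4% accuracy corresponds to 262 correct calls.&lt;/p&gt;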

&lt;p&gt;&lt;strong&gt;Misalignment robustness&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Built 11,918 well-aligned H&amp;amp;E–PAS-AB pairs and injected five levels of synthetic misalignment (rotation/translation/scaling)&lt;/li&gt;
  &lt;li&gt;DGR consistently outperformed all baselines across all misalignment levels, with up to &lt;strong&gt;+3.4 dB PSNR&lt;/strong&gt; under severe misalignment&lt;/li&gt;
&lt;/ul&gt;
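
&lt;p&gt;The synthetic misalignment in this experiment combines rotation, translation, and scaling at graded severities. A minimal sketch of sampling such a perturbation as an affine matrix is shown below; the magnitude ranges are illustrative assumptions, not the paper's exact parameters:&lt;/p&gt;

```python
import numpy as np

def random_misalignment(h, w, level, rng):
    """Sample a rotation/translation/scaling affine (3x3, homogeneous),
    with severity scaled by `level` in 0..4 (five levels, mirroring the
    experimental setup; the magnitude ranges are illustrative)."""
    s = (level + 1) / 5.0
    theta = rng.uniform(-0.1, 0.1) * s               # rotation in radians
    scale = 1.0 + rng.uniform(-0.05, 0.05) * s       # isotropic scaling
    tx, ty = rng.uniform(-0.02, 0.02, 2) * s * np.array([w, h])  # shift, px
    c, snt = np.cos(theta) * scale, np.sin(theta) * scale
    return np.array([[c, -snt, tx],
                     [snt, c, ty],
                     [0.0, 0.0, 1.0]])
```

&lt;p&gt;Applying such a transform to one image of each well-aligned pair yields controlled misalignment levels for the robustness evaluation.&lt;/p&gt;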

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_dgr_fig7.png&quot; alt=&quot;DGR Fig 7&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;DGR reframes misalignment from a nuisance to remove into an inherent property of histopathology workflows that models should tolerate. By decoupling generation and registration with dual constraints, DGR improves robustness to imperfect pairing while preserving anatomical consistency and staining realism.&lt;/p&gt;

&lt;p&gt;The framework lowers data acquisition barriers, scales to practical clinical settings, and offers a general strategy for building resilient virtual staining systems. Future work will extend DGR to more staining modalities, tissue types, and end-to-end pathology applications such as tumor detection and grading.&lt;/p&gt;

&lt;p&gt;More broadly, this work shifts the focus from unrealistic data perfection to algorithmic resilience, which is a critical step toward replacing resource-intensive chemical workflows in routine practice.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;p&gt;For more details, please see our paper &lt;a href=&quot;https://doi.org/10.1038/s41467-026-71038-2&quot;&gt;Generative AI for misalignment-resistant virtual staining to accelerate histopathology workflows&lt;/a&gt; in Nature Communications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code&lt;/strong&gt;: &lt;a href=&quot;https://github.com/birkhoffkiki/DTR&quot;&gt;https://github.com/birkhoffkiki/DTR&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Paper&lt;/strong&gt;: “Generative AI for misalignment-resistant virtual staining to accelerate histopathology workflows” (Nature Communications, 2026)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authors&lt;/strong&gt;: Jiabo Ma, Wenqiang Li (co-first authors)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Corresponding Author&lt;/strong&gt;: Hao Chen (jhc@cse.ust.hk), Professor, Department of Computer Science and Engineering, HKUST&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collaborations&lt;/strong&gt;: HKUST, The Chinese University of Hong Kong, Nanfang Hospital of Southern Medical University&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2026/03/18/mrblock</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2026/03/18/mrblock/"/>
    <title>[ICLR 2026] Exploiting Low-Dimensional Manifold of Features for Few-Shot Whole Slide Image Classification</title>
    <published>2026-03-18T00:00:00+08:00</published>
    <updated>2026-03-18T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2026/03/18/mrblock/</uri>
    </author>
    <content type="html">&lt;p&gt;A joint research effort from &lt;strong&gt;The Chinese University of Hong Kong (CUHK)&lt;/strong&gt;, &lt;strong&gt;SmartX Lab&lt;/strong&gt;, and &lt;strong&gt;Nanyang Technological University (NTU)&lt;/strong&gt; has been accepted at &lt;strong&gt;ICLR 2026&lt;/strong&gt;, one of the premier machine learning conferences. The study presents &lt;strong&gt;MR Block (Manifold Residual Block)&lt;/strong&gt;, a plug-and-play, geometry-aware drop-in replacement for standard linear layers in Multiple Instance Learning (MIL) models, specifically targeting the challenge of &lt;strong&gt;few-shot Whole Slide Image (WSI) classification&lt;/strong&gt; in computational pathology.&lt;/p&gt;

&lt;p&gt;MR Block decomposes a linear projection into two parallel paths: a &lt;strong&gt;fixed random geometric anchor&lt;/strong&gt; that preserves the intrinsic manifold structure of pathology foundation model features, and a &lt;strong&gt;trainable low-rank residual path (LRP)&lt;/strong&gt; for task-specific adaptation. This design introduces a structured inductive bias that simplifies learning into a more tractable residual-fitting problem, &lt;strong&gt;achieving state-of-the-art performance with significantly fewer trainable parameters&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;Histopathology is the gold standard for disease diagnosis, and computational analysis of Whole Slide Images (WSIs) faces two structural constraints. First, WSIs operate at the gigapixel scale, making Multiple Instance Learning (MIL) the de facto paradigm: each slide is represented as a &lt;strong&gt;bag&lt;/strong&gt; of patch features. Second, expert annotations are costly and scarce, and real-world data often involves few labeled slides with only slide-level labels, making models highly susceptible to overfitting.&lt;/p&gt;

&lt;p&gt;To understand the root cause of overfitting beyond the learning algorithm itself, this work examines the &lt;strong&gt;intrinsic geometric structure of features&lt;/strong&gt;. Based on the manifold hypothesis, the authors analyze feature representations on the Camelyon16 dataset using different feature extractors (CONCH, UNI, ResNet-50), providing multi-angle evidence that these representations lie on a &lt;strong&gt;low-dimensional, nonlinear manifold&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Spectral analysis&lt;/strong&gt; reveals an effective rank of only 29.7 (against CONCH’s 512-dimensional space), confirming low-dimensionality.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;t-SNE visualization&lt;/strong&gt; shows clear clustering topology.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Tangent space analysis&lt;/strong&gt; demonstrates non-flat, distance-dependent geometric drift—quantitative evidence of the manifold’s intrinsic curvature, ruling out a purely linear subspace hypothesis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_mrblock_fig1.png&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Based on these observations, the authors argue that a key driver of few-shot overfitting is &lt;strong&gt;geometric&lt;/strong&gt;: while pathology foundation models produce features with a fragile low-dimensional manifold structure, existing MIL models fail to preserve it. The primary source of distortion is the most ubiquitous and indispensable component—the &lt;strong&gt;linear layer&lt;/strong&gt;. Linear layers appear in projections, attention computation, and classification heads, yet they are inherently geometry-agnostic. Tangent space analysis provides direct evidence: &lt;strong&gt;trained linear layers significantly distort the intrinsic geometry of the manifold&lt;/strong&gt;, causing models to learn overly complex mappings in few-shot settings that both violate the low-rank nature of features and discard the geometric priors learned during pre-training.&lt;/p&gt;

&lt;h2 id=&quot;related-work&quot;&gt;Related Work&lt;/h2&gt;

&lt;p&gt;Existing MIL approaches for WSI classification can be broadly grouped by how they handle feature geometry:&lt;/p&gt;

&lt;h3 id=&quot;a-standard-mil-backbones&quot;&gt;A. Standard MIL Backbones&lt;/h3&gt;

&lt;p&gt;Attention-based approaches such as &lt;strong&gt;ABMIL&lt;/strong&gt; and &lt;strong&gt;CATE&lt;/strong&gt; are effective general frameworks but rely on unconstrained linear layers that are inherently geometry-agnostic. In few-shot settings, this leads to manifold distortion and overfitting.&lt;/p&gt;

&lt;h3 id=&quot;b-few-shot-specialized-methods&quot;&gt;B. Few-Shot Specialized Methods&lt;/h3&gt;

&lt;p&gt;Methods such as &lt;strong&gt;ViLaMIL&lt;/strong&gt; and &lt;strong&gt;FOCUS&lt;/strong&gt; are specifically designed for few-shot WSI classification and represent the current state of the art. However, they do not explicitly account for the geometric structure of pre-trained features, leaving the manifold distortion problem unaddressed.&lt;/p&gt;

&lt;h3 id=&quot;c-manifold-residual-block-mr-block&quot;&gt;C. Manifold Residual Block (MR Block)&lt;/h3&gt;

&lt;p&gt;MR Block takes a fundamentally different approach by directly addressing the geometric distortion caused by linear layers. Rather than designing a new MIL backbone, it provides a &lt;strong&gt;plug-and-play replacement&lt;/strong&gt; for standard linear layers, explicitly preserving and leveraging the low-dimensional manifold geometry of foundation model features.&lt;/p&gt;

&lt;h2 id=&quot;methodology&quot;&gt;Methodology&lt;/h2&gt;

&lt;h3 id=&quot;preliminaries&quot;&gt;Preliminaries&lt;/h3&gt;

&lt;p&gt;In the bag-level MIL setting, a WSI is divided into non-overlapping patches, each encoded by a pre-trained feature extractor into a patch feature; together these features form a bag. An MIL aggregator, typically attention pooling, produces a slide-level feature that is fed into a classifier, and the attention weights simultaneously yield patch-level importance scores.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spectral analysis&lt;/strong&gt; estimates the intrinsic dimensionality of representations via the eigenvalue distribution of the Gram matrix, computing the Von Neumann entropy and effective rank. &lt;strong&gt;Tangent space analysis&lt;/strong&gt; probes the non-linear structure beyond low-dimensionality by constructing a neighborhood graph on normalized features and estimating local tangent spaces via PCA at each point—quantifying how much local geometry varies with position.&lt;/p&gt;
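
&lt;p&gt;The spectral analysis step can be sketched as follows: compute the Gram-matrix eigenvalue spectrum, treat it as a probability distribution, and take the exponential of its entropy as the effective rank. This is a common definition of entropy-based effective rank; the paper's exact normalization may differ:&lt;/p&gt;

```python
import numpy as np

def effective_rank(features):
    """Effective rank of a feature matrix (n_samples x d) via the
    entropy of the normalized Gram/covariance spectrum (a common
    definition; the paper's exact normalization may differ)."""
    X = features - features.mean(axis=0)       # center patch features
    eigvals = np.linalg.eigvalsh(X.T @ X)      # covariance spectrum
    eigvals = np.clip(eigvals, 0.0, None)      # clip numerical negatives
    p = eigvals / eigvals.sum()                # spectrum as a distribution
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()           # Von Neumann entropy
    return float(np.exp(entropy))              # effective rank = exp(H)
```

&lt;p&gt;On features that actually live in a low-dimensional subspace, this estimate stays near the subspace dimension even when the ambient dimension is much larger, which is the signature reported for CONCH features (effective rank about 29.7 in a 512-dimensional space).&lt;/p&gt;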

&lt;h3 id=&quot;mr-block-architecture&quot;&gt;MR Block Architecture&lt;/h3&gt;

&lt;p&gt;To mitigate the geometric degradation introduced by standard linear layers, the authors propose &lt;strong&gt;MR Block&lt;/strong&gt;, a parameter-efficient, plug-and-play, geometry-aware alternative. As illustrated below, MR Block decomposes the linear mapping into two parallel paths:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_mrblock_fig2.png&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Given input &lt;strong&gt;x&lt;/strong&gt; ∈ ℝ^{d_in}, MR Block is defined as:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MR(x) = W_anchor · x + Up(GELU(Down(x)))&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;where:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Geometric Anchor Path&lt;/strong&gt;: W_anchor ∈ ℝ^{d_out × d_in} is a &lt;strong&gt;fixed random matrix&lt;/strong&gt; (Kaiming uniform initialized, never updated). It serves as a geometric anchor that approximately preserves the original feature topology while acting as a &lt;strong&gt;spectral sharpener&lt;/strong&gt; to enhance spectral discriminability.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Low-rank Residual Path (LRP)&lt;/strong&gt;: Down ∈ ℝ^{r × d_in} and Up ∈ ℝ^{d_out × r} are &lt;strong&gt;trainable matrices&lt;/strong&gt; with a bottleneck rank parameter r ≪ d_in. This structural bottleneck explicitly aligns with the low effective rank of features, modeling only task-relevant residuals.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Initialization&lt;/strong&gt;: Up is initialized to all zeros, so MR Block initially behaves as W_anchor alone, contributing zero residual at the start of training. The LRP activates only when it can improve the training objective, counteracting the geometric distortion that LRP alone would introduce.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parameter efficiency&lt;/strong&gt;: When r ≪ d_in, LRP has r(d_in + d_out) parameters—strictly fewer than a standard linear layer’s d_in × d_out, theoretically reducing overfitting risk.&lt;/p&gt;
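
&lt;p&gt;A minimal NumPy sketch of the block as defined above is given below. The Kaiming-uniform bound and the tanh approximation of GELU follow standard conventions and are assumptions where the description does not pin them down:&lt;/p&gt;

```python
import numpy as np

def init_mr_block(d_in, d_out, r=32, seed=0):
    """Initialize the three matrices of an MR Block (sketch)."""
    rng = np.random.default_rng(seed)
    bound = np.sqrt(6.0 / d_in)                    # Kaiming-uniform bound
    anchor = rng.uniform(-bound, bound, size=(d_out, d_in))  # fixed, frozen
    down = rng.standard_normal((r, d_in)) * 0.02   # trainable, rank r
    up = np.zeros((d_out, r))                      # zero-init: no residual yet
    return anchor, down, up

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mr_block(x, anchor, down, up):
    """MR(x) = W_anchor . x + Up(GELU(Down(x)))."""
    return anchor @ x + up @ gelu(down @ x)
```

&lt;p&gt;At initialization the residual path contributes nothing, so the block reduces to the fixed anchor, matching the zero-initialization described above; the trainable path holds r(d_in + d_out) parameters versus d_in × d_out for a dense layer.&lt;/p&gt;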

&lt;h2 id=&quot;experimental-results&quot;&gt;Experimental Results&lt;/h2&gt;

&lt;h3 id=&quot;comparison-with-state-of-the-art-methods&quot;&gt;Comparison with State-of-the-Art Methods&lt;/h3&gt;

&lt;p&gt;Table 1 summarizes results across multiple datasets. Three key conclusions emerge:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_mrblock_table1.png&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Consistent Improvement&lt;/strong&gt;: Whether on large cohorts with artificially constructed few-shot settings or on naturally few-shot treatment response datasets, MR-augmented models consistently outperform their respective baselines across different datasets and shot numbers. On Camelyon16, TCGA-NSCLC, and TCGA-RCC, MR versions match or surpass current state-of-the-art methods (ViLaMIL and FOCUS) &lt;strong&gt;with significantly fewer trainable parameters&lt;/strong&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Parameter Efficiency&lt;/strong&gt;: Replacing standard linear layers with MR Block produces a smaller model that performs better, and this pattern holds across multiple MIL backbones. This confirms that MR’s gains come from a beneficial low-rank inductive bias, not from increased model capacity.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Stability&lt;/strong&gt;: As k increases, all methods improve steadily; MR versions show the largest gains at moderate shot counts and remain competitive at higher shot counts. MR also exhibits lower variance across runs in several settings, suggesting improved training stability.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;ablation-studies&quot;&gt;Ablation Studies&lt;/h2&gt;

&lt;h3 id=&quot;component-ablation&quot;&gt;Component Ablation&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_mrblock_table2.png&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The ablation in Table 2 validates the decoupled design. Removing the LRP consistently degrades performance, confirming LRP’s necessity for task adaptation. Removing the geometric anchor—whether retaining or discarding the residual connection—significantly harms performance, highlighting its dual role as both a &lt;strong&gt;geometric anchor&lt;/strong&gt; and a &lt;strong&gt;spectral sharpener&lt;/strong&gt;. Most critically, &lt;strong&gt;making the anchor trainable causes catastrophic performance collapse&lt;/strong&gt;, providing direct empirical evidence that unconstrained linear layers tend to disrupt feature manifolds, while the MR design better preserves them.&lt;/p&gt;

&lt;h3 id=&quot;capacity-matched-analysis&quot;&gt;Capacity-Matched Analysis&lt;/h3&gt;

&lt;p&gt;To disentangle “fewer parameters” from “geometric inductive bias,” the authors construct a capacity-matched MR-ABMIL where MR Block’s trainable parameter count exactly matches the original gated attention layer. On Camelyon16 and RCC, capacity-matched MR-ABMIL significantly outperforms ABMIL across all shot settings; results on NSCLC are broadly comparable. Since these gains are achieved with identical parameter counts, they directly demonstrate that &lt;strong&gt;MR’s geometry-aware structure—not parameter reduction alone—plays the central role in few-shot performance gains&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 id=&quot;rank-sensitivity-analysis&quot;&gt;Rank Sensitivity Analysis&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_mrblock_fig3.png&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The residual rank r controls information flow in the low-rank path. Sensitivity analysis shows performance saturates at approximately &lt;strong&gt;r ≈ 32&lt;/strong&gt; across evaluated datasets and shot settings. This saturation point closely matches the theoretically predicted effective rank of the features (~29.7), confirming that a low-rank path suffices to capture the principal task information encoded in the manifold structure. The simpler MR-ABMIL shows a pronounced peak at r = 32, while the more expressive MR-CATE shows only marginal gains beyond this—attributed to CATE’s additional modeling capacity capturing finer feature interactions beyond the main manifold.&lt;/p&gt;

&lt;h2 id=&quot;interpretability-in-extreme-resource-constrained-settings&quot;&gt;Interpretability in Extreme Resource-Constrained Settings&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2026_mrblock_fig6.png&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Figure 6 shows heatmaps generated by MR-CATE in the extreme 2-shot setting. Standard CATE under 2-shot supervision typically fails to produce meaningful attention maps, while MR-CATE exhibits significantly stronger robustness. In the original images (top), blue curves mark approximate tumor boundaries; corresponding heatmaps are shown below. Notably, MR-CATE also captures finer-grained boundaries beyond those in the original annotations—even within the blue tumor boundaries, the model accurately distinguishes between different morphological patterns, demonstrating sensitivity to heterogeneity within tumors and surrounding normal tissue.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This work re-examines few-shot WSI classification overfitting from a geometric perspective. The authors provide quantitative and visual evidence that pathology foundation model features exhibit a fragile low-dimensional manifold geometry, and identify a common failure mode in MIL models: their indispensable linear layers systematically destroy this manifold structure due to a lack of geometric awareness.&lt;/p&gt;

&lt;p&gt;MR Block addresses this by combining a &lt;strong&gt;fixed random geometric anchor&lt;/strong&gt; that preserves manifold structure with a &lt;strong&gt;low-rank residual path&lt;/strong&gt; for parameter-efficient task-specific adaptation. Extensive experiments not only demonstrate state-of-the-art performance but also empirically support the geometric diagnosis, offering a new geometry-aware paradigm for building more robust models—with implications extending beyond computational pathology.&lt;/p&gt;

&lt;h3 id=&quot;future-directions&quot;&gt;Future Directions&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Extend MR Block to other MIL tasks beyond WSI classification.&lt;/li&gt;
  &lt;li&gt;Explore more sophisticated spectral shaping strategies for the geometric anchor.&lt;/li&gt;
  &lt;li&gt;Apply geometry-aware inductive biases to other domains where pre-trained features exhibit manifold structure (e.g., natural image few-shot learning, medical image segmentation).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code is open-sourced at &lt;a href=&quot;https://github.com/BearCleverProud/MR-Block&quot;&gt;https://github.com/BearCleverProud/MR-Block&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For more details, check out the full paper:&lt;br /&gt;
&lt;strong&gt;Conghao Xiong, Zhengrui Guo, Zhe Xu, Yifei Zhang, Raymond Kay-yu Tong, Si Yong Yeo, Hao Chen, Joseph J. Y. Sung, and Irwin King. “Exploiting Low-Dimensional Manifold of Features for Few-Shot Whole Slide Image Classification.” arXiv preprint arXiv:2505.15504, 2026.&lt;/strong&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/12/29/seminar</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/12/29/seminar/"/>
    <title>[Seminar] Publishing in Nature Biomedical Engineering</title>
    <published>2025-12-29T00:00:00+08:00</published>
    <updated>2025-12-29T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/12/29/seminar/</uri>
    </author>
    <content type="html">&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_cse_seminar_Jennifer.png&quot; alt=&quot;2025 cse seminar Jennifer&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Publishing in top-tier journals such as &lt;em&gt;Nature Biomedical Engineering&lt;/em&gt; can be highly competitive and challenging. This seminar will provide an insider’s perspective on the editorial and review processes at the journal, and offer practical advice for researchers aiming to publish their work there.&lt;/p&gt;

&lt;p&gt;Dr. Jennifer Haskell is an Associate Editor at &lt;em&gt;Nature Biomedical Engineering&lt;/em&gt;, part of the Nature Portfolio. She joined the journal in September 2024 after conducting research at the University of Exeter, where she worked on applications of Raman spectroscopy for cancer detection. At the journal, she handles manuscripts in the areas of biomedical imaging, cellular immunotherapy, and genetic engineering, giving her a broad view of current trends in biomedical engineering research.&lt;/p&gt;

&lt;p&gt;In this talk, Dr. Haskell will discuss what the editors look for in submissions, how manuscripts are evaluated, and common reasons for rejection. She will also share tips on framing your research story, preparing a strong manuscript, and navigating peer review. The seminar will be particularly valuable for early-career researchers and students who plan to submit their work to high-impact journals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Title:&lt;/strong&gt; Publishing in &lt;em&gt;Nature Biomedical Engineering&lt;/em&gt;&lt;br /&gt;
&lt;strong&gt;Speaker:&lt;/strong&gt; Dr. Jennifer Haskell (Associate Editor, &lt;em&gt;Nature Biomedical Engineering&lt;/em&gt;)&lt;br /&gt;
&lt;strong&gt;Time:&lt;/strong&gt; 8 Jan 2026 (Thu), 10:00 am – 11:00 am&lt;br /&gt;
&lt;strong&gt;Venue:&lt;/strong&gt; LTB, Academic Building, HKUST&lt;br /&gt;
&lt;strong&gt;Host:&lt;/strong&gt; Dr. Hao Chen&lt;/p&gt;

&lt;p&gt;Everyone is welcome to attend!&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/12/11/freetumor</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/12/11/freetumor/"/>
    <title>[Nature Communications] Large-scale generative tumor synthesis in computed tomography images for improving tumor recognition</title>
    <published>2025-12-11T00:00:00+08:00</published>
    <updated>2025-12-11T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/12/11/freetumor/</uri>
    </author>
<content type="html">&lt;p&gt;Recently, the SmartX Lab team completed a groundbreaking project on a Generative AI (GAI) model for tumor synthesis. Published in &lt;strong&gt;&lt;font color=&quot;red&quot;&gt;Nature Communications&lt;/font&gt;&lt;/strong&gt;, this work presents FreeTumor, a tumor synthesis model for CT images that improves tumor recognition.&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;AI-driven tumor recognition unlocks new possibilities for precise tumor screening and diagnosis. However, progress is heavily hampered by the scarcity of annotated datasets, which demand extensive effort from radiologists. To this end, we introduce FreeTumor, a Generative AI framework that enables large-scale tumor synthesis to mitigate data scarcity. Specifically, FreeTumor effectively leverages limited labeled data together with large-scale unlabeled data for training. Unleashing the power of large-scale data, FreeTumor can synthesize a large number of realistic tumors to augment training datasets. We curate a large-scale dataset comprising 161,310 Computed Tomography (CT) volumes for tumor synthesis and recognition, of which only 2.3% contain annotated tumors. Thirteen board-certified radiologists are engaged to discern between synthetic and real tumors, rigorously validating the quality of the synthetic tumors. Through high-quality tumor synthesis, FreeTumor showcases a notable superiority over state-of-the-art tumor recognition methods, indicating promising prospects for clinical application.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_freetumor1.png&quot; alt=&quot;freetumor&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;method&quot;&gt;Method&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_freetumor2.png&quot; alt=&quot;freetumor&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The FreeTumor framework comprises two stages: (1) Large-Scale Generative Tumor Synthesis Training. We first leverage labeled data to train a baseline segmentation model that serves as the discriminator of the tumor synthesis model. We then leverage both labeled and unlabeled data to train the tumor synthesis model: the generator learns to synthesize tumors on healthy organs, while the discriminator assesses the realism of the synthetic tumors. (2) Tumor Synthesis for Large-Scale Segmentation Training. We employ the generator to synthesize tumors on healthy organs to augment segmentation training datasets, while the discriminator performs quality control on the synthetic tumors.&lt;/p&gt;
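As a toy illustration (not the paper's actual 3D networks), the stage-2 quality-control gate can be sketched in a few lines: a discriminator realism score decides which synthetic tumors are kept for augmenting the segmentation training set. The function names and the stand-in discriminator below are hypothetical.

```python
# Hypothetical sketch of discriminator-gated quality control for synthetic
# tumors; the real system uses trained 3D GAN/segmentation networks.

def filter_synthetic_tumors(candidates, discriminator, threshold=0.5):
    """Keep only synthetic tumors the discriminator judges realistic."""
    kept = []
    for volume in candidates:
        realism = discriminator(volume)  # score in [0, 1]
        if realism >= threshold:
            kept.append(volume)
    return kept

# Stand-in discriminator: score = fraction of voxels above 0.
toy_discriminator = lambda vol: sum(v > 0 for v in vol) / len(vol)
candidates = [[1, 1, 1, 0], [0, 0, 0, 1]]   # two toy "volumes"
print(filter_synthetic_tumors(candidates, toy_discriminator))
```

The first candidate (realism 0.75) passes the gate; the second (0.25) is discarded before segmentation training.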

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;

&lt;p&gt;To validate the fidelity of synthetic tumors, we engaged 13 board-certified radiologists in a Visual Turing Test to discern between synthetic and real tumors. This rigorous clinician evaluation validates the high quality of our synthetic tumors: the radiologists achieved only 51.1% sensitivity and 60.8% accuracy in distinguishing our synthetic tumors from real ones.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_freetumor3.png&quot; alt=&quot;freetumor&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Through high-quality tumor synthesis, FreeTumor scales up the recognition training datasets by over 40 times, showcasing a notable superiority over state-of-the-art AI methods including various synthesis methods and foundation models.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_freetumor4.png&quot; alt=&quot;freetumor&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;AI-driven tumor recognition has received increasing attention in recent years, yet progress is heavily hampered by the scarcity of annotated datasets. Early attempts mainly focused on advancing network architectures to improve tumor recognition. Although encouraging results have been demonstrated, the scarcity of annotated datasets still hampers further development.
To this end, numerous medical foundation models have been introduced to tackle the challenge of data scarcity. Although these foundation models can leverage unlabeled data in self-supervised pre-training, they still fail to utilize unlabeled data during segmentation training and remain constrained by the limited scale of annotated datasets.&lt;/p&gt;

&lt;p&gt;Thus, tumor synthesis emerges as a promising solution to mitigate the scarcity of annotated tumor datasets: a large number of tumors can be synthesized on images to augment training datasets. Early attempts investigated image processing and generative models for tumor synthesis. However, these methods fail to integrate large-scale data into synthesis training, thus limiting improvements in downstream tumor recognition. In addition, they largely ignore the importance of quality control, and low-quality synthetic tumors can negatively impact downstream training.&lt;/p&gt;

&lt;p&gt;To this end, we introduce FreeTumor to address the aforementioned challenges. First, FreeTumor adopts an effective adversarial-based synthesis training framework to leverage both labeled and unlabeled data, facilitating the integration of large-scale unlabeled data in synthesis training. Second, FreeTumor further employs an adversarial-based discriminator to discard low-quality synthetic tumors, enabling automatic quality control of large-scale synthetic tumors in the subsequent segmentation training. In this way, FreeTumor facilitates the utilization of large-scale data in both synthesis and segmentation training, demonstrating superior performance compared with previous methods.&lt;/p&gt;

&lt;p&gt;Although FreeTumor has demonstrated promising results in tumor recognition, there is still considerable room for improvement. In this work, we collected 12 annotated datasets from public resources for training and validation, which are commonly used in existing research for the five types of tumors/lesions we studied. With more annotated tumor datasets for training, the performance of FreeTumor could be further improved, and we will continue to collect annotated datasets to advance our model.
Moving forward, we will extend FreeTumor to other tumor types. Furthermore, generative models, including GANs and diffusion models, have demonstrated promising results in other medical imaging modalities, e.g., X-ray and pathology images. In the future, we will explore adapting FreeTumor to these modalities, which will require further dataset curation and evaluation.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;p&gt;For more details, please see our paper &lt;a href=&quot;https://www.nature.com/articles/s41467-025-66071-6&quot;&gt;Large-scale generative tumor synthesis in computed tomography images for improving tumor recognition&lt;/a&gt; via Nature Communications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Citation&lt;/strong&gt;:&lt;br /&gt;
L. Wu et al., “Large-scale generative tumor synthesis in computed tomography images for improving tumor recognition,” in Nature Communications, doi: 10.1038/s41467-025-66071-6.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/12/03/voco</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/12/03/voco/"/>
    <title>[TPAMI] Large-Scale 3D Medical Image Pre-training with Geometric Context Priors</title>
    <published>2025-12-03T00:00:00+08:00</published>
    <updated>2025-12-03T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/12/03/voco/</uri>
    </author>
    <content type="html">&lt;p&gt;Recently, the SmartX Lab team has completed a groundbreaking project on the CT foundation model. Published in &lt;strong&gt;&lt;font color=&quot;red&quot;&gt;IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)&lt;/font&gt;&lt;/strong&gt;, this work presents a new foundation model (VoCo) for 3D medical images with a comprehensive evaluation benchmark.&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;The scarcity of annotations poses a significant challenge in medical image analysis, which demands extensive efforts from radiologists, especially for high-dimensional 3D medical images. Large-scale pre-training has emerged as a promising label-efficient solution, owing to the utilization of large-scale data, large models, and advanced pre-training techniques. However, its development in medical imaging remains underexplored. The primary challenge lies in harnessing large-scale unlabeled data and learning high-level semantics without annotations. We observe that 3D medical images exhibit consistent geometric context, i.e., consistent geometric relations between different organs, which leads to a promising way for learning consistent representations. Motivated by this, we introduce a simple-yet-effective Volume Contrast (VoCo) framework to leverage geometric context priors for self-supervision. Given an input volume, we extract base crops from different regions to construct positive and negative pairs for contrastive learning. Then we predict the contextual position of a random crop by contrasting its similarity to the base crops. In this way, VoCo implicitly encodes the inherent geometric context into model representations, facilitating high-level semantic learning without annotations. Extensive experiments highlight the superiority of VoCo, showcasing promising transferability to unseen modalities and datasets. VoCo notably enhances performance on datasets with limited labeled cases and significantly expedites fine-tuning convergence.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_voco1.png&quot; alt=&quot;VoCo&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;method&quot;&gt;Method&lt;/h2&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_voco2.png&quot; alt=&quot;VoCo&quot; style=&quot;width: 80%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The pivotal procedure is to generate position labels for self-supervision by leveraging the inherent geometric context priors in 3D medical images. Given an input volume V, we first randomly crop a sub-volume k, with the objective of constructing positive and negative pairs with k for contrastive learning. Specifically, we employ position encoding to generate n non-overlapping base crops q-i, where each base crop represents a distinct region of the input volume.&lt;/p&gt;

&lt;p&gt;Within human body anatomy, various organs are situated in distinct regions, which offers a natural way to form positive and negative pairs. As shown in the figure, the random crop k and the positive base crops q-pos exhibit overlapping areas, whereas the negative base crops q-neg, lacking such overlaps, are more likely (though not guaranteed) to encompass different organs. For example, k and q-pos both contain the stomach, pancreas, vein, aorta, and vena cava, while k and q-neg exhibit different organ information. Thus, we can employ the position encoding to construct positive and negative pairs for contrastive learning.&lt;/p&gt;

&lt;p&gt;Previous contrastive learning methods mainly employ the InfoNCE loss to maximize the mutual information of positive pairs. In this paper, we instead generate labels with specific values to supervise the degree of correlation of positive pairs, i.e., labels that reflect how similar k and q-pos are. The correlation between k and q-pos is associated with their overlap proportion: intuitively, if a positive base crop q-pos shares a larger overlap area with k, it will be more similar to k. Thus, we assign the overlap proportions as the values of the position labels y, enabling us to measure the similarity between k and q-pos. In contrast, the position labels y of q-neg are set to 0.&lt;/p&gt;
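The position-label construction above can be sketched concretely. The toy below works in 2D for brevity (the paper operates on 3D volumes), and the box coordinates and grid are illustrative only: each base crop's label is the fraction of the random crop k that overlaps it, and non-overlapping base crops receive label 0.

```python
# 2D sketch (illustrative) of VoCo-style position labels: the label of each
# base crop is the overlap area with the random crop k, as a fraction of k.

def overlap_label(k_box, base_box):
    """Overlap between two (x0, y0, x1, y1) boxes, as a fraction of k's area."""
    x0 = max(k_box[0], base_box[0]); y0 = max(k_box[1], base_box[1])
    x1 = min(k_box[2], base_box[2]); y1 = min(k_box[3], base_box[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    k_area = (k_box[2] - k_box[0]) * (k_box[3] - k_box[1])
    return inter / k_area

# A 2x2 grid of non-overlapping base crops on a 4x4 image.
bases = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 2, 4), (2, 2, 4, 4)]
k = (1, 1, 3, 3)                      # random crop straddling all four
labels = [overlap_label(k, b) for b in bases]
print(labels)                         # each base crop covers 1/4 of k
```

A base crop fully outside k would receive label 0, exactly the q-neg case described above.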

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_voco3.png&quot; alt=&quot;VoCo&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Overall framework of VoCo. (a) First, we generate base crops q with corresponding position labels y. Then we input the random crop k and base crops q for contextual position prediction. Specifically, we employ a student-teacher module to project k and q separately, where the teacher projector is frozen and updated from the student projector with Exponential Moving Average (EMA). Finally, we conduct volume contrast between k and q to predict similarity s, where s is supervised by position labels y. (b) We use the position labels to supervise the intra-volume contrast on k, q-pos, and q-neg, where k, q-pos, and q-neg are from the same volume. (c) We extract random crop k-A and base crops q-B from different volumes V-A and V-B for inter-volume contrast.&lt;/p&gt;
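The EMA update of the teacher projector mentioned above is a standard moving-average rule; the parameter dictionary and momentum value in this sketch are illustrative, not taken from the paper.

```python
# Illustrative EMA update for a student-teacher pair: the frozen teacher's
# weights track a moving average of the student's weights.

def ema_update(teacher, student, momentum=0.99):
    """In-place EMA: teacher = momentum * teacher + (1 - momentum) * student."""
    for name, s_val in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * s_val
    return teacher

teacher = {"proj.weight": 0.0}
student = {"proj.weight": 1.0}
for _ in range(3):                    # a few training steps
    ema_update(teacher, student, momentum=0.5)
print(teacher["proj.weight"])         # 0.875 after three halving steps
```

With a momentum close to 1 (e.g., 0.99), the teacher evolves slowly and provides stable targets for the volume-contrast prediction.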

&lt;h2 id=&quot;experiments&quot;&gt;Experiments&lt;/h2&gt;

&lt;p&gt;We build the largest benchmark in this field, where we open-source the implementation of more than 50 downstream tasks, including segmentation, classification, registration, and Vision-Language Processing (VLP). Extensive experiments demonstrate the superiority of VoCo. Consistent and significant improvements across 51 tasks are highlighted, &lt;em&gt;i.e.&lt;/em&gt;, average +3.62% over baseline and +2.19% above the second-best model.&lt;/p&gt;

&lt;div style=&quot;text-align: center;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_voco4.png&quot; alt=&quot;VoCo&quot; style=&quot;width: 80%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this paper, we proposed a simple-yet-effective Volume Contrast (VoCo) framework for large-scale 3D medical image pre-training. Inspired by the consistent geometric relations between different organs, we proposed to leverage geometric context priors to learn consistent semantic representations for SSL. VoCo can also be seamlessly integrated into a semi-supervised learning framework for omni-supervised pre-training. To facilitate the study of large-scale 3D medical image pre-training, we curated PreCT-160K, the largest existing medical image pre-training dataset, encompassing 160K CT volumes covering diverse anatomical structures. We further delved into the scaling law of model capacity and proposed guidelines for tailoring different model sizes to various medical tasks. To evaluate the effectiveness of pre-training, we established a comprehensive evaluation benchmark encompassing 51 downstream datasets across various tasks. Extensive experiments highlighted the superior performance of VoCo compared with previous methods.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;p&gt;For more details, please see our paper &lt;a href=&quot;https://ieeexplore.ieee.org/document/11274411&quot;&gt;Large-Scale 3D Medical Image Pre-training with Geometric Context Priors&lt;/a&gt; via TPAMI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Citation&lt;/strong&gt;:&lt;br /&gt;
L. Wu, J. Zhuang and H. Chen, “Large-Scale 3D Medical Image Pre-Training With Geometric Context Priors,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2025.3639593.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/10/22/smartpath</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/10/22/smartpath/"/>
    <title>[Press Release] SmartLab Introduces SmartPath: An AI Pathology Platform Transforming End-to-End Cancer Care</title>
    <published>2025-10-22T00:00:00+08:00</published>
    <updated>2025-10-22T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/10/22/smartpath/</uri>
    </author>
    <content type="html">&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartpath_cover.png&quot; alt=&quot;SmartPath AI Pathology Platform Banner&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;SmartLab&lt;/strong&gt; at The Hong Kong University of Science and Technology (HKUST) today launched &lt;strong&gt;SmartPath&lt;/strong&gt;, a comprehensive artificial intelligence (AI) system designed to transform the entire pathology workflow for cancer care. Led by Assistant Professor &lt;strong&gt;Hao Chen&lt;/strong&gt;, Director of the Collaboration Center for Medical and Engineering Innovation, SmartPath provides integrated support for clinical diagnosis, subtyping, biomarker quantification, treatment response assessment, and prognostic follow-up across a wide spectrum of cancers—accelerating turnaround times and enhancing personalized treatment.&lt;/p&gt;

&lt;h3 id=&quot;groundbreaking-features-for-end-to-end-clinical-support&quot;&gt;Groundbreaking Features for End-to-End Clinical Support&lt;/h3&gt;

&lt;p&gt;Developed from one of the largest and most diverse pathology datasets—over &lt;strong&gt;500,000 whole-slide images&lt;/strong&gt; spanning &lt;strong&gt;34 major tissue sites&lt;/strong&gt;—SmartPath assists healthcare professionals with &lt;strong&gt;100+ clinical tasks&lt;/strong&gt;, including cancer classification, subtyping, treatment response evaluation, survival prediction, and automated pathology report generation.&lt;/p&gt;

&lt;p&gt;SmartPath is powered by two integrated large AI models:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Generalizable Pathology Foundation Model (GPFM)&lt;/strong&gt;&lt;br /&gt;
A unified framework for accurate tumor identification, subtyping, and biomarker quantification across diverse tissues. GPFM supports survival outcome prediction and treatment response assessment, forming a data-driven foundation for personalized therapy.&lt;br /&gt;
Sources: arXiv (Towards a generalizable pathology foundation model via unified knowledge distillation, 2024), SmartX Lab publications.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model (mSTAR)&lt;/strong&gt;&lt;br /&gt;
Fuses whole-slide pathology images with contextual data (pathology reports and transcriptomics) to enable minute-level automated report generation and powerful visual question-answering on slide regions.&lt;br /&gt;
Sources: arXiv (A multimodal knowledge-enhanced whole-slide pathology foundation model, 2024), SmartX Lab technical page.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Engineered for seamless clinical integration, SmartPath streamlines the cancer care cycle—from rapid slide analysis and proactive risk alerts to AI-assisted reporting—reducing diagnostic bottlenecks and enabling pathologists to focus on complex decisions.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartpath_features.png&quot; alt=&quot;SmartPath Features and Workflow&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&quot;proven-performance-in-rigorous-clinical-trials&quot;&gt;Proven Performance in Rigorous Clinical Trials&lt;/h3&gt;

&lt;p&gt;SmartPath is undergoing &lt;strong&gt;multi-center prospective validation&lt;/strong&gt; with top-tier hospitals in Hong Kong and the Chinese Mainland. In comprehensive benchmarking, it significantly outperformed existing models. A &lt;strong&gt;recent prospective study at Nanfang Hospital&lt;/strong&gt; reported &lt;strong&gt;accuracy &amp;gt;95%&lt;/strong&gt; across multiple cancers (including lung, breast, and colorectal), confirming SmartPath’s ability to enhance diagnostic accuracy, reliably predict patient survival, and rapidly generate detailed pathology reports.&lt;br /&gt;
Source: HKUST official news release; Medical Xpress; The Standard.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartpath_clinical.png&quot; alt=&quot;Clinical Prospective Validation at Partner Hospitals&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&quot;leadership-voices&quot;&gt;Leadership Voices&lt;/h3&gt;

&lt;p&gt;Prof. &lt;strong&gt;Hao Chen&lt;/strong&gt;, Director of the Collaboration Center for Medical and Engineering Innovation and Assistant Professor in the Departments of Computer Science and Engineering and Chemical and Biological Engineering, said:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“SmartPath has been built and validated with a strong network of clinical partners. Across a wide array of real-world clinical tasks, the system consistently ranks first in benchmarking—especially in malignancy identification and treatment response prediction. With continuous real-world feedback, SmartPath keeps learning and improving, setting a new standard for intelligent, personalized medicine.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Prof. &lt;strong&gt;Liang Li&lt;/strong&gt;, Director of the Department of Pathology at &lt;strong&gt;Nanfang Hospital&lt;/strong&gt; and Professor at &lt;strong&gt;Southern Medical University&lt;/strong&gt;, commented:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“Preliminary results from our prospective trials are highly encouraging. SmartPath improves malignancy identification, provides reliable prognostic predictions, and significantly shortens diagnostic turnaround time through rapid generation of preliminary reports—crucial for time-sensitive cancer cases. This is the future of pathology, where AI augments precision and empowers data-driven clinical decisions.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartpath_leadership.png&quot; alt=&quot;Prof. Hao Chen and Prof. Liang Li at SmartPath Launch&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&quot;real-world-impact-and-collaboration&quot;&gt;Real-World Impact and Collaboration&lt;/h3&gt;

&lt;p&gt;SmartPath is being deployed with &lt;strong&gt;over a dozen leading hospitals&lt;/strong&gt; across Hong Kong and the Chinese Mainland, enabling robust validation across diverse patient populations and clinical tasks. The benchmark and framework established by HKUST and its partners set a new standard for computational pathology in precision oncology and smart healthcare worldwide. Ongoing research is expanding SmartPath to additional cancer types—including &lt;strong&gt;rare and genetically complex malignancies&lt;/strong&gt;—to further enhance predictive models and patient stratification.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartpath_team.png&quot; alt=&quot;HKUST SmartPath Research Team and Clinical Partners&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;about-smartx-lab&quot;&gt;About SmartX Lab&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SmartX Lab&lt;/strong&gt;, led by Prof. Hao Chen at HKUST, is dedicated to advancing trustworthy AI technologies for healthcare and science. Our research spans large-scale models for healthcare, computer-assisted intervention, AI for science, and bioinformatics. Our mission is to drive a transformative revolution in medical practice and scientific discovery, shaping a healthier future.&lt;/p&gt;

&lt;h3 id=&quot;about-the-hong-kong-university-of-science-and-technology&quot;&gt;About The Hong Kong University of Science and Technology&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hkust.edu.hk/news/hkust-ai-system-delivers-comprehensive-cancer-diagnosis&quot;&gt;The Hong Kong University of Science and Technology (HKUST)&lt;/a&gt; is a world-class university that excels in innovative education, research excellence, and impactful knowledge transfer. With a holistic and interdisciplinary approach, HKUST was ranked 3rd in THE’s Young University Rankings 2024, and 19th worldwide and No.1 in Hong Kong in THE Impact Rankings 2025. Thirteen HKUST subjects were ranked among the world’s top 50 in the QS World University Rankings by Subject 2025, with Data Science and Artificial Intelligence ranked 17th globally and first in Hong Kong. Over 80% of HKUST research was rated “internationally excellent” or “world leading” in Hong Kong’s latest Research Assessment Exercise. As of July 2025, HKUST members have founded over 1,900 active start-ups, including 10 unicorns and 17 exits (IPO or M&amp;amp;A).&lt;/p&gt;

&lt;h3 id=&quot;sources-and-further-reading&quot;&gt;Sources and Further Reading&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hkust.edu.hk/news/hkust-ai-system-delivers-comprehensive-cancer-diagnosis&quot;&gt;HKUST Official News&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://medicalxpress.com/news/2025-10-ai-comprehensive-cancer-diagnosis.html&quot;&gt;Medical Xpress coverage&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.chinadailyhk.com/hk/article/622161&quot;&gt;China Daily (HK) report&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://english.dotdotnews.com/a/202510/21/AP68f76420e4b08d29053ae77d.html&quot;&gt;Dot News (HK) report&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.thestandard.com.hk/hong-kong-news/article/314639/HKUST-develops-AI-pathology-analysis-system-with-over-95-accuracy&quot;&gt;The Standard (HK) report&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.nature.com/articles/s41551-025-01488-4&quot;&gt;GPFM (Nature Biomedical Engineering)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/2407.15362&quot;&gt;mSTAR (arXiv)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/10/10/ecmtnbc</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/10/10/ecmtnbc/"/>
    <title>[EClinicalMedicine] Multi-task Deep Learning System achieves accurate identification and prognosis prediction of triple-negative breast cancer</title>
    <published>2025-10-10T00:00:00+08:00</published>
    <updated>2025-10-10T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/10/10/ecmtnbc/</uri>
    </author>
    <content type="html">&lt;p&gt;Recently, a joint research team from the &lt;strong&gt;The First Affiliated Hospital, Zhejiang University School of Medicine&lt;/strong&gt; and the &lt;strong&gt;Hong Kong University of Science and Technology (HKUST)&lt;/strong&gt;, together with partner hospitals in China, announced a major breakthrough in AI-assisted triple negative breast cancer (TNBC) identification and prognosis prediction. Their work introduces the &lt;strong&gt;TRiple-negative breast cancer Identification and Prognosis prediction (TRIP) System&lt;/strong&gt; designed for simultaneously identifying triple-negative breast cancer and predicting its prognosis based on H&amp;amp;E slides. Published in &lt;strong&gt;&lt;font color=&quot;red&quot;&gt;EClinicalMedicine&lt;/font&gt;&lt;/strong&gt; (IF = 10.0, CAS Medicine Q1), this study demonstrates substantial accuracy gains and time savings for pathologists in clinical scenarios.&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;In this study, we developed and rigorously validated the TRIP system, leveraging &lt;strong&gt;4,898 breast cancer patient samples&lt;/strong&gt; (including &lt;strong&gt;over 1,000 TNBC cases&lt;/strong&gt;) from five hospitals in China and the public TCGA dataset. The system accurately identifies the TNBC subtype and predicts patients’ disease-free survival and overall survival using H&amp;amp;E slides, providing an innovative paradigm for integrated AI-assisted pathology image analysis of TNBC.&lt;/p&gt;

&lt;p&gt;The development of TRIP not only provides a new diagnostic and prognostic tool for TNBC, the most aggressive and heterogeneous breast cancer subtype, but also injects new momentum into intelligent pathology and precision oncology.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmtnbc_fig1.png&quot; alt=&quot;framework&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;clinical-challenges&quot;&gt;Clinical Challenges&lt;/h2&gt;

&lt;p&gt;Triple-negative breast cancer, the most aggressive subtype of breast cancer, accounts for approximately 15% of all breast cancer cases. Its cancer cells do not express estrogen receptors (ER), progesterone receptors (PR), or human epidermal growth factor receptor 2 (HER2), so clear therapeutic targets are lacking. The five-year survival rate is only approximately 75%, significantly lower than that of other breast cancer subtypes (over 90%). Current clinical diagnosis relies primarily on immunohistochemistry (IHC), which is costly, time-consuming, and imposes stringent requirements on tissue specimens. Furthermore, due to tumor heterogeneity, even patients with the same TNM stage have varying prognoses, and existing prognostic stratification methods based on clinicopathological features are of limited effectiveness, creating an urgent need for more efficient and accurate diagnostic and prognostic tools. Previous AI research on triple-negative breast cancer has been hampered by small sample sizes (fewer than 600 TNBC patients), limited validation cohorts, and single clinical tasks, making it difficult to meet clinical needs. The TRIP system integrates multi-center large-sample data with innovative algorithms and, for the first time, achieves “identification + prognosis” multi-task integration, filling the technological gap in this field.&lt;/p&gt;

&lt;h2 id=&quot;method&quot;&gt;Method&lt;/h2&gt;

&lt;p&gt;The exceptional performance of the TRIP system stems from multiple innovations in its underlying technical architecture, establishing a fully intelligent solution from pathology image analysis to clinical prediction:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Unified Multiple-Instance Learning Framework: Unlike traditional single-task AI models, the TRIP system employs a single AI network structure that simultaneously supports two key modules, “Triple-Negative Breast Cancer (TNBC) Identification” and “Disease-Free Survival (DFS)/Overall Survival (OS) Prediction,” enabling efficient analysis of WSIs without manual intervention.&lt;/li&gt;
  &lt;li&gt;Effective and Efficient Long-sequence Modeling: The system utilizes a state-of-the-art pathology foundation model, GPFM, to extract patch-level features; it automatically adapts to staining variations across hospitals, eliminating the need for additional staining normalization. A bidirectional Mamba encoder then captures long-range dependencies between patches across the entire slide, addressing the tendency of traditional Mamba models to forget early information in long sequences and thereby enhancing the accuracy of pathology image analysis.&lt;/li&gt;
  &lt;li&gt;Dynamic Adaptation for Enhanced Generalizability: To tackle challenges such as staining and scanner differences across hospitals, the system integrates a Test-Time Adaptation (TTA) strategy. By fine-tuning only the normalization-layer parameters, it significantly improves prediction robustness on data from external sources, laying a solid foundation for real-world clinical deployment.&lt;/li&gt;
&lt;/ul&gt;
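The Test-Time Adaptation idea in the last point can be sketched as a simple parameter split: only parameters belonging to normalization layers remain trainable at deployment. The parameter names below are illustrative; the real system would operate on a deep network's named parameters.

```python
# Hedged sketch of TTA parameter selection: freeze everything except
# normalization-layer parameters (names here are hypothetical examples).

def select_tta_params(named_params, norm_keyword="norm"):
    """Split parameter names into (trainable norm params, frozen others)."""
    trainable = [n for n in named_params if norm_keyword in n]
    frozen = [n for n in named_params if norm_keyword not in n]
    return trainable, frozen

params = ["encoder.conv.weight", "encoder.norm.weight",
          "encoder.norm.bias", "head.fc.weight"]
trainable, frozen = select_tta_params(params)
print(trainable)   # only normalization-layer parameters would be updated
```

Updating so few parameters keeps adaptation cheap and limits the risk of forgetting what was learned during training, which is why it suits deployment across hospitals with different staining and scanners.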

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmtnbc_fig2.png&quot; alt=&quot;network structure&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;validation-and-results&quot;&gt;Validation and Results&lt;/h2&gt;
&lt;p&gt;To validate the effectiveness of the TRIP system, we constructed a large-scale dataset covering five tertiary hospitals in China and the public TCGA dataset. A total of 4,898 breast cancer patients (including more than 1,000 TNBC patients) were included, making this the &lt;strong&gt;world’s largest AI system validation study for triple-negative breast cancer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TNBC identification performance&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Internal cohort.&lt;/strong&gt; AUC: 0.980 (95% CI: 0.958-0.996), Sensitivity: 0.963, Specificity: 0.857, Accuracy: 0.934.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;External cohort.&lt;/strong&gt; AUCs: 0.916 (95% CI: 0.848-0.959), 0.936 (95% CI: 0.907-0.962), 0.860 (95% CI: 0.779-0.930), and 0.890 (95% CI: 0.841-0.929) in SDPH, SRRS, WHCH, and TCGA, respectively, which were significantly better than conventional AI models (MaxMIL, AttMIL).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmtnbc_fig3.png&quot; alt=&quot;TNBC identification performance&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Survival analysis performance&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Disease-free survival.&lt;/strong&gt; Internal C-index: 0.747±0.070 (95% CI: 0.617-0.852), external C-index: 0.731±0.047 and 0.732±0.043&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Overall survival.&lt;/strong&gt; Internal C-index: 0.744±0.075 (95% CI: 0.602-0.865), external C-index: 0.720±0.034 and 0.721±0.030&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Kaplan-Meier analysis.&lt;/strong&gt; The TRIP system can accurately divide patients into high-risk and low-risk groups, with a statistically significant difference between the two groups (P-values &amp;lt; 0.0033), providing clear risk stratification for clinical decision-making.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmtnbc_fig4.png&quot; alt=&quot;Survival analysis performance&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
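&lt;p&gt;The concordance index (C-index) reported above measures how often the model’s predicted risks rank patient pairs in the same order as their observed outcomes (1.0 = perfect ranking, 0.5 = random). A minimal illustrative sketch of Harrell’s C-index on hypothetical data, not the TRIP implementation:&lt;/p&gt;

```python
def c_index(times, events, risks):
    """Harrell's concordance index over comparable patient pairs.

    times:  observed follow-up times
    events: 1 if the event (e.g. relapse/death) occurred, 0 if censored
    risks:  model-predicted risk scores (higher = worse prognosis)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable when patient i had the event first
            if events[i] == 1 and times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:      # correctly ranked pair
                    concordant += 1.0
                elif risks[i] == risks[j]:   # a tie counts as half
                    concordant += 0.5
    return concordant / comparable


# toy cohort: earlier events carry higher predicted risk, so ranking is perfect
print(c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.3]))  # 1.0
```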

&lt;h2 id=&quot;interpretability&quot;&gt;Interpretability&lt;/h2&gt;

&lt;p&gt;To address the “black box” problem of AI models, the TRIP system incorporates pathology heatmap visualization technology to visually mark tissue regions that are critical for diagnosis and prognosis. Pathologists found that the areas the system focuses on closely align with typical pathological features of triple-negative breast cancer. For example, in highly malignant cases, the heatmap highlights areas of nuclear atypia, tumor necrosis, and an immunosuppressive microenvironment; areas with abundant lymphoplasmacytic infiltration correlate with a better prognosis, which is generally consistent with clinical pathology.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmtnbc_fig5.png&quot; alt=&quot;Interpretability&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To further validate the system’s reliability, the research team conducted a multi-omics analysis. Transcriptome data from 211 patients revealed 116 differentially expressed genes between the high- and low-risk groups stratified by the TRIP system, and identified three molecular subtypes (C1-C3) with distinct immune and tumor-promoting signaling profiles. Among them, the C2 subtype showed significantly better prognosis than the C1 and C3 subtypes (P-values &amp;lt; 0.05), likely owing to its high expression of immune-related pathways (such as the interferon-γ response pathway). This result is highly consistent with internationally recognized TNBC molecular classification studies, confirming the scientific validity of the TRIP system’s prognostic assessment at the molecular level.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmtnbc_fig6.png&quot; alt=&quot;Multi-omics analysis&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;clinical-benefits-and-limitations&quot;&gt;Clinical Benefits and Limitations&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Clinical Benefits&lt;/strong&gt;:&lt;br /&gt;
The successful development of the TRIP system brings multiple clinical benefits to the diagnosis and treatment of triple-negative breast cancer:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Simplified diagnostic workflow: TNBC can be rapidly identified using only H&amp;amp;E slides, eliminating the need for IHC testing. This significantly reduces testing costs, shortens reporting times, and reduces the workload of pathologists, making it particularly suitable for resource-limited settings.&lt;/li&gt;
  &lt;li&gt;Optimized treatment decisions: Accurate prognostic stratification helps clinicians identify high-risk patients and choose tailored treatment plans. It also enables treatment de-escalation for low-risk patients, reducing over-medication.&lt;/li&gt;
  &lt;li&gt;Improved equity in diagnosis and treatment: The system can be deployed on a workstation with a single 12 GB GPU, a low hardware barrier that facilitates adoption in hospitals at all levels and helps narrow diagnosis and treatment gaps across regions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;:&lt;br /&gt;
The current data come mainly from postoperative tissue samples, and clinical variables (such as age and TNM stage) are not yet incorporated, limiting direct application in preoperative settings. Further prospective studies are needed to verify the system’s effectiveness on core needle biopsy (CNB) samples; meanwhile, clinical, imaging, and genomic data can be integrated into a multimodal AI model to continuously improve performance.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;p&gt;For more details, see the paper: &lt;a href=&quot;https://doi.org/10.1016/j.eclinm.2025.103557&quot;&gt;Development and validation of an artificial intelligence system for triple-negative breast cancer identification and prognosis prediction: a multicentre retrospective study&lt;/a&gt; via EClinicalMedicine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Citation&lt;/strong&gt;:&lt;br /&gt;
X.M. Zhang, H.J. Zhou, Q. Chen, et al. Development and validation of an artificial intelligence system for triple-negative breast cancer identification and prognosis prediction: a multicentre retrospective study. EClinicalMedicine (2025). https://doi.org/10.1016/j.eclinm.2025.103557&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/09/29/ecmknee</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/09/29/ecmknee/"/>
    <title>[EClinicalMedicine] Multi-task Deep Learning System Enhances Integrated Non-invasive MRI Diagnosis of Nine Knee Abnormalities</title>
    <published>2025-09-29T00:00:00+08:00</published>
    <updated>2025-09-29T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/09/29/ecmknee/</uri>
    </author>
    <content type="html">&lt;p&gt;Recently, a joint research team from the &lt;strong&gt;Southern Medical University Third Affiliated Hospital&lt;/strong&gt; and the &lt;strong&gt;Hong Kong University of Science and Technology (HKUST)&lt;/strong&gt;, together with partner hospitals in southern China, announced a major breakthrough in AI-assisted knee MRI interpretation. Their work introduces a &lt;strong&gt;multi-task deep learning system (DLS)&lt;/strong&gt; designed for integrated diagnosis of nine common knee abnormalities. Published in &lt;strong&gt;&lt;font color=&quot;red&quot;&gt;EClinicalMedicine&lt;/font&gt;&lt;/strong&gt; (IF = 10.0, CAS Medicine Q1), this study demonstrates substantial accuracy gains and time savings for radiologists in real-world settings.&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Knee MRI diagnosis poses unique challenges due to complex soft tissue anatomy, multiple imaging sequences, and nuanced structural details. Conventional AI solutions tend to focus on single abnormalities, limiting clinical utility. The new &lt;strong&gt;DLS&lt;/strong&gt; tackles this gap with a “generalist” approach, covering meniscal tears, cartilage defects, ACL/PCL/MCL/LCL injuries, infrapatellar fat pad (IFP) injury, synovial plica, and cysts within a unified diagnostic workflow.&lt;/p&gt;

&lt;p&gt;The large-scale, multicentre study involved &lt;strong&gt;13,419 patients&lt;/strong&gt;, &lt;strong&gt;14,962 MRI exams&lt;/strong&gt;, and over &lt;strong&gt;1 million individual images&lt;/strong&gt;, and adopted a rigorous &lt;strong&gt;stepwise validation strategy&lt;/strong&gt; incorporating multi-reader, multi-case experiments and randomized controlled trials.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmknee_fig1.png&quot; alt=&quot;DLS Knee MRI Study&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;clinical-challenges&quot;&gt;Clinical Challenges&lt;/h2&gt;

&lt;p&gt;Radiologists often struggle with knee MRI interpretation due to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Diagnostic complexity&lt;/strong&gt;: Multiple coexisting pathologies with subtle boundaries.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Current AI limitations&lt;/strong&gt;: Most systems are narrow-focus, lacking panoramic assessment.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Workflow inefficiency&lt;/strong&gt;: A single case may require numerous sequence reviews, often taking 5–8 minutes, increasing cognitive load.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;method&quot;&gt;Method&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmknee_fig2.png&quot; alt=&quot;DLS Knee MRI Performance&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The DLS employs a &lt;strong&gt;coarse-to-fine, multi-plane attention framework&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Coarse localization&lt;/strong&gt;: A 3D U-Net identifies the meniscal region and crops 256×256 ROIs to reduce irrelevant background.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Multi-plane feature extraction&lt;/strong&gt;: Sagittal, coronal, and axial PD fat-suppressed sequences are processed separately and then fused for classification.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Attention-guided defect focus&lt;/strong&gt;: The Attention Object Localization Module (AOLM) directs focus to discriminative areas, improving small-lesion detection.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Explainability&lt;/strong&gt;: Grad-CAM heatmaps visualize AI-attention overlap with expert-identified pathology regions, fostering clinical trust.&lt;/li&gt;
&lt;/ul&gt;
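&lt;p&gt;The Grad-CAM heatmaps mentioned above weight each convolutional feature map by the average gradient of the target class score with respect to it, then keep only the positive evidence. A framework-free numeric sketch of that weighting rule (illustrative values, not the study’s model):&lt;/p&gt;

```python
def grad_cam(activations, gradients):
    """Grad-CAM: ReLU of the gradient-weighted sum of feature maps.

    activations: list of K feature maps, each an HxW list of lists
    gradients:   matching list of K gradient maps d(class score)/d(activation)
    """
    h, w = len(activations[0]), len(activations[0][0])
    # channel weight = global-average-pooled gradient
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for a, wt in zip(activations, weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wt * a[y][x]
    # ReLU: keep only regions that push the class score up
    return [[max(v, 0.0) for v in row] for row in cam]
```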

&lt;p&gt;The system was implemented in PyTorch; training and test data were strictly separated, with one internal test set and two external test sets.&lt;/p&gt;
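&lt;p&gt;The coarse-localization step described above crops a fixed-size ROI around the detected meniscal region before fine-grained classification. A toy sketch of the cropping logic only (hypothetical helper, assuming the ROI centre comes from the segmentation output; not the authors’ code):&lt;/p&gt;

```python
def crop_roi(image, center, size=256):
    """Crop a size x size window around center, clamped to image bounds."""
    h, w = len(image), len(image[0])
    cy, cx = center
    half = size // 2
    # shift the window back inside the image when it overhangs an edge
    y0 = min(max(cy - half, 0), max(h - size, 0))
    x0 = min(max(cx - half, 0), max(w - size, 0))
    return [row[x0:x0 + size] for row in image[y0:y0 + size]]


# a 300x300 slice cropped around a near-corner point stays fully in bounds
slice_2d = [[0] * 300 for _ in range(300)]
roi = crop_roi(slice_2d, center=(10, 10))
print(len(roi), len(roi[0]))  # 256 256
```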

&lt;h2 id=&quot;validation-and-results&quot;&gt;Validation and Results&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Multi-centre model performance&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Major lesions&lt;/strong&gt; (meniscal tear, cartilage defect, ACL tear) AUC: Internal 0.898, External I 0.852, External II 0.812&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Minor lesions&lt;/strong&gt; (PCL, MCL, LCL, IFP injury, plica, cysts) AUC: Internal 0.815, External I 0.744, External II 0.774&lt;/li&gt;
  &lt;li&gt;Accuracy range: Internal 73.1%–95.6%; External I 63.3%–89.3%; External II 65.5%–83.5%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmknee_fig3.png&quot; alt=&quot;DLS Knee MRI Performance&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reader studies&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Step 1&lt;/strong&gt; – DLS vs low/high seniority radiologists: Comparable accuracy in major lesions for both groups; notable gains in cartilage defects and cyst detection for low-seniority readers.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Step 2&lt;/strong&gt; – Multi-reader re-read (External I, washout period): Accuracy, sensitivity, and specificity improved across all readers; mean reading times reduced by ~30.5s (junior) and ~26.0s (senior).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Step 3&lt;/strong&gt; – Randomized controlled trial (External II): DLS-assisted group showed overall accuracy gains of 4.2%–8.8% (statistically significant), with further reading time reductions (~35.5s junior, ~30.7s senior).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmknee_fig4.png&quot; alt=&quot;DLS Knee MRI Performance&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;clinical-insights&quot;&gt;Clinical Insights&lt;/h2&gt;

&lt;p&gt;The DLS proved helpful in addressing nine high-difficulty scenarios, such as early-stage cartilage changes, partial ACL tears, chronic PCL “normal appearances,” and distinguishing cysts from synovial recess fluid. Attention-guided multi-plane processing enabled more stable lesion identification in these “error-prone” cases.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_ecmknee_fig5.png&quot; alt=&quot;Grad-CAM Examples&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;impact-and-outlook&quot;&gt;Impact and Outlook&lt;/h2&gt;

&lt;p&gt;From a radiology perspective, the &lt;strong&gt;integrated approach&lt;/strong&gt; reflects real-world needs for simultaneous interpretation of diverse knee pathologies. As a &lt;strong&gt;second reader&lt;/strong&gt;, DLS can mitigate both under-diagnosis and over-diagnosis risks while improving efficiency.&lt;/p&gt;

&lt;p&gt;From an AI perspective, the &lt;strong&gt;coarse-to-fine attention paradigm&lt;/strong&gt; demonstrates quantifiable benefits, with attention-guided lesion localization showing clear performance boosts in ablation studies. The explainable heatmaps align closely with human expert strategies, increasing clinical adoption potential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Retrospective, regional multicentre scope; broader geographic, prospective studies are needed.&lt;/li&gt;
  &lt;li&gt;Reference standards based on MRI + report; limited arthroscopic gold-standard confirmation.&lt;/li&gt;
  &lt;li&gt;Current reliance on PD fat-suppressed sequences; broader protocol adaptability needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Future Directions&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Large-scale prospective evaluations to measure long-term outcome and efficiency gains.&lt;/li&gt;
  &lt;li&gt;Extend “localization–recognition” AI workflow to shoulder, hip, and ankle joints for a generalizable musculoskeletal AI platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;p&gt;For more details, see the paper: &lt;a href=&quot;https://doi.org/10.1016/j.eclinm.2025.103534&quot;&gt;Development of a multi-task deep learning system for classification of nine common knee abnormalities on MRI: a large-scale, multicentre, stepwise validation study&lt;/a&gt; via EClinicalMedicine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Citation&lt;/strong&gt;:&lt;br /&gt;
Xie, Z., Qiu, Z., Li, Y., et al. Development of a multi-task deep learning system for classification of nine common knee abnormalities on MRI: a large-scale, multicentre, stepwise validation study. EClinicalMedicine (2025). https://doi.org/10.1016/j.eclinm.2025.103534&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/09/10/csig2025</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/09/10/csig2025/"/>
    <title>[Call for Participants] CSIG Youth Scientist Conference 2025 Forum</title>
    <published>2025-09-10T00:00:00+08:00</published>
    <updated>2025-09-10T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/09/10/csig2025/</uri>
    </author>
    <content type="html">&lt;p&gt;The 2025 CSIG Youth Scientist Conference, initiated by the Youth Working Committee of the China Society of Image and Graphics, will be held from September 18-21, 2025 at the Wyndham Grand Qingdao Yingsha Beach. The conference is hosted by the China Society of Image and Graphics and co-organized by the Ocean University of China, Shandong University, and the Youth Working Committee of the China Society of Image and Graphics. Academician Yaonan Wang, Professor Houjie Wang, Professor Junyu Dong, and Professor Huimin Ma will serve as the conference chairs.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig1.png&quot; alt=&quot;2025 csig2025 cover&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;overview&quot;&gt;Overview&lt;/h2&gt;

&lt;p&gt;CSIG Youth Scientist Conference 2025 will bring together leading researchers and distinguished guests. Four keynote speakers will share the latest insights on image and graphics, 30 thematic forums and 4 workshops will showcase the diversity of opinions in the field, and more than 200 high-level academic reports will spark ideas. Youth talent forums, such as the Youth Talent Support Program Forum, the Excellent Doctoral Dissertation Forum, and the Academic Rising Star Forum, will provide a platform for young talents to speak freely, exchange ideas, and grow. More than 2,000 scholars and doctoral students from academia and industry in image and graphics and related disciplines will gather for this academic event. The biomedical image analysis sub-forum is one of the conference’s thematic forums.&lt;/p&gt;

&lt;h2 id=&quot;agenda&quot;&gt;Agenda&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig2.png&quot; alt=&quot;2025 csig2025 agenda&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;chairs-and-keynote-speakers&quot;&gt;Chairs and Keynote Speakers&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig3.png&quot; alt=&quot;2025 csig2025 chairs and keynote speakers&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;chairs-information&quot;&gt;Chairs Information&lt;/h3&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig4.png&quot; alt=&quot;Prof. Qing Cai&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Qing Cai&lt;/strong&gt; - &lt;em&gt;Ocean University of China&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Qing Cai is an Associate Professor and Doctoral Supervisor at the Ocean University of China. He has been recognized as a “Taishan Scholar” Young Expert of Shandong Province, a recipient of the Shandong Provincial Outstanding Youth Fund, the Shandong Provincial Artificial Intelligence Science and Technology Award - Outstanding Youth Award, and the Ocean University of China “Youth Talent Project”. He is a Senior Member of CCF, a member of the CCF-CV, CCF-MM, and CCF-AI special committees, a member of the CSIG Youth Working Committee, a director of the Shandong Artificial Intelligence Society, and a member of YOCSEF Qingdao. His main research interests lie at the intersection of artificial intelligence and medicine-engineering, including medical image processing, computer-aided disease diagnosis, and 3D reconstruction. As first or corresponding author, he has published in leading international conferences and journals: CVPR, AAAI (x4), IJCAI, ACM MM, IEEE TIP (x4), IEEE TNNLS (x2), IEEE TCSVT, PR (x3), etc. He serves as a reviewer for CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, AAAI, IJCV, IEEE TIP, IEEE TNNLS, IEEE TCYB, etc. He leads projects including the National Natural Science Foundation of China (General &amp;amp; Youth), the Shandong Provincial Natural Science Foundation (Outstanding Youth &amp;amp; Youth), the China Postdoctoral Science Foundation, and the Ocean University of China Youth Talent Project.&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig5.png&quot; alt=&quot;Prof. Yinghuan Shi&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Yinghuan Shi&lt;/strong&gt; - &lt;em&gt;Nanjing University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Yinghuan Shi, PhD, is a Professor and Doctoral Supervisor in the School of Computer Science and Technology at Nanjing University, Assistant Dean, and lead of the Medical Artificial Intelligence Platform of the National Research Institute of Health and Medical Big Data at Nanjing University. She received her Bachelor’s and Doctoral degrees from the Department of Computer Science and Technology (now the School of Computer Science) at Nanjing University in 2007 and 2013, respectively. Her research interests include machine learning, pattern recognition, and their interdisciplinary applications in medical image processing, AI for Science, etc. In recent years, she has led the National Natural Science Foundation for Excellent Young Scientists, a National Natural Science Foundation Key Project, the National Key R&amp;amp;D Program Digital Diagnosis Key Special Project, the National Science and Technology Innovation 2030 - New Generation Artificial Intelligence Major Project, and a Jiangsu Provincial Frontier Technology R&amp;amp;D Program Project. She has published more than 80 papers in CCF-A conferences and IEEE/ACM journals, and has published the popular science book “Artificial Intelligence in Your Pocket - AI and Health Care”. She has received honors such as the first Outstanding Graduate Moral Education Tutor award of Nanjing University, the Wu Wenjun Artificial Intelligence Outstanding Youth Award, the China Association for Science and Technology Youth Talent Support Project, the second prize of the Jiangsu Provincial Natural Science Award (second contributor), and the Chinese People’s Liberation Army Military Medical Achievement Award (third contributor).&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig6.png&quot; alt=&quot;Prof. Xiao Jia&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Xiao Jia&lt;/strong&gt; - &lt;em&gt;Shandong University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Xiao Jia is a Professor and Doctoral Supervisor at the School of Control Science and Engineering, Shandong University. He was selected for the National High-level Youth Talent Program, as a Distinguished Young and Middle-aged Scholar of Shandong University (First Level), a Taishan Scholar Youth Expert of Shandong Province, and for the Shandong Provincial Excellent Youth Science Fund Project (Overseas), and undertakes a number of national and provincial natural science fund projects. He received his bachelor’s degree from the Department of Automation, Shandong University, and his doctorate from the Department of Electronic Engineering, The Chinese University of Hong Kong, and was a postdoctoral fellow at Stanford University. He conducts his research at the Key Laboratory of Machine Intelligence and System Control of the Ministry of Education and the Institute of Artificial Intelligence and System Control. His main research directions include machine learning, multimodal intelligent perception, vision-language large models, and intelligent medical systems. He has published more than 40 papers in international academic journals and conferences such as PIEEE, EU, TASE, ICRA, IROS, and MICCAI.&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig7.png&quot; alt=&quot;Prof. Miaojing Shi&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Miaojing Shi&lt;/strong&gt; - &lt;em&gt;Tongji University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Miaojing Shi is a Professor at the School of Electronics and Information Engineering, Tongji University, Vice Dean of the Scientific and Technological Research Institute, and a Visiting Professor at King’s College London. She received her Ph.D. from Peking University, and has served as a researcher at the French National Institute for Research in Computer Science and Automation, and as an Assistant Professor and Associate Professor in the Department of Informatics at King’s College London. Her main research interests include computer vision and medical image processing. She has published more than 90 high-level journal and conference papers, and has led more than 10 projects, including grants from the National Natural Science Foundation of China, the UK Engineering and Physical Sciences Research Council, and the European Research Council. Recently, she won the France-China Committee Personal Science and Technology Innovation Award, the King’s College London Annual Contribution Award, and the Tongji University May Fourth Youth Medal. She was selected for the Young Thousand Talents program, and is a Senior Member of IEEE and a Fellow of the UK Higher Education Academy.&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig8.png&quot; alt=&quot;Prof. Daoqiang Zhang&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Daoqiang Zhang&lt;/strong&gt; - &lt;em&gt;Nanjing University of Aeronautics and Astronautics&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Daoqiang Zhang is a Professor and Dean of the School of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, and Director of the Key Laboratory of Brain-Machine Intelligence Technology of the Ministry of Education. He was selected as a national leading talent and a Fellow of the International Association for Pattern Recognition (IAPR Fellow), and has been funded by National Natural Science Foundation Excellent Youth and Key Projects. He serves as an editor of journals such as IEEE Trans. Medical Imaging, Pattern Recognition, Machine Intelligence Research, and Intelligent Medicine, as well as deputy editor of the journal “Data Acquisition and Processing”. He serves as a supervisor of the China Society of Image and Graphics, a director of the Alzheimer’s Disease Prevention and Treatment Association, deputy director of the Graphics Big Data Special Committee of the Chinese Society of Graphics, a standing member of the Machine Learning Special Committee of the Chinese Artificial Intelligence Society, a standing member of the Medical Information and Control Branch of the Chinese Society of Biomedical Engineering, and director of the Medical Image Processing Special Committee of the Jiangsu Artificial Intelligence Society. His main research directions are artificial intelligence, machine learning, medical image analysis, and brain-computer interfaces. He has published more than 200 academic papers, cited more than 20,000 times. He has won 1 second prize of the National Natural Science Award, and 1 first prize and 1 second prize of the Ministry of Education Natural Science Award. Doctoral students and postdoctoral fellows he has supervised have twice won the Young Scientist Award at MICCAI, a leading international conference in medical imaging. He has been selected as an Elsevier Highly Cited Chinese Scholar for 10 consecutive years (2014-2023).&lt;/p&gt;

&lt;h3 id=&quot;keynote-speakers-information&quot;&gt;Keynote Speakers Information&lt;/h3&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig9.png&quot; alt=&quot;Prof. Yang Chen&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
  &lt;strong&gt;Prof. Yang Chen&lt;/strong&gt; - &lt;em&gt;Southeast University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Report Title&lt;/strong&gt;: Intelligent Medical Imaging and Processing&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;: The report focuses on high-quality imaging technology based on feature learning in clinical task-driven intelligent medical imaging, the embedding of core algorithms into domestically developed medical imaging equipment, and clinical task-driven medical image processing. It covers four parts: intelligent medical imaging, imaging algorithm applications, intelligent image processing and applications, and reflections on medical-engineering interdisciplinary research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speaker Profile&lt;/strong&gt;: Professor Yang Chen conducts research on medical imaging algorithms and intelligent image analysis in support of domestic high-end medical equipment. He has published more than 100 papers and has been an Elsevier Highly Cited Chinese Scholar from 2022 to 2024. He is currently a professor at the School of Computer Science and Engineering, Southeast University, a winner of the National Science Fund for Distinguished Young Scholars, and principal investigator of a Key R&amp;amp;D Program of the Ministry of Science and Technology.&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig10.png&quot; alt=&quot;Prof. Yong Xia&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
  &lt;strong&gt;Prof. Yong Xia&lt;/strong&gt; - &lt;em&gt;Northwestern Polytechnical University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Report Title&lt;/strong&gt;: Intelligent Computing for Medical Imaging - Challenges and Practices&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;: With the rapid development of deep learning, intelligent computing for medical imaging has made significant progress, but it still faces many challenges, especially how to build high-performance, reliable models when labeled data are scarce and diseases follow a long-tail distribution. To address these challenges, pre-training for medical image analysis has gradually attracted attention. Related research seeks to pre-train models on relevant, or even unrelated, medical image data to improve their ability to analyze medical images of various modalities. On this basis, researchers are building large-scale foundation models and developing fine-tuning techniques to improve generalization across diagnostic tasks. This report will examine the main challenges pre-training faces in medical image analysis, including insufficient data labeling, data dimensionality issues, limitations of model capabilities, and the construction of foundation models. By sharing the research group’s experience and insights in these areas, the report will also explore the opportunities and challenges of applying pre-training and foundation models in medical image analysis, providing useful references and inspiration for researchers in related fields.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speaker Profile&lt;/strong&gt;: Yong Xia is a Professor at the School of Computer Science/School of Artificial Intelligence, Northwestern Polytechnical University, and a member of the National Engineering Laboratory for Integrated Space-Air-Ground-Sea Big Data Application Technology. His research direction is intelligent computing for medical imaging. In the past 5 years, he has published more than 100 papers in JAMA Network Open, Radiology, IEEE-TPAMI/TMI/TIP/TNNLS, IJCV, MedIA, NeurIPS, CVPR, ECCV, MICCAI, AAAI, and IJCAI. His work has been cited more than 17,000 times on Google Scholar (h-index 61), and he has placed in the top three in more than 10 international competitions. He serves as a director of the Chinese Society for Stereology, a standing member of the Digital Medicine Branch of the China Computer Federation, etc.&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig11.png&quot; alt=&quot;Prof. Yong Liu&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
  &lt;strong&gt;Prof. Yong Liu&lt;/strong&gt; - &lt;em&gt;Beijing University of Posts and Telecommunications&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Report Title&lt;/strong&gt;: Brain Connection, Brain Network and Cognitive Ability: A Reflection Taking AD Application as an Example&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;: Multimodal magnetic resonance brain imaging can non-invasively provide information on the structure and functional activity of the human brain. Brain network research has opened new avenues for understanding the brain’s information-processing mechanisms and evaluating new intervention programs. Developing precise and effective brain network computing theories and methods, and on that basis clarifying the structural and functional organization rules, information-processing patterns, and regulatory mechanisms of brain networks on cognitive functions, has become a shared scientific frontier for information science, brain science, and related fields. This talk will discuss the origins of brain-connection and brain-network research based on brain imaging, research progress in special populations, and applications in Alzheimer’s disease (AD). The focus is the team’s recent progress taking AD, a typical neurodegenerative disease, as the research object, covering individualized brain atlases, individualized precise brain connection patterns, the development of non-invasive transcranial photobiomodulation systems, and AD intervention paradigms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speaker Profile&lt;/strong&gt;: Yong Liu, PhD, is a Professor and Vice Dean of the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. His main research direction is intelligent understanding of brain imaging and its clinical application. He has published more than 50 papers as (co-)corresponding author in journals including Science Advances, Alzheimer’s &amp;amp; Dementia, eClinicalMedicine, Biological Psychiatry, and Science Bulletin, and has been granted 7 patents. As project leader, he has undertaken projects including a National Natural Science Foundation Youth Fund (Category A) project, a Ministry of Science and Technology “Science and Technology Innovation 2030” major project, a National Key R&amp;amp;D Program key special project, a National Natural Science Foundation key project, and the Beijing Natural Science Foundation for Distinguished Young Scientists (2020). His honors include the First Prize of the Wu Wenjun Artificial Intelligence Science Award (2019, ranked 2nd), and he has been continuously selected as an Elsevier Highly Cited Chinese Scholar since 2020.&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig12.png&quot; alt=&quot;Prof. Hao Chen&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
  &lt;strong&gt;Prof. Hao Chen&lt;/strong&gt; - &lt;em&gt;Hong Kong University of Science and Technology&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Report Title&lt;/strong&gt;: Pathology Large Models for Precision Cancer Diagnosis and Treatment: Challenges and Opportunities&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;: With the deep integration of artificial intelligence and digital pathology, pathology large models are becoming a new paradigm for precision cancer diagnosis and treatment. This talk systematically surveys frontier progress of pathology large models in precision cancer diagnosis, molecular subtyping prediction, treatment response evaluation, and prognostic analysis, revealing their breakthrough potential for “micro-meso-macro” association analysis through multimodal data fusion (such as whole-slide images, radiological imaging, genomics, and clinical information). It further explores how pathology large models can drive the paradigm shift of precision cancer care from “experience-driven” to “computation-driven”.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speaker Profile&lt;/strong&gt;: Hao Chen is an Assistant Professor in the Department of Computer Science and Engineering, the Department of Chemical and Biological Engineering, and the Division of Life Science at the Hong Kong University of Science and Technology, and Director of the Medical Engineering Interdisciplinary Joint Innovation Center. His research interests include medical large models, computational pathology, multimodal fusion, medical image analysis, and computer-aided minimally invasive diagnosis and treatment. He has published more than 100 papers (over 35,000 Google Scholar citations; h-index 79) in top journals and conferences such as Nature Biomedical Engineering, Nature Communications, Lancet Digital Health, Nature Machine Intelligence, JAMA, MICCAI, IEEE TMI, MIA, CVPR, and ICCV. He has been repeatedly listed as a Stanford University global top 2% scientist and a Clarivate Analytics Highly Cited Researcher. His honors include the 2023 Asian Young Scientist Award, the Second Prize of the Ministry of Education Excellent Achievement Award, the First Prize of the Beijing Science and Technology Progress Award, and the 2019 Young Scientist Impact Award of MICCAI, a top conference in AI for medical imaging. He serves as an editor of journals including IEEE TMI, TNNLS, J-BHI, and CMIG, and as an area chair or program committee member for international conferences such as ICLR, CVPR, ACM MM, and MICCAI. He has led his team to 15 championships in international medical image analysis challenges.&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_csig_fig13.png&quot; alt=&quot;Prof. Yitian Zhao&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
  &lt;strong&gt;Prof. Yitian Zhao&lt;/strong&gt; - &lt;em&gt;Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Report Title&lt;/strong&gt;: Beyond The Eye: Exploring Neurodegenerative Diseases with Intelligent Analysis of Ophthalmic Images&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract&lt;/strong&gt;: Due to the complexity of the brain, the ambiguity of evaluation indicators, and the lack of direct observation methods, early screening for neurodegenerative diseases such as Alzheimer’s disease (AD) remains a difficult problem in medicine. Clinically routine methods such as PET, MRI, DSA, and cognitive tests suffer from bulky equipment, high cost, radiation or invasiveness, and long examination times, making them unsuitable for grassroots deployment and large-scale population screening. Exploring efficient, non-invasive screening methods has therefore become an urgent need and a research hotspot for early prevention. The eye and the brain share a high degree of developmental homology and functional similarity, making the eye a promising observation window for studying the brain. This talk introduces the speaker’s recent work on multimodal ophthalmic medical image processing, covering image enhancement, structure extraction and segmentation, feature quantification, and disease diagnosis algorithms. Focusing on the difficulties of large-scale screening for neurodegenerative diseases such as Alzheimer’s disease, it explores the correlation between brain diseases and various ocular image features, reveals the internal connections and patterns between brain diseases and retinal structural changes, and provides a scientific basis and technical support for early, accurate assisted diagnosis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speaker Profile&lt;/strong&gt;: Yitian Zhao is a researcher and doctoral supervisor at the Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, and Deputy Director of the Advanced Diagnosis and Treatment Laboratory. His research centers on artificial intelligence for ophthalmic medical image analysis; in recent years he has focused on multimodal ophthalmic images, developing intelligent diagnosis algorithms and devices for diseases of the eye, brain, heart, and related organs. He has received support from the National Excellent Youth Fund, the Ministry of Science and Technology Key R&amp;amp;D Program for Young Scientists, the Zhejiang Provincial Outstanding Youth Fund, and the China Association for Science and Technology Youth Talent Support Project, and has led more than ten projects including National Natural Science Foundation grants. He has published more than 100 papers in top international journals and conferences such as Nature Machine Intelligence, npj Digital Medicine, IEEE TPAMI, IJCV, IEEE TMI, MedIA, CVPR, and MICCAI, with more than 9,200 academic citations; his most cited paper has over 2,300 citations. He serves as an editor/associate editor for journals including The Innovation (IF: 32.1), IEEE Transactions on Medical Imaging (IF: 10.6), and Medical Physics (IF: 4.5).&lt;/p&gt;

&lt;h2 id=&quot;more-information&quot;&gt;More Information&lt;/h2&gt;

&lt;p&gt;For more information, please visit the &lt;a href=&quot;http://youth.csig.org.cn/CSIG2025&quot;&gt;official website&lt;/a&gt;. We look forward to your participation!&lt;/p&gt;

</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/09/07/icidm2025</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/09/07/icidm2025/"/>
    <title>[Call for Participants] 8th International Symposium on Image Computing and Digital Medicine</title>
    <published>2025-09-07T00:00:00+08:00</published>
    <updated>2025-09-07T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/09/07/icidm2025/</uri>
    </author>
    <content type="html">&lt;p&gt;The 8th International Symposium on Image Computing and Digital Medicine (ISICDM 2025) will be held at Shenzhen University from December 20 to 22, 2025. This year, ISICDM will feature a sub-forum on “Empowering Precision Medicine through Intelligent Digital Pathology,” which will bring together top scholars from around the world to discuss the latest research and applications of intelligent digital pathology in precision medicine.&lt;/p&gt;

&lt;h2 id=&quot;overview&quot;&gt;Overview&lt;/h2&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig1.png&quot; alt=&quot;ISICDM 2025&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;With the rapid advancement of artificial intelligence, digital and intelligent pathology is becoming a core driver of precision medicine. By integrating high-resolution pathological images with multi-omics data, and leveraging cutting-edge technologies such as large AI models and multimodal fusion, intelligent digital pathology enables quantitative analysis of microscopic disease features, significantly improving the accuracy and efficiency of pathological diagnosis. This provides essential support for personalized treatment and prognosis evaluation.&lt;/p&gt;

&lt;p&gt;However, real-world implementation still faces critical challenges — including model interpretability &amp;amp; transparency, patient privacy protection, and ethical compliance.&lt;/p&gt;

&lt;p&gt;This forum will bring together prominent experts from pathology, AI, clinical medicine, and bioinformatics to discuss:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Breakthroughs in frontier technologies&lt;/li&gt;
  &lt;li&gt;Clinical application scenarios&lt;/li&gt;
  &lt;li&gt;Standardization and regulatory considerations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Through interdisciplinary dialogue, the forum aims to connect technological innovation with clinical needs, driving the paradigm shift from experience-based medicine to data-driven medicine, and contributing new momentum to the global precision medicine ecosystem.&lt;/p&gt;

&lt;h2 id=&quot;forum-agenda&quot;&gt;Forum Agenda&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig2.png&quot; alt=&quot;ISICDM 2025&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;chairs&quot;&gt;Chairs&lt;/h2&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig3.png&quot; alt=&quot;Prof. Hao Chen&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Hao Chen&lt;/strong&gt; - &lt;em&gt;Hong Kong University of Science and Technology&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Assistant Professor (Research) at the Departments of Computer Science &amp;amp; Engineering, Chemical &amp;amp; Biological Engineering, and Division of Life Science; Director of the Joint Innovation Center for Med-Engineering Crossover.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Research interests: large medical foundation models, computational pathology, multimodal data fusion, medical image analysis, interpretable deep learning, AI-assisted minimally invasive diagnosis.&lt;/li&gt;
  &lt;li&gt;Published over 200 papers in &lt;em&gt;Nature Biomedical Engineering&lt;/em&gt;, &lt;em&gt;Nature Communications&lt;/em&gt;, &lt;em&gt;The Lancet Digital Health&lt;/em&gt;, &lt;em&gt;Nature Machine Intelligence&lt;/em&gt;, &lt;em&gt;JAMA&lt;/em&gt;, MICCAI, IEEE Transactions on Medical Imaging, &lt;em&gt;Medical Image Analysis&lt;/em&gt;, CVPR, and ICCV, with more than 35,000 Google Scholar citations and an h-index of 79.&lt;/li&gt;
  &lt;li&gt;Recognized as Stanford University Top 2% Scientist (multiple years) and Clarivate Highly Cited Researcher.&lt;/li&gt;
  &lt;li&gt;Awards: 2023 Asian Young Scientist Award, Ministry of Education (China) Scientific Research Excellence Award (2nd Prize), Beijing Science &amp;amp; Technology Progress Award (1st Prize), MICCAI 2019 Young Scientist Award.&lt;/li&gt;
  &lt;li&gt;Serves on editorial boards of IEEE RBME, IEEE TMI, IEEE TNNLS, JBHI, and CMIG; Area Chair for ICLR, CVPR, ACM MM, and MICCAI; led teams to over 15 international medical image analysis challenge championships.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig4.png&quot; alt=&quot;Prof. Liansheng Wang&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Liansheng Wang&lt;/strong&gt; - &lt;em&gt;Xiamen University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Professor at the School of Informatics and jointly appointed Professor at the School of Medicine; Vice Director of the Digital Fujian Institute of Big Data for Health; Director of the XMU Medical AI Research Institute; Chair of MICS; Vice Chair of the AI Group of the Radiology Branch of the Fujian Medical Association; PhD from the Chinese University of Hong Kong.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Research focus: medical image processing, AI-assisted diagnosis.&lt;/li&gt;
  &lt;li&gt;Published over 120 papers in &lt;em&gt;Nature Machine Intelligence&lt;/em&gt;, &lt;em&gt;Nature Communications&lt;/em&gt;, IEEE Transactions on Medical Imaging, &lt;em&gt;Medical Image Analysis&lt;/em&gt;, CVPR, AAAI, etc.&lt;/li&gt;
  &lt;li&gt;PI/Co-PI for NSFC instrumentation projects, China’s “Innovation 2030” Megaprojects, National Key R&amp;amp;D Program, and NSFC general/youth projects.&lt;/li&gt;
  &lt;li&gt;Awards include Tencent Rhino-Bird Research Award, Fujian Provincial Science &amp;amp; Technology Progress Award (2nd Prize), 2023 Tian Zhaowu Interdisciplinary Research Award (1st Prize) at XMU.&lt;/li&gt;
  &lt;li&gt;Led teams to win 11 international medical imaging competitions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig5.png&quot; alt=&quot;Prof. Jun Xu&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Jun Xu&lt;/strong&gt; - &lt;em&gt;Nanjing University of Information Science &amp;amp; Technology&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Vice Dean of the School of Artificial Intelligence; Level-II Professor; Executive Director of the Institute of Smart Healthcare.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;PhD from Zhejiang University; postdoctoral researcher and visiting professor at Rutgers University and Case Western Reserve University.&lt;/li&gt;
  &lt;li&gt;Roles: Deputy Director of the Medical Image Analysis Committee of the Jiangsu AI Society; Member of the Digital Pathology &amp;amp; AI Committee of the Chinese Society of Pathology; rotating Chair of the 4th Youth Symposium on Medical Image Computing.&lt;/li&gt;
  &lt;li&gt;Published in &lt;em&gt;Nature Communications&lt;/em&gt;, &lt;em&gt;Radiology&lt;/em&gt;, IEEE Transactions on Medical Imaging, &lt;em&gt;Medical Image Analysis&lt;/em&gt;; listed as Stanford University Top 2% Scientist.&lt;/li&gt;
  &lt;li&gt;PI for NSFC joint key projects, NSFC general projects, National Key R&amp;amp;D Program key projects, and provincial/ministerial grants.&lt;/li&gt;
  &lt;li&gt;Research focuses: medical image computation, computational pathology, quantitative analysis of imaging and pathological slices for disease classification, risk prediction, diagnosis, treatment response, and prognosis evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;keynote-speakers--topics&quot;&gt;Keynote Speakers &amp;amp; Topics&lt;/h2&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig6.png&quot; alt=&quot;Prof. Lin Yang&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Lin Yang&lt;/strong&gt; - &lt;em&gt;Westlake University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;Digital Pathology &amp;amp; Artificial Intelligence: Current Status and Future Perspectives&lt;/em&gt;     &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; Diagnostic pathology is the foundation and gold standard for cancer identification, but high inter-observer variability significantly impacts diagnostic consistency and efficiency, particularly in areas with shortages of pathologists. Despite rapid developments in computer-aided diagnosis (CAD), whole-slide pathology diagnosis still faces significant challenges in real-world settings. This talk reviews the latest progress in digital pathology, presents AI algorithms and solutions developed by our team to address practical challenges, and discusses open questions hindering wider clinical adoption in China.      &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; From 2009 to 2011, Lin Yang was an Assistant Professor at Rutgers University (Departments of Pathology, Radiology, and Biomedical Engineering), and from 2011 to 2014 an Assistant Professor at the University of Kentucky. From 2014 to 2019, he was a preeminence hire at the University of Florida, with tenure as Associate Professor across three departments. Since 2020, he has been a Professor at Westlake University, focusing on AI, medical imaging, machine learning, computer vision, and medical foundation models. He has authored 100+ publications in &lt;em&gt;Nature Machine Intelligence&lt;/em&gt;, &lt;em&gt;Nature Medicine&lt;/em&gt;, CVPR, and ECCV, with over 10,000 citations. He won the MICCAI Young Scientist Award (2015, 2024), was an ECCV 2024 Best Paper Finalist, and is listed among Stanford University’s Top 2% Scientists.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig7.png&quot; alt=&quot;Prof. Yongbing Zhang&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Yongbing Zhang&lt;/strong&gt; - &lt;em&gt;Harbin Institute of Technology, Shenzhen&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;Whole-Slide Pathology Scanning and Computational Analysis&lt;/em&gt;         &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; Histopathology imaging and computational analysis are the clinical gold standards for cancer diagnosis. Current imaging technologies suffer from low speed and precision; manual diagnosis is labor-intensive and prone to subjective bias, potentially resulting in missed or incorrect diagnoses. This talk introduces the evolution of computational pathology, our lab’s work on high-speed whole-slide scanning and AI-assisted computational diagnosis, and future development prospects.             &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; Professor and Doctoral Supervisor; recipient of the National Science Fund for Excellent Young Scholars. Has authored 100+ publications in &lt;em&gt;Nature&lt;/em&gt; sub-journals, IEEE Transactions, NeurIPS, CVPR, ICCV; holds over 50 patents. Winner of the China National Science &amp;amp; Technology Progress Award (2nd Prize); recognized as Guangdong Province Science &amp;amp; Technology Innovation Leading Talent.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig8.png&quot; alt=&quot;Prof. Manning Wang&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Manning Wang&lt;/strong&gt; - &lt;em&gt;Fudan University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;Multi-Instance Learning with Bag and Instance-Level Optimization for Digital Pathology Diagnosis&lt;/em&gt;      &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; Digital pathology diagnosis is typically framed as a multi-instance learning (MIL) classification task supervised only by bag-level labels, often neglecting instance classification. Accurate instance classification is vital for interpretability and biomarker discovery; furthermore, it can improve bag-level classification performance. This talk presents our research on optimizing both bag and instance classification, including works published at NeurIPS and ICCV.       &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; Professor at the Digital Medicine Center of Fudan University; Deputy Director of the Shanghai Key Laboratory of Medical Image Computing and Computer-Assisted Intervention. Research areas: AI-based digital pathology, medical imaging AI, and computer-aided drug screening. Author of 80+ papers in IEEE TPAMI, IEEE TMI, BIB, NeurIPS, ICML, CVPR, ICCV; recipient of China National Technology Invention Award (2nd Prize), Shanghai Youth Science &amp;amp; Technology Talent, and Shanghai Outstanding Technical Leader honors.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig9.png&quot; alt=&quot;Prof. Sheng Huang&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Sheng Huang&lt;/strong&gt; - &lt;em&gt;Chongqing University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;Key Instance Mining: Identifying Local Diagnostic Cues in Pathological Images&lt;/em&gt;         &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; This talk addresses the visual redundancy of ultra-high-resolution whole-slide images (WSIs) within a multi-instance learning (MIL) framework, studying how key instances are evaluated and selected. It models the weakly supervised relationship between image patches and diagnostic labels, employing attention mechanisms, local pseudo-classifiers, and clustering strategies to identify discriminative local cues, and promotes interpretability by shifting AI-assisted pathology analysis from global to local evidence.         &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; Professor at the School of Big Data &amp;amp; Software Engineering, Chongqing University; Vice Director of the Big Data Intelligence Research Institute; ACM Chongqing Rising Star Awardee. Research focuses on medical image processing, open-world pattern recognition, and intelligent industrial inspection. Author of 60+ publications in IEEE TIP, TIFS, TNNLS, TMI, ICCV, CVPR, AAAI, IJCAI; serves as Area Chair for IJCAI, PRCV; executive committee member for multiple academic societies.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig10.png&quot; alt=&quot;Prof. Jingang Yu&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Prof. Jingang Yu&lt;/strong&gt; - &lt;em&gt;South China University of Technology&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;Instance Segmentation in Pathology Images and Clinical Applications&lt;/em&gt;            &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; Accurate instance-level segmentation of pathological structures such as nuclei and glands from digital pathology images is a key task in computational pathology. It underpins many clinical applications including tumor microenvironment analysis and precise immunohistochemistry interpretation, yet faces challenges such as labor-intensive annotation, strong heterogeneity, and structural complexity. This talk is divided into three parts: (1) overview of challenges, solutions, and current research status in pathology instance segmentation; (2) our recent work under weak annotation settings, including few-shot learning, semi-supervised learning, and approaches based on vision-language large models; (3) clinical use cases in related technologies.               &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; Professor at the School of Automation, South China University of Technology; “Pearl River Talent Plan” Distinguished Young Scholar of Guangdong Province. He trained at Xi’an Jiaotong University, Huazhong University of Science and Technology, and the University of Nebraska–Lincoln (USA). His work focuses on AI algorithms for pathology and their translational applications. He is first/corresponding author of 40+ papers in TMI, MIA, MICCAI, TIP, and CVPR; PI of 10+ national and provincial projects; and first inventor on 12 patents, with completed technology transfers worth nearly RMB 5 million. His AI pathology products have been deployed in multiple hospitals. Academic roles include Deputy Secretary-General of the Visual Cognition &amp;amp; Computing Committee of CSIG and MICS Executive Committee Member.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig11.png&quot; alt=&quot;Assoc. Prof. Yushan Zheng&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Assoc. Prof. Yushan Zheng&lt;/strong&gt; - &lt;em&gt;Beihang University&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;Multi-Level Representation Learning for Whole-Slide Images and Tumor-Aided Diagnosis&lt;/em&gt;      &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; Whole-slide image (WSI) representation learning is central to digital pathology analysis and tumor-aided diagnosis. This talk introduces a multi-level representation learning framework spanning from the microscopic level of cells to the macroscopic level of tissues, as well as integration with cross-case multimodal structured learning. The framework effectively improves tumor screening, molecular status prediction, and prognosis evaluation across multiple auxiliary diagnostic systems.          &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; Associate Professor at Beihang University, focusing on digital pathology image processing and tumor-aided diagnosis. PI of over 10 projects including National NSFC, Beijing Natural Science Foundation, and industry-academia collaborations. Author of 50+ papers in IEEE TMI, Medical Image Analysis, AAAI, MICCAI; holder of 10+ patents. Recognitions include the Young Scientist Award from the Chinese Society for Stereology and Biomedical Engineering.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig12.png&quot; alt=&quot;Assoc. Prof. Chu Han&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Assoc. Prof. Chu Han&lt;/strong&gt; - &lt;em&gt;Guangdong Provincial People’s Hospital&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;Efficient Computation and Processing of Digital Pathology Images&lt;/em&gt;         &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; Computational pathology can assist in precise cancer diagnosis and treatment, but the extremely high resolution of pathology images poses challenges for annotation and computation. To address low annotation efficiency, our team developed multi-task learning, weak supervision, and self-supervised learning strategies that greatly reduce the need for manual expert labeling. To improve computational efficiency, lightweight network models and knowledge distillation were employed to enable large-scale, rapid pathology image analysis. This work provides core technical support for building efficient, automated intelligent pathology diagnostic systems.               &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; Guangdong Distinguished Young Scholar; PhD in Computer Science from The Chinese University of Hong Kong; Associate Researcher and PhD Supervisor at Guangdong Provincial People’s Hospital; PI at the Guangdong Key Laboratory for Intelligent Medical Imaging Analysis &amp;amp; Application. Research focuses on AI algorithms for oncology imaging and computational pathology. Author of 70+ papers in TPAMI, TNNLS, TMI, MedIA, ACM TOG, CVPR, MICCAI; filed 20+ patents (6 granted). PI for multiple NSFC and Guangdong grants; awards include Guangdong S&amp;amp;T Progress Award (1st Prize, 2024) and 2023 National Digital Health Application Contest Grand Prize (1st place out of 39 teams).&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig13.png&quot; alt=&quot;Assoc. Prof. Jun Shi&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Assoc. Prof. Jun Shi&lt;/strong&gt; - &lt;em&gt;Hefei University of Technology&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;AI-Based Intelligent Pathology Image Analysis for Cancer-Aided Diagnosis and Treatment&lt;/em&gt;     &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; Pathological diagnosis is the gold standard in cancer treatment. AI technologies have been increasingly applied to digital pathology image analysis, reshaping cancer diagnosis and treatment. This talk focuses on our practical research in AI-assisted cancer care, including advances in histopathology image analysis and multimodal pathology data modeling for tumor classification/grading, biomarker prediction, prognosis evaluation, and therapeutic response prediction, as well as challenges and future directions.        &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; Associate Professor at the School of Software, Hefei University of Technology; Member of MICS. Holds a PhD in Pattern Recognition &amp;amp; Intelligent Systems from Beihang University; former researcher at CETC 38th Research Institute. Author of 40+ papers in IEEE TMI, MIA, Journal of Pathology, MICCAI, AAAI, BIBM; inventor on 10+ patents. PI/co-PI of over 10 research projects, including NSFC and Anhui Provincial grants; reviewer for IEEE TMI, MIA, TIP, JBHI, TCSVT, MICCAI, AAAI, BIBM.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig14.png&quot; alt=&quot;Dr. Tian Shen&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Dr. Tian Shen&lt;/strong&gt; - &lt;em&gt;SenseTime Medical&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;AI-Driven Smart Pathology Department Construction&lt;/em&gt;           &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; The development of smart, digital pathology departments is a hot direction globally. Artificial intelligence can significantly improve diagnostic accuracy and workflow efficiency, but real-world issues such as a wide variety of disease types and multimodal long-tail distributions create barriers. This talk shares cutting-edge industry practices in integrating segmentation, classification, and detection into pathology department operations, and highlights how multimodal large models and model production tools can address long-tail challenges in diagnosis.                &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; PhD in Computer Science from Lehigh University (USA). Formerly with Siemens Research and Tencent AI Lab, specializing in microscopic image processing and medical AI. Over a decade of industry experience in algorithm R&amp;amp;D, product innovation, and quality management. Currently COO of SenseTime Medical, overseeing AI pathology and imaging product deployment and usage.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div style=&quot;display: flex; align-items: center; margin-bottom: 15px;&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_icidm2025_fig15.png&quot; alt=&quot;Assoc. Prof. Wei Shao&quot; style=&quot;width: 100px; height: 100px; object-fit: cover; border-radius: 8px; margin-right: 20px;&quot; /&gt;
    &lt;strong&gt;Assoc. Prof. Wei Shao&lt;/strong&gt; - &lt;em&gt;Nanjing University of Aeronautics and Astronautics&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Talk:&lt;/strong&gt; &lt;em&gt;Computational Pathology Radiogenomics&lt;/em&gt;           &lt;br /&gt;
&lt;strong&gt;Abstract:&lt;/strong&gt; This talk presents our recent progress in intelligent pathology radiogenomics for tumor diagnosis, including tumor microenvironment component analysis, multimodal fusion, open-set active learning for nucleus detection, zero-shot tissue segmentation, image–gene association analysis, and spatial transcriptomics-based gene prediction, with applications to breast, lung, and liver cancer prognosis and treatment.                 &lt;br /&gt;
&lt;strong&gt;Bio:&lt;/strong&gt; Associate Professor at the College of Artificial Intelligence, NUAA; PhD Supervisor; member of the iBrain team (headed by Prof. Daoqiang Zhang). PI of two NSFC general projects; author of 55 publications in &lt;em&gt;Nature Communications&lt;/em&gt; (2), &lt;em&gt;Cell Reports&lt;/em&gt; (2), IEEE TMI (12), as well as CVPR, ICCV, NeurIPS; over 3,200 Google Scholar citations. Honors include 2× MICCAI Young Scientist Awards (only mainland China recipient), 2025 MOE Natural Science Award (2nd Prize), and Stanford University Top 2% Scientist listing (2024).&lt;/p&gt;

&lt;h2 id=&quot;about-isicdm-2025&quot;&gt;About ISICDM 2025&lt;/h2&gt;

&lt;p&gt;The forum represents a unique opportunity to engage with leading experts in digital pathology and AI, explore cutting-edge research, and discuss practical implementation challenges. Through the diverse expertise of our distinguished speakers, participants will gain comprehensive insights into the latest technological advances and their clinical applications. We warmly welcome researchers, clinicians, and industry professionals to join us in shaping the future of intelligent digital pathology.&lt;/p&gt;

</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/09/02/gpfm</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/09/02/gpfm/"/>
    <title>[Nature Biomedical Engineering] A generalizable pathology foundation model using a unified knowledge distillation pretraining framework</title>
    <published>2025-09-02T00:00:00+08:00</published>
    <updated>2025-09-02T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/09/02/gpfm/</uri>
    </author>
    <content type="html">&lt;p&gt;Recently, the SmartX Lab team, in collaboration with several leading institutions, has completed a groundbreaking project on the pathology foundation model. Published in &lt;strong&gt;&lt;font color=&quot;red&quot;&gt;Nature Biomedical Engineering&lt;/font&gt;&lt;/strong&gt;, this work presents a comprehensive evaluation of foundation models in pathology and introduces novel solutions to address current limitations.&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Pathology, often referred to as the “gold standard” in cancer diagnosis, has seen significant advancements with the development of artificial intelligence (AI). However, existing models are often task-specific and lack the generalizability needed for complex clinical workflows. To address this limitation, a collaborative research team led by Professor Hao Chen at HKUST’s SmartX Lab, along with institutions such as Southern Medical University, Shanghai AI Lab, and others, has introduced the Generalizable Pathology Foundation Model (GPFM).&lt;/p&gt;

&lt;p&gt;GPFM represents a milestone in pathology AI, excelling across 72 clinical tasks spanning six major categories, including diagnosis, prognosis, and quality assurance (QA). The findings have been published in &lt;strong&gt;&lt;font color=&quot;red&quot;&gt;Nature Biomedical Engineering&lt;/font&gt;&lt;/strong&gt;, highlighting the potential for this model to address the limitations of current “specialized” AI systems.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_gpfm.png&quot; alt=&quot;GPFM&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;related-works&quot;&gt;Related Works&lt;/h2&gt;

&lt;p&gt;Existing pathology AI systems often excel in specific tasks but struggle with broader adaptability. These challenges include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Task-specificity: Models may achieve high accuracy in tissue classification but fail in survival prediction or report generation.&lt;/li&gt;
  &lt;li&gt;Generalization gap: Without a unified evaluation framework, the adaptability of models across diverse pathology tasks remains untested.&lt;/li&gt;
  &lt;li&gt;Complex clinical applications: The lack of a “comprehensive AI” hinders the deployment of AI into integrated clinical workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To systematically evaluate these limitations, the team developed the first framework for generalizable pathology AI, assessing models on six task categories across &lt;strong&gt;72&lt;/strong&gt; clinical benchmarks. Existing models achieved an average rank of only 3.7, with the best model leading in just six tasks (see &lt;strong&gt;Figure 2&lt;/strong&gt;).&lt;/p&gt;

&lt;h2 id=&quot;method&quot;&gt;Method&lt;/h2&gt;
&lt;p&gt;The success of GPFM lies in three critical components:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;A Unified Pathology Evaluation Framework&lt;/strong&gt;: The team introduced a comprehensive benchmark set to test the true generalizability of pathology models. This framework covers &lt;strong&gt;six&lt;/strong&gt; task types: whole-slide classification, survival analysis, pathology QA, ROI classification, pathology report generation, and pathology image retrieval.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Knowledge Distillation&lt;/strong&gt;: The researchers implemented a novel, dual-engine knowledge distillation framework. &lt;em&gt;Expert distillation&lt;/em&gt; leverages the strengths of high-performing models such as UNI and Phikon to integrate specialized knowledge, while &lt;em&gt;self-distillation&lt;/em&gt; facilitates cross-scale alignment of tissue features, enhancing generalization across microscopic and macroscopic levels.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Large-Scale Pretraining&lt;/strong&gt;: GPFM was pretrained on an extensive dataset of &lt;strong&gt;190&lt;/strong&gt; million image-level samples, sourced from over &lt;strong&gt;95,000&lt;/strong&gt; whole-slide images (WSIs) spanning &lt;strong&gt;34&lt;/strong&gt; tissue types. This unprecedented scale ensures the model’s robustness and adaptability to unseen clinical data.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
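&lt;p&gt;As a purely conceptual illustration of the dual-engine idea (this is a toy numpy sketch with random features, not the published training code; the cosine-distance loss, the named teachers, and the momentum update are simplifying assumptions):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Toy features for one image patch; in GPFM these would come from ViT encoders.
student = rng.normal(size=dim)
expert_uni = rng.normal(size=dim)      # frozen expert teacher (e.g. UNI) - assumed stand-in
expert_phikon = rng.normal(size=dim)   # frozen expert teacher (e.g. Phikon) - assumed stand-in
ema_teacher = rng.normal(size=dim)     # momentum copy of the student

def cosine_distance(a, b):
    # Distance in [0, 2]; smaller means the two feature vectors agree more.
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Expert distillation: pull student features toward the expert models.
loss_expert = 0.5 * (cosine_distance(student, expert_uni)
                     + cosine_distance(student, expert_phikon))

# Self-distillation: match the student to its own momentum teacher.
loss_self = cosine_distance(student, ema_teacher)

total_loss = loss_expert + loss_self
print(round(float(total_loss), 4))

# The momentum teacher is updated as an exponential moving average of the student.
m = 0.996
ema_teacher = m * ema_teacher + (1.0 - m) * student
```

The actual objective, architectures, and hyperparameters are detailed in the paper; the sketch only shows how expert and self-distillation signals combine into one loss.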

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_gpfm_fig1.png&quot; alt=&quot;GPFM&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;clinical-validation&quot;&gt;Clinical Validation&lt;/h2&gt;
&lt;p&gt;The GPFM was comprehensively validated against state-of-the-art models across a diverse set of clinical scenarios, demonstrating significant improvements in accuracy and generalizability. In &lt;strong&gt;whole-slide image classification&lt;/strong&gt;, a critical diagnostic task, GPFM achieved an average rank of 1.22 across 36 tasks, outperforming the previous leader UNI (average rank: 3.60). The model also recorded a mean AUC of 0.891, surpassing UNI by 1.6% (P &amp;lt; 0.001).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_gpfm_fig2.png&quot; alt=&quot;WSI Classification Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;survival prediction&lt;/strong&gt;, which requires modeling complex prognostic data, GPFM held an average rank of 2.1 across 15 tasks, securing a top-2 position in 13 of them. It achieved a C-Index score of 0.665, representing a 3.4% improvement over UNI (P &amp;lt; 0.001).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_gpfm_fig3.png&quot; alt=&quot;Survival Analysis Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In the domain of &lt;strong&gt;ROI classification&lt;/strong&gt;, GPFM achieved the best average rank of 1.88 across 16 tasks, outperforming Prov-Gigapath (rank: 3.09), with the highest mean AUC of 0.946 (+0.2%, P &amp;lt; 0.001). Beyond these core tasks, GPFM demonstrated strong performance in additional areas such as pathology QA, image retrieval, and pathology report generation. These results showcase GPFM’s versatility and robustness, highlighting its potential for broad clinical application. For a detailed breakdown of these tasks, refer to the published study.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_gpfm_fig4.png&quot; alt=&quot;ROI Classification Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;translational-potential&quot;&gt;Translational Potential&lt;/h2&gt;
&lt;p&gt;Building on the capabilities of GPFM, the team developed &lt;strong&gt;SmartPath&lt;/strong&gt;, a next-generation diagnostic tool designed for intraoperative workflows. SmartPath is tailored to support diagnosis in five high-incidence cancers, including lung, breast, and gastrointestinal cancers.&lt;/p&gt;

&lt;p&gt;Currently under active deployment, SmartPath aims to accelerate the adoption of digital pathology by improving diagnostic accuracy and streamlining clinical workflows.&lt;/p&gt;

&lt;video controls=&quot;&quot; style=&quot;width: 100%; height: auto;&quot;&gt;
  &lt;source src=&quot;/static/media/SmartPath.mp4&quot; type=&quot;video/mp4&quot; /&gt;
  Your browser does not support the video tag.
&lt;/video&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;resources&quot;&gt;Resources&lt;/h2&gt;

&lt;p&gt;For more details, please see our paper &lt;a href=&quot;https://www.nature.com/articles/s41551-025-01488-4&quot;&gt;A generalizable pathology foundation model using a unified knowledge distillation pretraining framework&lt;/a&gt; in &lt;em&gt;Nature Biomedical Engineering&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Citation&lt;/strong&gt;:&lt;br /&gt;
Ma, J., Guo, Z., Zhou, F. et al. A generalizable pathology foundation model using a unified knowledge distillation pretraining framework. Nat. Biomed. Eng. (2025). https://doi.org/10.1038/s41551-025-01488-4&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/08/09/survey</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/08/09/survey/"/>
    <title>[IJCAI 2025] A Survey of Pathology Foundation Models: Recent Advances and Future Directions</title>
    <published>2025-08-09T00:00:00+08:00</published>
    <updated>2025-08-09T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/08/09/survey/</uri>
    </author>
    <content type="html">&lt;p&gt;A collaborative work by &lt;strong&gt;The Chinese University of Hong Kong&lt;/strong&gt;, &lt;strong&gt;SmartX Lab&lt;/strong&gt;, and &lt;strong&gt;Nanyang Technological University&lt;/strong&gt; titled &lt;strong&gt;“A Survey of Pathology Foundation Models: Recent Advances and Future Directions”&lt;/strong&gt; has been accepted at &lt;strong&gt;IJCAI 2025&lt;/strong&gt;, a premier conference in artificial intelligence. This survey offers a comprehensive and systematic overview of &lt;strong&gt;Pathology Foundation Models (PFMs)&lt;/strong&gt;, a transformative direction in &lt;strong&gt;computational pathology&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This work delivers the &lt;strong&gt;first hierarchical classification framework&lt;/strong&gt; for pathology foundation models (PFMs), establishes a &lt;strong&gt;systematic evaluation benchmark&lt;/strong&gt;, and delineates &lt;strong&gt;critical technical challenges and future research priorities&lt;/strong&gt; for advancing the field.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Computational pathology (CPath)&lt;/strong&gt; enables &lt;strong&gt;AI-powered analysis of whole slide images (WSIs)&lt;/strong&gt; for disease diagnosis and prognosis. WSIs are gigapixel pathology images, and the dominant analysis framework is &lt;strong&gt;Multiple Instance Learning (MIL)&lt;/strong&gt;. In MIL, a WSI is divided into smaller &lt;strong&gt;image patches&lt;/strong&gt;, from which a &lt;strong&gt;feature extractor&lt;/strong&gt; produces embeddings that are subsequently aggregated by an &lt;strong&gt;aggregator&lt;/strong&gt; to yield slide-level predictions.&lt;/p&gt;
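&lt;p&gt;The MIL pipeline described above can be sketched in a few lines of numpy. This is a minimal conceptual illustration only: random vectors stand in for real patch embeddings, and the ABMIL-style attention pooling shown here is one common choice of aggregator, not a specific model from the survey:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: in a real pipeline the extractor is a pretrained PFM
# and each row would be the embedding of one WSI patch.
n_patches, dim = 12, 8
patch_embeddings = rng.normal(size=(n_patches, dim))

# Attention-based pooling (ABMIL-style): score each patch, softmax the
# scores, and take the weighted sum as the slide-level embedding.
w = rng.normal(size=dim)
scores = patch_embeddings @ w
weights = np.exp(scores - scores.max())
weights /= weights.sum()
slide_embedding = weights @ patch_embeddings  # shape: (dim,)

# A linear head on the slide embedding yields the slide-level prediction.
logit = slide_embedding @ rng.normal(size=dim)
prob = 1.0 / (1.0 + np.exp(-logit))
print(slide_embedding.shape, round(float(prob), 3))
```

The attention weights also make the aggregator interpretable: high-weight patches indicate which tissue regions drove the slide-level decision.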

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_survey_fig1.png&quot; alt=&quot;Survey Fig 1&quot; style=&quot;width: 100%; height: auto;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Historically, models like &lt;strong&gt;ImageNet-pretrained ResNet-50&lt;/strong&gt; served as extractors, but these suffer from &lt;strong&gt;domain mismatch&lt;/strong&gt;, failing to fully capture pathology-specific features such as subtle staining patterns or hierarchical tissue structures. &lt;strong&gt;Pathology Foundation Models (PFMs)&lt;/strong&gt;—large-scale pathology-pretrained networks, often using &lt;strong&gt;self-supervised learning (SSL)&lt;/strong&gt;—address this gap, enabling robust morphological representation and better performance on downstream tasks.&lt;/p&gt;

&lt;p&gt;Yet, despite promising results, PFMs still face &lt;strong&gt;unique challenges&lt;/strong&gt; in development, scalability, and deployment.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;hierarchical-taxonomy-of-pfms&quot;&gt;Hierarchical Taxonomy of PFMs&lt;/h2&gt;

&lt;p&gt;Our proposed taxonomy systematically organizes PFMs along three key dimensions using a &lt;strong&gt;top-down analytical framework&lt;/strong&gt;:&lt;br /&gt;
(1) &lt;strong&gt;Model scope&lt;/strong&gt;, which categorizes PFMs by their functional emphasis—focusing primarily on feature extraction, feature aggregation, or joint optimization of both;&lt;br /&gt;
(2) &lt;strong&gt;Model pretraining&lt;/strong&gt;, which dissects image-centric pretraining methodologies at the slide, patch, and multimodal levels;&lt;br /&gt;
(3) &lt;strong&gt;Model architecture&lt;/strong&gt;, which classifies PFMs by parameter scale and structural complexity.&lt;br /&gt;
This framework enables comprehensive, consistent comparison across PFMs.&lt;/p&gt;

&lt;h3 id=&quot;model-scope&quot;&gt;Model Scope&lt;/h3&gt;

&lt;p&gt;The standard MIL workflow for WSIs involves three stages: patch extraction, feature extraction, and feature aggregation. Patch extraction is already mature; as a result, a model’s overall performance depends critically on the &lt;strong&gt;quality of its extractor and aggregator&lt;/strong&gt;. WSIs themselves have &lt;strong&gt;hierarchical tissue organization&lt;/strong&gt;—the extractor captures local morphological details, while the aggregator models global structural patterns. The synergy between these two components largely determines diagnostic accuracy. Based on functional emphasis, PFMs can be divided into three categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extractor-Oriented PFMs&lt;/strong&gt; form the current research mainstream. Their popularity is driven by two factors:&lt;br /&gt;
(1) The critical role of high-quality feature extraction;&lt;br /&gt;
(2) The urgent need to address domain adaptation challenges when transferring ImageNet-pretrained CNNs to pathology.&lt;br /&gt;
This design philosophy mirrors clinical workflows, where pathologists rely on fine-grained cellular features at the patch level to make diagnoses. For instance, &lt;strong&gt;CTransPath&lt;/strong&gt; pioneered a semantic-based contrastive learning method to train a CNN–Transformer hybrid extractor on 15 million patches. &lt;strong&gt;REMEDIS&lt;/strong&gt; demonstrated cross-domain limitations of ResNet-50 across medical imaging modalities, reinforcing the necessity for pathology-specific extractors—a point further validated by later works such as &lt;strong&gt;Virchow&lt;/strong&gt; and &lt;strong&gt;SINAI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aggregator-Oriented PFMs&lt;/strong&gt; focus on the slide-level representation. Since the aggregator is the only MIL component trained with direct supervision from true WSI labels, it plays a pivotal role in slide-level tasks. Yet, research in this direction remains limited. &lt;strong&gt;CHIEF&lt;/strong&gt; was the first to show the value of aggregator pretraining, using anatomical-site supervision to create site-aware aggregation. More recent works like &lt;strong&gt;MADELEINE&lt;/strong&gt;, &lt;strong&gt;TITAN&lt;/strong&gt;, and &lt;strong&gt;THREAD&lt;/strong&gt; employed multimodal data to pretrain aggregators while keeping patch features frozen—improving performance in data-constrained scenarios. This reflects an increasing recognition of the aggregator’s importance, consistent with transfer learning principles: pretraining on large datasets alleviates downstream data scarcity. However, results from &lt;strong&gt;CHIEF&lt;/strong&gt; showed that its pretrained aggregator could sometimes underperform simple linear probes on extractors. Possible reasons include small pretraining model sizes or domain bias conflicting with generalized features—indicating the need for further study on large-scale aggregator pretraining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid Optimization PFMs&lt;/strong&gt; pretrain both extractor and aggregator, aiming to maximize synergy between these components. &lt;strong&gt;HIPT&lt;/strong&gt; pioneered this by hierarchically pretraining the first two layers of the extractor (excluding the final layer), yielding significant performance gains. &lt;strong&gt;Prov-GigaPath&lt;/strong&gt; followed a similar strategy, pretraining a ViT extractor alongside a LongNet slide encoder; however, LongNet produced instance-level features rather than a single slide-level embedding, requiring additional pooling (e.g., ABMIL or non-parametric methods) for classification. &lt;strong&gt;TANGLE&lt;/strong&gt; pretrained a ViT extractor and an ABMIL aggregator guided by transcriptomic data. &lt;strong&gt;mSTAR&lt;/strong&gt; introduced a reverse pipeline: pretraining a multimodal aggregator first, then using it to pretrain the extractor, achieving a fully pretrained hybrid design.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_survey_fig2.png&quot; alt=&quot;Survey Fig 2&quot; style=&quot;width: 100%; height: auto;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;From recent trends, two key insights emerge:&lt;br /&gt;
First, research focus is shifting from extractor pretraining toward aggregator pretraining—a likely consequence of extractor performance plateauing and the growing appreciation of aggregator importance, especially in limited-data settings.&lt;br /&gt;
Second, aggregators are increasingly &lt;strong&gt;hierarchically dependent&lt;/strong&gt;: later models often reuse earlier ones’ capabilities. For example, &lt;strong&gt;TITAN&lt;/strong&gt; builds on features from &lt;strong&gt;CONCHv1.5&lt;/strong&gt;, which itself is based on &lt;strong&gt;UNI&lt;/strong&gt;—forming a dependency chain where TITAN’s performance rests on CONCHv1.5, which in turn depends on UNI.&lt;/p&gt;

&lt;h3 id=&quot;model-pretraining&quot;&gt;Model Pretraining&lt;/h3&gt;

&lt;p&gt;PFM pretraining methods fall into &lt;strong&gt;supervised&lt;/strong&gt; and &lt;strong&gt;self-supervised learning (SSL)&lt;/strong&gt;. SSL dominates due to its ability to extract rich morphological representations without manual labels; in fact, &lt;strong&gt;CHIEF&lt;/strong&gt; is the only surveyed PFM to use supervised aggregator pretraining. SSL approaches can be grouped into two directions:&lt;br /&gt;
&lt;strong&gt;Pure vision methods&lt;/strong&gt; (contrastive learning, masked image modeling, self-distillation) and &lt;strong&gt;cross-modal methods&lt;/strong&gt; (multimodal alignment).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contrastive Learning&lt;/strong&gt; aims to minimize the distance between positive sample pairs and maximize separation between negative pairs. Key developments include:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;SimCLR&lt;/strong&gt;: established strong data augmentation and large-batch training regimes;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MoCo v3&lt;/strong&gt;: stabilized ViT performance in SSL;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;CLIP&lt;/strong&gt;: extended contrastive alignment to image–text pairs;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;CoCa&lt;/strong&gt;: unified contrastive pretraining with text generation.&lt;/li&gt;
&lt;/ul&gt;
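&lt;p&gt;The shared idea behind these methods can be sketched with a minimal SimCLR-style InfoNCE loss. The numpy example below is illustrative only: the batch size, temperature, and the synthetic “augmented views” are assumptions, not parameters from any surveyed model:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, temperature = 4, 8, 0.1

def normalize(x):
    # Project embeddings onto the unit sphere, as contrastive methods do.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Two "views" of the same batch; row i in each view forms a positive pair.
view_a = normalize(rng.normal(size=(batch, dim)))
view_b = normalize(view_a + 0.05 * rng.normal(size=(batch, dim)))

# InfoNCE: similarity of each a_i to every b_j, with the diagonal
# (matching pair) treated as the correct class in a softmax.
logits = (view_a @ view_b.T) / temperature
logits -= logits.max(axis=1, keepdims=True)        # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(np.diag(probs)).mean()
print(round(float(loss), 4))
```

Minimizing this loss pulls the two views of each sample together while pushing apart views of different samples, which is the mechanism all four methods above build on.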

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_survey_fig3.png&quot; alt=&quot;Survey Fig 3&quot; style=&quot;width: 100%; height: auto;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In pathology, &lt;strong&gt;REMEDIS&lt;/strong&gt; used SimCLR to enhance robustness and data efficiency; &lt;strong&gt;TANGLE&lt;/strong&gt; augmented SimCLR with gene expression reconstruction and intra-slide patch alignment; &lt;strong&gt;Pathoduet&lt;/strong&gt; extended MoCo v3 with cross-scale localization and stain-transfer tasks to tackle stain variability and tissue heterogeneity. CLIP’s multimodal alignment has been widely deployed—not only in extractors (PLIP) but also in aggregators (&lt;strong&gt;Prov-GigaPath&lt;/strong&gt;, &lt;strong&gt;mSTAR&lt;/strong&gt;, &lt;strong&gt;MADELEINE&lt;/strong&gt;, &lt;strong&gt;THREAD&lt;/strong&gt;). &lt;strong&gt;KEEP&lt;/strong&gt; improved CLIP-based extractors by integrating curated knowledge graphs to clean image–text pairs. CoCa-based frameworks have been adopted for both extractors (&lt;strong&gt;CONCH&lt;/strong&gt;, trained on 1.17M image–caption pairs) and aggregators (&lt;strong&gt;PRISM&lt;/strong&gt;, &lt;strong&gt;TITAN&lt;/strong&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Masked Image Modeling (MIM)&lt;/strong&gt; predicts masked image regions to learn context-aware features. While &lt;strong&gt;SimMIM&lt;/strong&gt; simplified design with random masking and lightweight decoders, &lt;strong&gt;MAE&lt;/strong&gt; employed heavy masking with asymmetric encoder–decoder architectures. Pathology models like &lt;strong&gt;SINAI&lt;/strong&gt; (pretrained a ViT on 3.2B patches) have demonstrated MIM’s scalability; follow-ups like &lt;strong&gt;MUSK&lt;/strong&gt; (BEiT-3) and &lt;strong&gt;BEPH&lt;/strong&gt; (BEiTv2) confirmed its utility. &lt;strong&gt;Prov-GigaPath&lt;/strong&gt; further showed that MIM benefits aggregator pretraining—applying MAE to its LongNet encoder.&lt;/p&gt;
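&lt;p&gt;A MAE-style masking step can be sketched as follows. This toy numpy example only shows the masking-and-reconstruct-on-masked-patches mechanism; the random “prediction” stands in for a real encoder–decoder, and the patch count and mask ratio are illustrative assumptions:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
n_patches, dim, mask_ratio = 16, 8, 0.75

patches = rng.normal(size=(n_patches, dim))   # toy patch tokens

# MAE-style masking: keep a random 25% of patches visible, hide the rest.
n_masked = int(n_patches * mask_ratio)
perm = rng.permutation(n_patches)
masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

# A real model encodes only the visible patches and decodes predictions
# for the masked ones; a random "prediction" stands in for the decoder here.
predicted = rng.normal(size=(n_masked, dim))

# The reconstruction loss is computed on the masked patches only.
loss = np.mean((predicted - patches[masked_idx]) ** 2)
print(len(visible_idx), len(masked_idx), round(float(loss), 4))
```

Because the encoder processes only the visible quarter of the patches, heavy masking also makes pretraining cheaper per image, one reason the approach scales to billions of patches.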

&lt;p&gt;&lt;strong&gt;Self-Distillation&lt;/strong&gt; enables models to learn from their own predictions, often combining “teacher” and “student” networks. &lt;strong&gt;DINO&lt;/strong&gt; introduced the momentum encoder with multi-crop training; &lt;strong&gt;iBOT&lt;/strong&gt; integrated patch-level masking into self-distillation; &lt;strong&gt;DINOv2&lt;/strong&gt; improved scale stability. In pathology:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Phikon&lt;/strong&gt; applied iBOT to 43M patches from 16 tumor types;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Phikon-v2&lt;/strong&gt; used DINOv2 across 456M patches;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;RudolfV&lt;/strong&gt; merged DINOv2 with pathologist knowledge across 58 tissue types;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Hibou&lt;/strong&gt; scaled DINOv2 to 1.2B patches.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Self-distillation has also been used for aggregators—&lt;strong&gt;TITAN&lt;/strong&gt; applied iBOT for generic aggregator learning. Domain-specific refinements include &lt;strong&gt;PLUTO&lt;/strong&gt; (combining DINOv2, MAE, and Fourier loss on 195M patches), &lt;strong&gt;GPFM&lt;/strong&gt; (a unified knowledge distillation framework mixing MIM, self-distillation, and expert knowledge), and &lt;strong&gt;Virchow2&lt;/strong&gt; (pathology-specific augmentations and redundancy reduction for DINOv2).&lt;/p&gt;
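&lt;p&gt;The DINO-style teacher–student mechanism underlying these models can be sketched as below. This is a toy numpy illustration of the loss shape only: the temperatures, the simplified centering update, and the synthetic projection-head outputs are assumptions, not values from any cited model:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
dim_out = 10

def softmax(x, temp):
    z = np.exp((x - x.max()) / temp)
    return z / z.sum()

# Toy projection-head outputs for two crops of the same patch image.
student_out = rng.normal(size=dim_out)
teacher_out = student_out + 0.1 * rng.normal(size=dim_out)

center = np.zeros(dim_out)  # running center; in DINO this prevents collapse

# The teacher distribution is sharpened (low temperature) and centered;
# the student is trained to match it via cross-entropy.
t = softmax(teacher_out - center, temp=0.04)
s = softmax(student_out, temp=0.1)
loss = -np.sum(t * np.log(s + 1e-8))
print(round(float(loss), 4))

# Teacher parameters (here: outputs) track the student via an EMA,
# and the center tracks the teacher outputs (both simplified here).
m = 0.996
teacher_out = m * teacher_out + (1.0 - m) * student_out
center = 0.9 * center + 0.1 * teacher_out
```

Only the student receives gradients; the teacher and center updates are the moving-average rules that keep the self-referential objective stable.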

&lt;h3 id=&quot;model-architecture&quot;&gt;Model Architecture&lt;/h3&gt;

&lt;p&gt;PFM architecture design hinges on &lt;strong&gt;backbone choice, parameter count, and scale&lt;/strong&gt;. Scale is primarily determined by parameter count; the authors introduce a &lt;strong&gt;ViT-based size taxonomy&lt;/strong&gt; to standardize scale comparisons, covering the following sizes: &lt;strong&gt;XS&lt;/strong&gt; (2.78M params), &lt;strong&gt;S&lt;/strong&gt; (21.7M), &lt;strong&gt;B&lt;/strong&gt; (86.3M), &lt;strong&gt;L&lt;/strong&gt; (307M), &lt;strong&gt;H&lt;/strong&gt; (632M), &lt;strong&gt;g&lt;/strong&gt; (1.13B), and &lt;strong&gt;G&lt;/strong&gt; (1.9B).&lt;/p&gt;
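&lt;p&gt;A quick back-of-the-envelope check shows where these counts come from: a ViT’s Transformer blocks contribute roughly 12 × depth × width² parameters (4d² for attention, 8d² for the MLP), ignoring embeddings and biases. The (depth, width) pairs below are the standard ViT configurations; the approximation lands within a few percent of the taxonomy’s listed sizes:&lt;/p&gt;

```python
# Rough parameter-count check for the ViT sizes in the taxonomy above.
# Approximation: ~12 * depth * width**2 parameters in the Transformer
# blocks (4*d*d for attention, 8*d*d for the MLP), ignoring embeddings,
# biases, and the patch stem. (depth, width) are standard ViT configs.
configs = {
    "ViT-S": (12, 384),    # taxonomy lists ~21.7M
    "ViT-B": (12, 768),    # taxonomy lists ~86.3M
    "ViT-L": (24, 1024),   # taxonomy lists ~307M
    "ViT-H": (32, 1280),   # taxonomy lists ~632M
}
for name, (depth, width) in configs.items():
    approx_millions = 12 * depth * width ** 2 / 1e6
    print(f"{name}: ~{approx_millions:.0f}M params")
```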

&lt;p&gt;Extractor architectures include CNNs (ResNet) and Transformers (ViT, Swin, BEiT/BEiTv2, FlexiViT, multimodal BEiT-3). Aggregator architectures are dominated by ABMIL variants and Perceiver models.&lt;/p&gt;

&lt;p&gt;Analysis reveals notable patterns:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;ABMIL&lt;/strong&gt; is the most widely used aggregator, while extractors are predominantly Transformer-based.&lt;/li&gt;
  &lt;li&gt;ViT-L is the most common backbone, with smaller variants (ViT-S, ViT-B) used for efficiency.&lt;/li&gt;
  &lt;li&gt;Extractors often have significantly more parameters than aggregators (e.g., ViT-L extractors vs. ViT-XS aggregators), leading to &lt;strong&gt;data–model scale mismatches&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;Model sizes are steadily increasing—from ViT-B in early PFMs to ViT-L/H/g/G in recent designs—highlighting an ongoing scale-up trend despite computational costs.&lt;/li&gt;
&lt;/ol&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;evaluation-framework&quot;&gt;Evaluation Framework&lt;/h2&gt;

&lt;p&gt;The survey structures &lt;strong&gt;PFM evaluation tasks&lt;/strong&gt; into four broad categories:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Slide-Level&lt;/strong&gt;: WSI classification, survival prediction, retrieval, segmentation.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Patch-Level&lt;/strong&gt;: Patch classification, retrieval, segmentation—testing extractors directly.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Multimodal&lt;/strong&gt;: Cross-modal retrieval, report generation, visual question answering.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Biological&lt;/strong&gt;: Gene mutation detection, molecular subtype classification.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Leading models such as &lt;strong&gt;CONCH, UNI, GPFM, TITAN&lt;/strong&gt; provide &lt;strong&gt;comprehensive multi-setting benchmarks&lt;/strong&gt;, enabling evaluation across &lt;strong&gt;zero-shot, few-shot, and fully supervised scenarios&lt;/strong&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;key-challenges--future-directions&quot;&gt;Key Challenges &amp;amp; Future Directions&lt;/h2&gt;

&lt;p&gt;While PFMs have made substantial strides, several open problems must be addressed to unlock their full research and clinical impact. These challenges span from &lt;strong&gt;domain-specific learning&lt;/strong&gt; to &lt;strong&gt;deployment infrastructure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pathology-Specific Pretraining&lt;/strong&gt;&lt;br /&gt;
Current PFMs largely rely on techniques designed for natural image analysis, insufficiently tailored to the unique textures, staining variations, and hierarchical tissue organization in pathology. There is a pressing need for &lt;strong&gt;domain-specialized algorithms&lt;/strong&gt;, designed for both single-modality (WSI) and multi-modality (WSI + reports + omics) contexts, to capture nuanced visual–biological correlations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. End-to-End WSI Learning&lt;/strong&gt;&lt;br /&gt;
The two-stage extractor–aggregator paradigm often leads to misalignment between local patch and global WSI representations. Future systems must achieve &lt;strong&gt;fully integrated gigapixel-scale training&lt;/strong&gt;, simultaneously optimizing both components while efficiently handling the massive computational and memory requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Data–Model Scalability&lt;/strong&gt;&lt;br /&gt;
Scaling trends for PFMs—considering dataset size, patch count, model capacity, and data diversity—are not yet fully understood. Research must determine optimal scaling strategies, develop adaptive architectures, and improve curation pipelines to ensure both &lt;strong&gt;quality and diversity&lt;/strong&gt; of training data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Efficient Federated Learning&lt;/strong&gt;&lt;br /&gt;
Training with multi-institutional datasets without data sharing is essential but computationally challenging, especially in &lt;strong&gt;non-IID&lt;/strong&gt; settings. Innovations in algorithm design are needed to reduce communication overhead, improve training stability, and preserve patient privacy, enabling large-scale collaborative learning at lower cost.&lt;/p&gt;
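&lt;p&gt;The core mechanism can be illustrated with a FedAvg-style aggregation step (a toy numpy sketch under assumed site counts and data sizes; real federated pathology training adds secure communication, many local epochs, and non-IID corrections on top of this):&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model weights" from three institutions that never share raw slides;
# only parameter updates leave each site, which is the core federated idea.
site_weights = [rng.normal(size=6) for _ in range(3)]
site_sizes = np.array([1200, 300, 500])   # e.g. number of local WSIs (assumed)

# FedAvg-style aggregation: a weighted average of the local models,
# proportional to each site's data size.
coeffs = site_sizes / site_sizes.sum()
global_weights = sum(c * w for c, w in zip(coeffs, site_weights))
print(global_weights.shape)
```

Each communication round repeats this cycle: sites train locally, send updates, and receive the new global model back.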

&lt;p&gt;&lt;strong&gt;5. Robustness Across Institutions&lt;/strong&gt;&lt;br /&gt;
Performance degradation due to institutional differences—scanner types, staining protocols, imaging resolutions—remains a critical obstacle. Approaches to &lt;strong&gt;domain generalization, adaptive normalization, and style-invariant modeling&lt;/strong&gt; will be necessary to ensure consistent diagnostic performance across diverse clinical environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Retrieval-Augmented Pathology Language Models&lt;/strong&gt;&lt;br /&gt;
Integrating &lt;strong&gt;retrieval-augmented generation (RAG)&lt;/strong&gt; with PFMs could enrich decision support by combining visual analysis with retrieved pathology-specific knowledge bases, such as atlases or literature. This multimodal integration could yield &lt;strong&gt;explainable, context-aware AI tools&lt;/strong&gt; for clinical pathology.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Model Adaptation &amp;amp; Maintenance&lt;/strong&gt;&lt;br /&gt;
Given the high cost of full retraining, PFMs must be &lt;strong&gt;maintainable and updatable&lt;/strong&gt; as new diseases emerge, staining protocols evolve, or institutional practices change. Lightweight continual learning and parameter-efficient tuning strategies will be key to maintaining clinical relevance.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This IJCAI 2025 survey delivers a &lt;strong&gt;structured taxonomy&lt;/strong&gt;, a &lt;strong&gt;comprehensive evaluation framework&lt;/strong&gt;, and a &lt;strong&gt;forward-looking research roadmap&lt;/strong&gt; for pathology foundation models. By addressing the outlined challenges, the authors envision PFMs progressing toward &lt;strong&gt;greater generalization, clinical reliability, and theoretical advancement&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here are the links to the paper and the GitHub repository:
&lt;a href=&quot;https://arxiv.org/abs/2504.04045&quot;&gt;Read the full paper on arXiv&lt;/a&gt;&lt;br /&gt;
&lt;a href=&quot;https://github.com/BearCleverProud/AwesomeWSI&quot;&gt;Explore resources on GitHub&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/07/29/smartcare</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/07/29/smartcare/"/>
    <title>[Press Release] SmartLab at HKUST Launches SmartCare AI Medical Platform to Transform Patient-Centered Healthcare</title>
    <published>2025-07-29T00:00:00+08:00</published>
    <updated>2025-07-29T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/07/29/smartcare/</uri>
    </author>
    <content type="html">&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartcare_cover.png&quot; alt=&quot;SmartCare AI Platform Banner&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;SmartX Lab&lt;/strong&gt; at the &lt;strong&gt;Hong Kong University of Science and Technology (HKUST)&lt;/strong&gt;, led by Assistant Professor Hao Chen, has officially launched the &lt;strong&gt;SmartCare AI Medical Platform&lt;/strong&gt; at the HKUST campus clinic. This milestone marks a new era in patient-centered healthcare, harnessing the power of artificial intelligence to transform the entire patient journey—from triage to treatment—while streamlining clinical workflows for healthcare professionals.&lt;/p&gt;

&lt;h3 id=&quot;pilot-program-real-world-impact-at-hkust-clinic&quot;&gt;Pilot Program: Real-World Impact at HKUST Clinic&lt;/h3&gt;

&lt;p&gt;A six-month pilot study of SmartCare is currently underway at the HKUST Clinic, inviting over 15,000 students, faculty, and staff to participate and provide feedback. The pilot epitomizes SmartX Lab’s commitment to translating cutting-edge AI research into practical solutions that directly benefit both patients and clinicians.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartcare_trial.png&quot; alt=&quot;SmartX Lab SmartCare Team at HKUST Clinic&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&quot;core-features-and-innovation&quot;&gt;Core Features and Innovation&lt;/h3&gt;

&lt;p&gt;SmartCare is built on HKUST’s MedDr—one of the world’s largest open-source, generalist multimodal foundation models for medicine. The platform is designed to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Streamline clinical workflows&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Enhance patient-provider interactions&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Support next-generation medical education&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key features include:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Intelligent Pre-visit Triage:&lt;/strong&gt; Patients scan a QR code to submit medical histories and symptoms via text or voice, even before entering the clinic. The AI-powered triage system helps nurses optimize resource allocation and patient flow, significantly reducing waiting times.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;AI-Powered Consultation Assistant:&lt;/strong&gt; The platform supports real-time, multilingual transcription (Cantonese, Mandarin, English), enabling doctors to focus on patients instead of manual note-taking, and fostering personalized care.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Automated Medical Documentation:&lt;/strong&gt; SmartCare auto-generates over 30 types of medical documents—including referral letters, prescriptions, and medical certificates—and provides AI-powered post-visit follow-up, reducing administrative burden for clinicians.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Virtual Patient Module:&lt;/strong&gt; This patent-pending feature enables simulated consultations with instant AI feedback, paving the way for innovative medical education and training.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;built-on-world-class-team-research-meddr2-and-gsco&quot;&gt;Built on World-Class Team Research: MedDr2 and GSCo&lt;/h3&gt;

&lt;p&gt;SmartCare is &lt;strong&gt;powered by MedDr2&lt;/strong&gt;—the world’s largest open-source, generalist multimodal foundation model for medicine—&lt;strong&gt;developed by the SmartX Lab team based on the innovative &lt;a href=&quot;https://arxiv.org/html/2404.15127v2&quot;&gt;GSCo&lt;/a&gt; framework&lt;/strong&gt;. This foundational work was led by &lt;a href=&quot;https://sunanhe.github.io/&quot;&gt;Sunan He&lt;/a&gt; and &lt;a href=&quot;https://jerrrynie.github.io/&quot;&gt;Yuxiang Nie&lt;/a&gt;, supervised by Prof. Hao Chen, with a team of outstanding collaborators, and sets a new paradigm for trustworthy AI in healthcare.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartcare_meddr.png&quot; alt=&quot;SmartX Lab SmartCare Introduction&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;MedDr2&lt;/strong&gt; is built upon the &lt;strong&gt;GSCo framework&lt;/strong&gt;, which enables seamless integration of both &lt;strong&gt;generalist foundation models (GFMs)&lt;/strong&gt; and &lt;strong&gt;specialist models&lt;/strong&gt;. This collaborative approach allows MedDr2 to deliver robust, precise, and scalable medical image analysis, demonstrating exceptional instruction-following, in-context learning, and generalization across diverse medical domains. GSCo introduces advanced collaborative mechanisms—including &lt;strong&gt;Mixture-of-Expert Diagnosis (MoED)&lt;/strong&gt; and &lt;strong&gt;Retrieval-Augmented Diagnosis (RAD)&lt;/strong&gt;—empowering MedDr2 to incorporate specialist knowledge for accurate diagnosis, even in out-of-domain or complex scenarios. This &lt;strong&gt;team-driven innovation&lt;/strong&gt; is the scientific foundation of SmartCare’s advanced capabilities in clinical workflow optimization, intelligent triage, and automated documentation, ensuring the platform is both cutting-edge and clinically impactful.&lt;/p&gt;

&lt;h3 id=&quot;leadership-voices&quot;&gt;Leadership Voices&lt;/h3&gt;

&lt;p&gt;Prof. Hao Chen, Director of SmartX Lab and Assistant Professor at HKUST, remarked:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“SmartCare is the culmination of SmartX Lab’s mission to develop trustworthy, real-world AI for healthcare. Leveraging our MedDr foundation model, SmartCare excels at complex medical queries, medical image interpretation, and seamless clinical integration. Following our campus pilot, we are exploring collaborations with Gleneagles Hospital Hong Kong and CUHK Medical Centre for further real-world validation and broader impact. HKUST’s AI innovation ecosystem empowers SmartCare to redefine clinical efficiency and patient outcomes.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Dr. Justin Cheng, SmartCare’s CEO, Co-Founder, and Registered Medical Practitioner at HKUST Clinic, added:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“SmartCare fundamentally shifts the clinical paradigm. By automating administrative tasks and documentation, SmartCare allows doctors to focus on the human connection at the heart of medicine, delivering truly personalized and high-quality care. As both a clinician and researcher, I am excited by the transformative impact SmartCare brings to daily practice.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Prof. Samuel Yu Chung-Toi, Director of the Health, Safety, and Environment Office at HKUST, commented:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“The SmartCare pilot at our campus clinic exemplifies HKUST’s commitment to impactful, cross-disciplinary innovation. By providing SmartX Lab with resources and an environment for real-world testing, we ensure our solutions address the needs of both healthcare providers and patients.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_smartcare_speech.png&quot; alt=&quot;SmartCare Launch at HKUST Clinic&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&quot;foundation-and-partnership&quot;&gt;Foundation and Partnership&lt;/h3&gt;

&lt;p&gt;SmartCare’s development is rooted in SmartX Lab’s research excellence and HKUST’s vibrant AI innovation ecosystem. In May 2025, SmartX Lab facilitated a tripartite partnership to integrate SmartCare’s platform with PanopticAI’s camera-based vital sign monitoring—another HKUST-nurtured innovation—at a new ambulatory care center in Admiralty, further enhancing operational efficiency and service quality.&lt;/p&gt;

&lt;h3 id=&quot;leadership-and-research-excellence&quot;&gt;Leadership and Research Excellence&lt;/h3&gt;

&lt;p&gt;The SmartCare project is led by Prof. Hao Chen (Director, SmartX Lab; Assistant Professor, Computer Science &amp;amp; Engineering, HKUST), Dr. Justin Cheng (CEO &amp;amp; Co-Founder, SmartCare; Registered Medical Practitioner, HKUST Clinic), and Prof. Samuel Yu Chung-Toi (Director, Health, Safety, and Environment Office, HKUST). The team is dedicated to advancing AI medical foundation models and next-generation clinical technologies, translating scientific breakthroughs into tangible healthcare benefits.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;about-smartx-lab&quot;&gt;About SmartX Lab&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SmartX Lab&lt;/strong&gt;, led by Prof. Hao Chen at HKUST, is dedicated to advancing trustworthy AI technologies for healthcare and science. Our research spans large-scale models for healthcare, computer-assisted intervention, AI for science, and bioinformatics. Our mission is to drive a transformative revolution in medical practice and scientific discovery, shaping a healthier future.&lt;/p&gt;

&lt;h3 id=&quot;about-the-hong-kong-university-of-science-and-technology&quot;&gt;About The Hong Kong University of Science and Technology&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hkust.edu.hk/&quot;&gt;The Hong Kong University of Science and Technology (HKUST)&lt;/a&gt; is a world-class university driving innovative education, research excellence, and impactful knowledge transfer. With a holistic and interdisciplinary approach, HKUST was ranked 3rd in the Times Higher Education’s Young University Rankings 2024 and 19th worldwide in the THE Impact Rankings 2025. Thirteen HKUST subjects were ranked among the world’s top 50 in the QS World University Rankings by Subject 2025, with “Data Science and Artificial Intelligence” ranked 17th globally and first in Hong Kong. Over 80% of our research was rated “internationally excellent” or “world leading” in Hong Kong’s latest Research Assessment Exercise. As of May 2025, HKUST members have founded over 1,800 active start-ups, including 10 Unicorns and 17 exits (IPO or M&amp;amp;A).&lt;/p&gt;

&lt;h3 id=&quot;media-links&quot;&gt;Media Links&lt;/h3&gt;

&lt;p&gt;RTHK: &lt;a href=&quot;https://news.rthk.hk/rthk/ch/component/k2/1815646-20250729.htm?spTabChangeable=0&quot;&gt;Link (Traditional Chinese only)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hong Kong Economic Journal: &lt;a href=&quot;https://www.hkej.com/instantnews/current/article/4149132/%E7%A7%91%E5%A4%A7%E7%A0%94%E7%99%BCSmartCare%E6%99%BA%E8%83%BD%E9%86%AB%E7%99%82%E5%B9%B3%E5%8F%B0+%E6%8F%90%E5%8D%87%E5%B0%B1%E8%A8%BA%E6%95%88%E7%8E%87&quot;&gt;Link (Traditional Chinese only)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;HKET: &lt;a href=&quot;https://news.hket.com/article/3985417/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E9%86%AB%E7%99%82%E5%B9%B3%E5%8F%B0%EF%BD%9C%E7%A7%91%E5%A4%A7%E6%8E%A8%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E9%86%AB%E7%99%82%E5%B9%B3%E5%8F%B0%20SmartCare%20%E7%B3%BB%E7%B5%B1%E3%80%80%E9%86%AB%E7%94%9F%EF%BC%9A%E8%A8%BA%E6%96%B7%E6%99%82%E9%96%93%E5%8A%A0%E5%BF%AB%E4%BA%86%E4%B8%89%E5%88%86%E4%B8%80?mtc=20023&quot;&gt;Link (Traditional Chinese only)&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/07/27/colab</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/07/27/colab/"/>
    <title>[Press Release] HKUST and Wuhan Union Hospital Jointly Establish the Medical-Engineering Joint Innovation Center</title>
    <published>2025-07-27T00:00:00+08:00</published>
    <updated>2025-07-27T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/07/27/colab/</uri>
    </author>
    <content type="html">&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_union_medeng_center_cover.png&quot; alt=&quot;HKUST &amp;amp; Wuhan Union Hospital Joint Innovation Center Banner&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;Hong Kong University of Science and Technology (HKUST)&lt;/strong&gt; and &lt;strong&gt;Wuhan Union Hospital, Tongji Medical College of Huazhong University of Science and Technology&lt;/strong&gt; have officially inaugurated the &lt;strong&gt;Medical-Engineering Joint Innovation Center&lt;/strong&gt; at HKUST. This milestone marks a new era of interdisciplinary collaboration, combining the strengths of engineering, artificial intelligence (AI), and clinical medicine to accelerate medical technology innovation for global healthcare advancement.&lt;/p&gt;

&lt;h3 id=&quot;launch-ceremony-and-strategic-collaboration&quot;&gt;Launch Ceremony and Strategic Collaboration&lt;/h3&gt;

&lt;p&gt;The unveiling ceremony was attended by Prof. Zhang Yu, Party Secretary of Wuhan Union Hospital, Prof. Wang Hongbo, Deputy Secretary, and their delegation, alongside HKUST representatives including Prof. Kwang-Ting Cheng, Vice-President for Research and Development, and Prof. Chung Chi Shing, Associate Dean of Engineering. Distinguished guests toured the HKUST Neumann Institute and engaged in in-depth discussions with HKUST scholars regarding the center’s development plans, joint innovation funding, and collaborative training models for medical and graduate students.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_union_medeng_center_team.png&quot; alt=&quot;Delegations from HKUST and Wuhan Union Hospital at the inauguration&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&quot;core-focus-and-innovation-goals&quot;&gt;Core Focus and Innovation Goals&lt;/h3&gt;

&lt;p&gt;According to the collaboration agreement, the Joint Innovation Center will focus on interdisciplinary medical-engineering research, prioritizing AI-driven medical imaging, ultrasound, radiology, and pathology. Through academic exchanges, faculty training, joint research projects, and translational initiatives, the center aims to accelerate the development of large multimodal medical AI models, intelligent diagnosis systems, precision medicine, and smart medical devices.&lt;/p&gt;

&lt;p&gt;Prof. Hao Chen, Co-Director of the Center, led guests on a visit to the HKUST Neumann Institute, showcasing the university’s cutting-edge research infrastructure.&lt;/p&gt;

&lt;h3 id=&quot;leadership-voices&quot;&gt;Leadership Voices&lt;/h3&gt;

&lt;p&gt;Prof. Kwang-Ting Cheng, HKUST Vice-President for Research and Development, remarked:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“The rapid advancement of artificial intelligence is a key driver of medical innovation. The deep integration of medicine, engineering, and data science opens new frontiers in disease prevention, diagnosis, monitoring, and treatment. HKUST’s strengths in data fusion, algorithm innovation, and high-performance computing, combined with Wuhan Union Hospital’s extensive clinical resources, will catalyze breakthroughs in AI-powered healthcare. This collaboration highlights the vast potential of cross-disciplinary and cross-regional partnerships, accelerating the translation of intelligent diagnostics and precision medicine to real-world clinical impact.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Prof. Zhang Yu, Party Secretary of Wuhan Union Hospital, commented:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;“The accelerating technological revolution and industrial transformation make medical-engineering integration one of the most promising directions for innovation. As a leading national medical institution, Wuhan Union Hospital offers rich clinical experience and comprehensive resources. This strategic partnership leverages complementary strengths and shared resources, advancing medical science and technology. We look forward to deepening collaboration, harnessing AI for medical progress, and contributing to the Healthy China initiative.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hkust_union_medeng_center_gift.png&quot; alt=&quot;Exchange of commemorative gifts between Prof. Kwang-Ting Cheng and Prof. Zhang Yu&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&quot;foundation-of-the-partnership&quot;&gt;Foundation of the Partnership&lt;/h3&gt;

&lt;p&gt;Wuhan Union Hospital is one of China’s premier medical institutions, serving 6.9 million outpatients and 420,000 inpatients and performing 156,000 surgeries annually, providing vital clinical support for large-scale, multimodal medical AI model development.&lt;/p&gt;

&lt;p&gt;The collaboration between HKUST and Wuhan Union Hospital began in 2023, with the first official visit by Prof. Zhang Yu. In 2024, both parties signed a formal agreement and established the first base of the Medical-Engineering Joint Innovation Center at Wuhan Union Hospital. Since then, the partnership has achieved breakthrough progress in AI and medical imaging, including the development of the world’s largest ultrasound AI model trained on data from nearly one million patients, achieving diagnostic accuracy for thyroid nodules on par with experienced clinicians.&lt;/p&gt;

&lt;h3 id=&quot;leadership-and-research-excellence&quot;&gt;Leadership and Research Excellence&lt;/h3&gt;

&lt;p&gt;The Center is jointly led by Prof. Hao Chen (Assistant Professor, Computer Science &amp;amp; Engineering, Chemical &amp;amp; Biological Engineering, and Life Science, HKUST) and Prof. Zheng Chuansheng (Director of Radiology, Wuhan Union Hospital), with Prof. Huang Ziwei (Associate Professor, Chemical &amp;amp; Biological Engineering, HKUST) and Prof. Xie Mingxing (Director of Ultrasound, Wuhan Union Hospital) as Deputy Directors.&lt;/p&gt;

&lt;p&gt;Prof. Chen and Prof. Huang’s teams have recently developed four novel AI medical foundation models and an innovative cellular imaging technology capable of accurately detecting residual cancer cells, driving medical innovation and translating scientific breakthroughs into clinical benefit.&lt;/p&gt;

&lt;hr /&gt;

&lt;h3 id=&quot;about-smartx-lab&quot;&gt;About SmartX Lab&lt;/h3&gt;

&lt;p&gt;SmartX Lab, led by Prof. Hao Chen, is committed to pushing the boundaries of trustworthy AI technologies for healthcare and science. Our research directions include large-scale models for healthcare, computer-assisted intervention, AI for science, and bioinformatics. Our ultimate goal is to spearhead a transformative revolution in medical practice and scientific discovery, paving the way for a brighter and healthier future.&lt;/p&gt;

&lt;h3 id=&quot;about-the-hong-kong-university-of-science-and-technology&quot;&gt;About The Hong Kong University of Science and Technology&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hkust.edu.hk/&quot;&gt;The Hong Kong University of Science and Technology (HKUST)&lt;/a&gt; is a world-class university that excels in driving innovative education, research excellence, and impactful knowledge transfer. With a holistic and interdisciplinary pedagogical approach, HKUST was ranked 3rd in the Times Higher Education’s Young University Rankings 2024, while 12 of its subjects were ranked among the world’s top 50 in the QS World University Rankings by Subject 2024, with “Data Science and Artificial Intelligence” being ranked first in Hong Kong and 10th in the world. Our graduates are highly competitive, consistently ranking among the world’s top 30 most sought-after graduates. In terms of research and entrepreneurship, over 80% of our work was rated “internationally excellent” or “world leading” in the latest Research Assessment Exercise 2020 of Hong Kong’s University Grants Committee. As of October 2024, HKUST members have founded over 1,800 active start-ups, including 10 Unicorns and 14 exits (IPO or M&amp;amp;A).&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/07/14/lai4bm2025</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/07/14/lai4bm2025/"/>
    <title>[Press Release] The 2025 International Workshop on Large AI for Biomedical Imaging (LAI4BM 2025) Successfully Held at HKUST</title>
    <published>2025-07-14T00:00:00+08:00</published>
    <updated>2025-07-14T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/07/14/lai4bm2025/</uri>
    </author>
    <content type="html">&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/lai4bm2025_cover.png&quot; alt=&quot;LAI4BM 2025 Workshop Banner&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;International Workshop on Large AI Models for Biomedicine (LAI4BM 2025)&lt;/strong&gt; successfully concluded on July 12, 2025, at the HKUST Jockey Club Institute for Advanced Study. This groundbreaking workshop brought together leading researchers, practitioners, and innovators from around the world to explore the latest advancements in Large AI Models for Biomedical Applications.&lt;/p&gt;

&lt;h3 id=&quot;workshop-overview&quot;&gt;Workshop Overview&lt;/h3&gt;

&lt;p&gt;The LAI4BM 2025 Workshop focused on cutting-edge AI technologies, machine learning techniques, and their transformative applications in healthcare, medical research, and biomedical data analysis. Through keynote presentations, technical sessions, and interactive panel discussions, attendees gained valuable insights into the latest developments in AI-driven biomedical research.&lt;/p&gt;

&lt;p&gt;The event successfully fostered collaboration between AI researchers and biomedical professionals, accelerating innovation in healthcare technology and medical AI applications. The workshop featured distinguished speakers from prestigious institutions including Harvard University, Stanford University, Imperial College London, and leading healthcare organizations.&lt;/p&gt;

&lt;h3 id=&quot;distinguished-speakers-and-their-contributions&quot;&gt;Distinguished Speakers and Their Contributions&lt;/h3&gt;

&lt;p&gt;The workshop featured an impressive lineup of international experts who shared their cutting-edge research and insights:&lt;/p&gt;

&lt;h3 id=&quot;morning-session-highlights&quot;&gt;Morning Session Highlights&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prof. Pranav Rajpurkar&lt;/strong&gt; from Harvard University delivered a compelling keynote on “Beyond Assistance: Rethinking AI-Human Integration in Radiology,” exploring how AI can transform medical imaging beyond simple assistance tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prof. Ruijiang Li&lt;/strong&gt; from Stanford University presented “Multi-modal Foundation AI for Precision Oncology,” discussing the integration of multiple data modalities for personalized cancer treatment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mr. Dennis Lee&lt;/strong&gt; from Hong Kong Hospital Authority shared practical insights on “Reimagining Healthcare with AI: A Real-World Application of Clinical Management Systems in Hong Kong Hospital Authority,” providing valuable real-world implementation perspectives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prof. Dong Liang&lt;/strong&gt; from SIAT, Chinese Academy of Sciences, presented “Magnetic Resonance Live imaging: from Photograph to Videography,” showcasing advances in real-time medical imaging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prof. Kyongtae Tyler Bae&lt;/strong&gt; from The University of Hong Kong discussed “Clinical Implementation of AI in Radiology,” addressing practical challenges and solutions in clinical deployment.&lt;/p&gt;

&lt;h3 id=&quot;afternoon-session-highlights&quot;&gt;Afternoon Session Highlights&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prof. Xiaomin Ouyang&lt;/strong&gt; from HKUST presented “Building Foundation Model-Powered Multimodal Sensing Systems for Daily Healthcare,” exploring AI applications in everyday health monitoring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prof. Cheong Kin Ronald Chan&lt;/strong&gt; from CUHK &amp;amp; Hong Kong Hospital Authority delivered an insightful talk on “What Doctors Think, Want and Fear About the Large AI Model?” providing crucial clinical perspectives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prof. Zheng Li&lt;/strong&gt; from CUHK showcased “Intelligent Surgical Assistive Robots,” demonstrating the future of AI-powered surgical assistance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prof. Xin Wang&lt;/strong&gt; from CUHK presented “Deep Learning Models for Cancer Subtype Classification to Advance Precision Oncology,” highlighting AI’s role in personalized cancer treatment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dr. Yang Cheng&lt;/strong&gt; from AstraZeneca shared industry insights on “Charting the AI Pathway: Opportunities and Challenges in Pharma Development.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prof. Chen Qin&lt;/strong&gt; from Imperial College London concluded with “Artificial Intelligence Meets Medical Imaging: From Signals to Interpretation,” bridging technical innovation with clinical application.&lt;/p&gt;

&lt;h3 id=&quot;panel-discussions-addressing-critical-questions&quot;&gt;Panel Discussions: Addressing Critical Questions&lt;/h3&gt;

&lt;p&gt;The workshop featured two thought-provoking panel discussions moderated by Prof. Hao Chen:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;“Are We There for Deploying Large AI Models in Biomedical Applications?”&lt;/strong&gt; - This morning panel explored the current readiness and challenges in implementing large AI models in clinical settings.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;“How Will Large AI Models Reshape the Future of Biomedicine?”&lt;/strong&gt; - The afternoon panel discussed future directions and transformative potential of AI in healthcare.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These discussions provided valuable insights into both the opportunities and challenges facing the field, fostering meaningful dialogue between researchers, clinicians, and industry professionals.&lt;/p&gt;

&lt;h3 id=&quot;organizing-excellence&quot;&gt;Organizing Excellence&lt;/h3&gt;

&lt;p&gt;The workshop was expertly organized by a distinguished committee of HKUST faculty members:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Prof. Hao Chen&lt;/strong&gt; - Assistant Professor of CSE, CBE and LIFS, HKUST&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Prof. Jiguang Wang&lt;/strong&gt; - Padma Harilela Associate Professor of CBE and LIFS, HKUST&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Prof. Kai Liu&lt;/strong&gt; - Cheng Professor of Life Science, HKUST&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Prof. Yingcong Chen&lt;/strong&gt; - Assistant Professor at AI Thrust, Information Hub of HKUST (GZ)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Prof. Can Yang&lt;/strong&gt; - Dr Tai-chin Lo Associate Professor of Mathematics, HKUST&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Prof. Xiaofang Zhou&lt;/strong&gt; - Otto Poon Professor of Engineering &amp;amp; Chair Professor of CSE, HKUST&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;supporting-organizations&quot;&gt;Supporting Organizations&lt;/h3&gt;

&lt;p&gt;The workshop was supported by several key organizations:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;HKUST Collaborative Center for Medical and Engineering Innovation&lt;/li&gt;
  &lt;li&gt;State Key Laboratory of Nervous System Disorders&lt;/li&gt;
  &lt;li&gt;Center for Medical Imaging and Analysis&lt;/li&gt;
  &lt;li&gt;Department of Computer Science and Engineering, HKUST&lt;/li&gt;
  &lt;li&gt;Department of Chemical and Biological Engineering, HKUST&lt;/li&gt;
  &lt;li&gt;The Hong Kong University of Science and Technology (HKUST)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;workshop-impact-and-success&quot;&gt;Workshop Impact and Success&lt;/h3&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/lai4bm2025_group_photo_1.jpg&quot; alt=&quot;LAI4BM 2025 Group Photo 1&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Workshop participants and speakers gathered for a group photo&lt;/em&gt;&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/lai4bm2025_group_photo_2.jpg&quot; alt=&quot;LAI4BM 2025 Group Photo 2&quot; style=&quot;width: 100%; max-width: 800px;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;The workshop successfully achieved its objectives of:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Bringing together leading experts from academia, industry, and healthcare&lt;/li&gt;
  &lt;li&gt;Facilitating knowledge exchange on the latest AI developments in biomedicine&lt;/li&gt;
  &lt;li&gt;Fostering collaborations between AI researchers and biomedical professionals&lt;/li&gt;
  &lt;li&gt;Addressing practical challenges in deploying AI solutions in healthcare settings&lt;/li&gt;
  &lt;li&gt;Exploring future directions for AI in biomedical applications&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;looking-forward&quot;&gt;Looking Forward&lt;/h3&gt;

&lt;p&gt;The LAI4BM 2025 Workshop has set a strong foundation for continued collaboration and innovation in the field of AI for biomedicine. The insights shared, connections made, and discussions held during this event will undoubtedly contribute to advancing the field and improving healthcare outcomes through AI technology.&lt;/p&gt;

&lt;p&gt;The organizers extend their heartfelt gratitude to all speakers, attendees, and sponsors who made this workshop a tremendous success. The enthusiasm and engagement demonstrated by all participants highlight the bright future ahead for AI applications in biomedicine.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;The LAI4BM 2025 Workshop was organized by SmartX Lab at HKUST and supported by various departments and centers at The Hong Kong University of Science and Technology.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;about-smartx-lab&quot;&gt;About SmartX Lab&lt;/h3&gt;

&lt;p&gt;SmartX Lab, led by Prof. Hao Chen, is committed to pushing the boundaries of trustworthy AI technologies for healthcare and science. Our research directions include large-scale models for healthcare, computer-assisted intervention, AI for science, and bioinformatics. Our ultimate goal is to spearhead a transformative revolution in medical practice and scientific discovery, paving the way for a brighter and healthier future.&lt;/p&gt;

&lt;h3 id=&quot;about-the-hong-kong-university-of-science-and-technology&quot;&gt;About The Hong Kong University of Science and Technology&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hkust.edu.hk/&quot;&gt;The Hong Kong University of Science and Technology (HKUST)&lt;/a&gt; is a world-class university that excels in driving innovative education, research excellence, and impactful knowledge transfer. With a holistic and interdisciplinary pedagogical approach, HKUST was ranked 3rd in the Times Higher Education’s Young University Rankings 2024, while 12 of its subjects were ranked among the world’s top 50 in the QS World University Rankings by Subject 2024, with “Data Science and Artificial Intelligence” being ranked first in Hong Kong and 10th in the world. Our graduates are highly competitive, consistently ranking among the world’s top 30 most sought-after graduates. In terms of research and entrepreneurship, over 80% of our work was rated “internationally excellent” or “world leading” in the latest Research Assessment Exercise 2020 of Hong Kong’s University Grants Committee. As of October 2024, HKUST members have founded over 1,800 active start-ups, including 10 Unicorns and 14 exits (IPO or M&amp;amp;A).&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/05/21/reg2rg</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/05/21/reg2rg/"/>
    <title>[IEEE-TMI] Reg2RG: Large Language Model with Region-Guided Referring and Grounding for CT Report Generation</title>
    <published>2025-05-21T00:00:00+08:00</published>
    <updated>2025-05-21T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/05/21/reg2rg/</uri>
    </author>
    <content type="html">&lt;p&gt;A recent study from SmartX Lab, led by Zhi-Xuan Chen, has been accepted by &lt;em&gt;IEEE Transactions on Medical Imaging&lt;/em&gt; (Impact Factor: 8.9). The work presents &lt;strong&gt;Reg2RG&lt;/strong&gt;, a novel region-guided large language model (LLM) framework for chest CT report generation. By explicitly linking image regions with diagnostic text, Reg2RG significantly improves the accuracy, interpretability, and clinical value of automated CT reporting.&lt;/p&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;In medical image analysis, automated report generation aims to assist clinicians in interpreting and articulating the visual content of scans. The challenge lies in comprehending complex anatomical structures and expressing findings accurately in natural language. Chest CT report generation is a representative task that requires precise lesion localization, feature interpretation, and the synthesis of clinically relevant textual descriptions. However, most existing methods rely on end-to-end full-image modeling, where the entire image is encoded to train language models. Although these methods achieve initial automation, they commonly exhibit poor localization, diagnostic bias, and limited clinical interpretability.&lt;/p&gt;

&lt;p&gt;To address this, we focus on a central question: Can a clear correspondence be established between specific visual regions in medical images and their respective diagnostic paragraphs in textual reports? Inspired by cross-modal tasks such as image-text matching, referring expression, and visual grounding, Reg2RG introduces a core concept: extracting discriminative local image features via region masks and embedding them as tokens into large language models. This allows the model to generate natural language reports that not only follow clinical conventions but also explicitly refer to the relevant anatomical regions in the CT scan, thereby improving the traceability and interpretability of diagnostic conclusions.&lt;/p&gt;

&lt;p&gt;Moreover, chest CT scans are high-dimensional, with lesions that may be subtle and dispersed across multiple anatomical regions. Thus, simultaneously modeling both local anomalies and global anatomical context is essential for generating clinically valuable reports. To address this challenge, Reg2RG adopts a dual-branch encoding architecture that processes local region features and global semantics independently, then integrates them during the language generation phase. This design enhances both localized diagnostic precision and global contextual consistency, enabling the language model to produce well-structured, accurate, and region-aware diagnostic paragraphs. The overall framework of Reg2RG is illustrated in Figure 1.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_fig1.png&quot; alt=&quot;Reg2RG Framework&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Figure 1: Overview of the Reg2RG framework. The model extracts local region features and global context, then generates region-referred diagnostic paragraphs using a large language model.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;related-works&quot;&gt;Related Works&lt;/h2&gt;

&lt;h3 id=&quot;medical-image-report-generation&quot;&gt;Medical Image Report Generation&lt;/h3&gt;
&lt;p&gt;Medical image report generation is a critical task in multimodal understanding, aiming to automatically generate clinically meaningful descriptions from complex medical images. Early approaches typically adopted encoder-decoder frameworks, using convolutional neural networks (CNNs) to extract global image features, followed by recurrent neural networks (RNNs) or Transformer-based models to produce diagnostic text. These methods focused on capturing overall image semantics but often overlooked fine-grained representation and localization of specific lesions, leading to reports that lacked interpretability and diagnostic specificity. To improve spatial awareness, some studies introduced attention mechanisms to allow models to focus on key regions. However, these approaches still relied on global features, which made it difficult to explicitly align textual fragments with specific image regions. In recent years, the advent of multimodal large language models (LLMs) has significantly improved report generation quality, but region-level semantic description and grounding consistency remain substantial challenges.&lt;/p&gt;

&lt;h3 id=&quot;region-level-referring-and-grounding&quot;&gt;Region-Level Referring and Grounding&lt;/h3&gt;
&lt;p&gt;In general vision-language tasks, various studies have explored integrating LLMs with region-level features to enhance fine-grained perception. However, applications in the medical imaging domain remain scarce. Existing methods often fail to capture semantic relationships between different regions or rely solely on coarse positional information when referring to image areas. This results in imprecise descriptions that do not meet the high standards of detail and accuracy required for clinical diagnosis. To address this, we propose a new region-level understanding framework tailored for report generation. Through a decoupled representation strategy, we preserve complete geometric information while enhancing texture details, satisfying the clinical demands for sensitivity to lesion location and morphology. Additionally, we incorporate global context to model inter-regional relationships, improving overall image understanding. A region-text alignment mechanism is also introduced to explicitly link visual regions with report content, enabling efficient and coherent multi-region expression in a single inference step—thus advancing the field of region-level visual perception in medical imaging.&lt;/p&gt;

&lt;h2 id=&quot;method&quot;&gt;Method&lt;/h2&gt;
&lt;h3 id=&quot;overall-framework-overview&quot;&gt;Overall Framework Overview&lt;/h3&gt;
&lt;p&gt;Traditional CT report generation methods primarily rely on global features of the entire scan, which often fail to capture subtle abnormalities located in specific anatomical regions, thus limiting the diagnostic accuracy and value of the generated reports. To overcome this limitation, our approach introduces multiple local regional features extracted using a generic segmentation module and combines them with global contextual information to guide report generation. Each local feature corresponds to a specific anatomical region, providing more targeted, fine-grained semantic information. After fusing local and global features, these representations are fed into a large language model for inference. This allows the model to accurately identify and describe multiple anatomical areas, resulting in diagnostic reports that are more comprehensive and clinically interpretable.&lt;/p&gt;

&lt;h3 id=&quot;decoupled-local-feature-representation-strategy&quot;&gt;Decoupled Local Feature Representation Strategy&lt;/h3&gt;
&lt;p&gt;To efficiently preserve high-resolution details of local regions, we propose a decoupled representation strategy that separates local features into texture and geometric components, which are processed independently and then merged into a unified regional representation.
For texture feature extraction, we first apply region masks to isolate target areas from the original CT images. To reduce redundancy and retain high-quality detail, we crop the extracted region images into smaller, focused patches. These are then encoded using an image encoder and compressed through a feature projection module to produce visual embeddings compatible with the language model.
For geometric features, we retain the uncropped region masks to preserve spatial position and scale within the original image. These masks are processed through a lightweight encoding network and mapped into representations that can also be understood by the language model.
Finally, the texture and geometric features are concatenated to form a complete local representation, which is subsequently used for region-level language generation tasks.&lt;/p&gt;
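&lt;p&gt;A minimal sketch of this decoupling in Python (the toy encoders below are hypothetical stand-ins for the ViT3D texture encoder and the lightweight geometric encoder; real region tokens are learned embeddings, not summary statistics):&lt;/p&gt;

```python
import numpy as np

def crop_to_mask(volume, mask):
    """Crop a volume to the bounding box of a binary region mask."""
    coords = np.argwhere(mask)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    region = volume * mask  # zero out everything outside the region
    return region[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

def region_representation(volume, mask, texture_enc, geometry_enc):
    """Decoupled local representation: texture from the cropped, masked
    region; geometry from the full-size (uncropped) mask; then concatenate."""
    texture = texture_enc(crop_to_mask(volume, mask))
    geometry = geometry_enc(mask.astype(np.float32))
    return np.concatenate([texture, geometry])

# Toy stand-in encoders (hypothetical, for illustration only).
tex_enc = lambda x: np.array([x.mean(), x.std()])
geo_enc = lambda m: np.array([m.sum() / m.size])

vol = np.random.rand(8, 8, 8)
msk = np.zeros((8, 8, 8), dtype=bool)
msk[2:5, 2:5, 2:5] = True
feat = region_representation(vol, msk, tex_enc, geo_enc)
print(feat.shape)  # (3,)
```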

&lt;h3 id=&quot;global-local-feature-collaboration-mechanism&quot;&gt;Global-Local Feature Collaboration Mechanism&lt;/h3&gt;
&lt;p&gt;Different anatomical regions in medical images often exhibit strong structural and semantic interdependencies. Relying solely on local information is insufficient for comprehensive understanding. To address this, we introduce a global-local feature collaboration mechanism that jointly models both levels of information to enhance contextual consistency and narrative coherence in the generated reports. Specifically, the model receives both independent local regional features and global visual features extracted from the entire CT volume. These features are processed by a unified projection module and embedded into the input prompt of the language model.&lt;/p&gt;

&lt;p&gt;We design a structured prompt in which global and regional features are represented by special tokens that are subsequently replaced with their corresponding embeddings. The language model uses these embeddings during inference, allowing it to reason about relationships between different anatomical regions and produce reports that are consistent and explainable.&lt;/p&gt;
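&lt;p&gt;The prompt-splicing idea can be sketched as follows; the special token names and the toy text embedder are illustrative assumptions, not the actual Reg2RG vocabulary:&lt;/p&gt;

```python
import numpy as np

# Hypothetical special tokens; the actual token names may differ.
GLOBAL_TOK, REGION_TOK = "<global>", "<region>"

def build_inputs(prompt, token_embed, global_feat, region_feats):
    """Replace each special token in the prompt with its visual embedding;
    ordinary words go through the (toy) text embedding function."""
    embeds, regions = [], iter(region_feats)
    for word in prompt.split():
        if word == GLOBAL_TOK:
            embeds.append(global_feat)
        elif word == REGION_TOK:
            embeds.append(next(regions))
        else:
            embeds.append(token_embed(word))
    return np.stack(embeds)  # embedding sequence fed to the LLM

dim = 4
token_embed = lambda w: np.full(dim, float(len(w)))  # toy text embedder
g = np.ones(dim)
r = [np.full(dim, 2.0), np.full(dim, 3.0)]
seq = build_inputs(f"Scan: {GLOBAL_TOK} Regions: {REGION_TOK} {REGION_TOK}",
                   token_embed, g, r)
print(seq.shape)  # (5, 4)
```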

&lt;h3 id=&quot;region-report-alignment-training-strategy&quot;&gt;Region-Report Alignment Training Strategy&lt;/h3&gt;
&lt;p&gt;To enhance the model’s ability to refer accurately to specific regions during report generation, we introduce a region-report alignment training strategy. This approach requires the model to identify the anatomical region associated with each paragraph before generating the corresponding diagnostic content.&lt;br /&gt;
During training, we prepend a prefix to each ground-truth regional report indicating the region it describes. For instance, the model is prompted with text such as “Region i corresponds to the heart,” thereby guiding it to perform anatomical identification before generating the report. To prevent the model from memorizing the input order, we randomly shuffle the sequence of local features in each iteration. This forces the model to recognize region identity based solely on the embedded features rather than positional bias.&lt;br /&gt;
The regional reports are still trained using standard language modeling loss, and the generation remains in free-form textual format. This strategy significantly improves the model’s regional awareness, allowing each paragraph of the generated report to not only be accurate in content but also traceable to a specific anatomical location, thus enhancing interpretability and clinical reliability.&lt;/p&gt;
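&lt;p&gt;A minimal sketch of this training-sample construction (the prefix wording mirrors the example in the text; the feature placeholders and region list are hypothetical):&lt;/p&gt;

```python
import random

def make_training_sample(region_feats, region_names, region_reports, seed=None):
    """Region-report alignment: shuffle region order each iteration so the
    model must infer identity from features rather than position, and prepend
    an identification prefix to each ground-truth paragraph."""
    rng = random.Random(seed)
    order = list(range(len(region_feats)))
    rng.shuffle(order)
    feats = [region_feats[i] for i in order]
    target = ""
    for slot, i in enumerate(order):
        target += f"Region {slot} corresponds to the {region_names[i]}. "
        target += region_reports[i].strip() + "\n"
    return feats, target

feats, tgt = make_training_sample(
    ["f_heart", "f_lung"],                       # placeholder feature handles
    ["heart", "lung"],
    ["Normal cardiac size.", "No consolidation."],
    seed=0,
)
print(tgt)
```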

&lt;h2 id=&quot;experimental-results&quot;&gt;Experimental Results&lt;/h2&gt;

&lt;h3 id=&quot;datasets--metrics&quot;&gt;Datasets &amp;amp; Metrics&lt;/h3&gt;

&lt;p&gt;Reg2RG was evaluated on two large-scale chest CT datasets:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;RadGenome-ChestCT&lt;/strong&gt;: 25,000+ region-annotated CTs with structured reports.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;CTRG-Chest-548K&lt;/strong&gt;: 1,800 CTs with unstructured reports, auto-segmented for region-level evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Performance was assessed using:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;NLG metrics&lt;/strong&gt;: BLEU, METEOR, ROUGE-L (textual similarity)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Clinical efficacy metrics&lt;/strong&gt;: Precision, recall, F1 (diagnostic accuracy, via RadBERT extraction)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Region recognition accuracy&lt;/strong&gt;: Correctly linking text to anatomical regions&lt;/li&gt;
&lt;/ul&gt;
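&lt;p&gt;For concreteness, clinical efficacy can be scored micro-averaged over per-report finding labels once those labels have been extracted from the text (the paper uses RadBERT for the extraction step). A minimal sketch, with label sets invented for illustration:&lt;/p&gt;

```python
def ce_metrics(pred_labels, true_labels):
    """Micro-averaged precision/recall/F1 over sets of finding labels,
    one set per report (a sketch of clinical-efficacy scoring)."""
    tp = sum(len(p & t) for p, t in zip(pred_labels, true_labels))
    fp = sum(len(p - t) for p, t in zip(pred_labels, true_labels))
    fn = sum(len(t - p) for p, t in zip(pred_labels, true_labels))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Invented example: two reports, labels already extracted from free text.
pred = [{"effusion"}, {"nodule", "cardiomegaly"}]
true = [{"effusion", "nodule"}, {"nodule"}]
p, r, f = ce_metrics(pred, true)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```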

&lt;h3 id=&quot;experimental-setup-and-baselines&quot;&gt;Experimental Setup and Baselines&lt;/h3&gt;
&lt;p&gt;We use the SAT segmentation model to process CT volumes and resample all images to a spatial resolution of 256×256×64. Local texture features are extracted using a pretrained ViT3D encoder and adapted via a Perceiver-based dimensionality reduction module. The same ViT3D backbone is used for global feature extraction. Geometric features are encoded by a lightweight three-layer ViT3D model and mapped into the language model’s input space through fully connected layers.&lt;br /&gt;
For language decoding, we adopt the LLaMA2-7B model with the LoRA fine-tuning strategy for efficiency. The model is optimized using AdamW with a fixed learning rate and an initial warm-up schedule. Training is conducted for 6 epochs on the RadGenome dataset and 10 epochs on the CTRG dataset using two NVIDIA 3090 GPUs with a batch size of 16. To reduce memory usage, we apply the ZeRO optimization strategy along with gradient checkpointing.&lt;br /&gt;
We compare Reg2RG against several state-of-the-art 3D report generation baselines, including CT2Rep, RadFM, and M3D, as well as two 2D-based models: R2GenGPT and MedVInT. All models use LLaMA2-7B as the language decoder and receive the same preprocessed input to ensure fairness. For the 2D models, we convert 3D volumes into multi-channel 2D representations.&lt;/p&gt;
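&lt;p&gt;As a rough illustration of the resampling step, a nearest-neighbour resize to the 256×256×64 target could look like the following (the actual preprocessing is not detailed in this summary and likely uses proper interpolation):&lt;/p&gt;

```python
import numpy as np

def resample_nn(volume, out_shape=(256, 256, 64)):
    """Nearest-neighbour resampling of a CT volume to a fixed spatial
    resolution (illustrative stand-in for the real preprocessing)."""
    idx = [np.clip((np.arange(n) * s / n).astype(int), 0, s - 1)
           for n, s in zip(out_shape, volume.shape)]
    return volume[np.ix_(idx[0], idx[1], idx[2])]

vol = np.random.rand(120, 130, 40)   # arbitrary input resolution
out = resample_nn(vol)
print(out.shape)  # (256, 256, 64)
```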

&lt;h3 id=&quot;quantitative-results&quot;&gt;Quantitative Results&lt;/h3&gt;
&lt;p&gt;As shown in Table 1, Reg2RG outperforms all competing methods on NLG metrics across both datasets. On the RadGenome-ChestCT dataset, Reg2RG surpasses the next-best model, MedVInT, on all language metrics. In particular, it achieves over 9% relative improvement in METEOR, indicating superior vocabulary diversity and semantic coherence. On the CTRG-Chest-548K dataset, Reg2RG also leads in BLEU and METEOR, though its ROUGE-L score is slightly lower, primarily due to the more fragmented structure of region-level reports, which affects the longest common subsequence matching.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_table1.png&quot; alt=&quot;NLG Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 1: Reg2RG achieves state-of-the-art results on both language and clinical efficacy metrics compared to strong baselines.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As shown in Table 2, Reg2RG also demonstrates clear superiority in Clinical Efficacy (CE) on the RadGenome dataset. It achieves significant improvements in precision, recall, and F1 score compared to all other models. In particular, the F1 score improves by nearly 20%, validating Reg2RG’s reliability in capturing critical diagnostic information. Notably, Reg2RG maintains a strong balance between precision and recall, whereas many models trade off one for the other, highlighting its robust diagnostic capability.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_table2.png&quot; alt=&quot;Clinical Efficacy Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 2: Comparison of Reg2RG with other models on Clinical Efficacy (CE) metrics.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Additionally, we evaluate the model’s performance in region recognition. As reported in Table 3, recognition accuracy exceeds 95% for eight out of ten anatomical regions. Lower performance on the lungs and pleura is attributed to segmentation challenges rather than model limitations. Once accurate masks are available, Reg2RG shows consistently strong recognition performance.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_table3.png&quot; alt=&quot;Region Recognition Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 3: Region recognition accuracy of Reg2RG.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;ablation-studies&quot;&gt;Ablation Studies&lt;/h3&gt;

&lt;p&gt;Ablation experiments confirm the importance of each module:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Removing &lt;strong&gt;local feature decoupling&lt;/strong&gt; (texture/geometry separation) degrades region identification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To further assess the contribution of each module to overall model performance, we conducted a series of ablation experiments.
As shown in Table 4, removing the local feature decoupling (LFD) strategy results in performance degradation across several metrics. This highlights the importance of maintaining high-resolution texture and separate geometric information for accurately identifying abnormal regions.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_table4.png&quot; alt=&quot;Ablation Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 4: Effectiveness of Local Feature Decoupling (LFD) strategy.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Using only texture or only geometry reduces performance; combining both with global context yields best results.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Table 5, we examine the effects of different feature combinations. Using only texture features without spatial location information limits model performance. The inclusion of geometric features improves results significantly, and further addition of global context features enhances the model’s ability to capture semantic relationships across regions.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_table5.png&quot; alt=&quot;Ablation Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 5: Effectiveness of texture (TXT), geometric (GEO), and global (GLB) features.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Disabling &lt;strong&gt;region-report alignment&lt;/strong&gt; leads to less accurate and less interpretable reports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Table 6, we explore the impact of the Region-Report Alignment (RRA) training strategy. Without region-guided prompts, the model can still generate coherent text, but the accuracy and consistency of region-specific content degrade notably. This confirms that regional prompting is crucial for reinforcing semantic alignment between visual input and generated text.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_table6.png&quot; alt=&quot;Ablation Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 6: Effectiveness of Region-Report Alignment (RRA) strategy.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Larger LLMs (LLaMA2-7B) provide significant gains over smaller models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Furthermore, we evaluate how model performance varies with different language model sizes. As shown in Table 7, replacing LLaMA2-7B with a smaller model such as GPT-2 leads to substantial drops in performance across all metrics. This suggests that large models possess superior capabilities for region-level referring, cross-region reasoning, and complex report generation.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_table7.png&quot; alt=&quot;Ablation Results&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Table 7: Effectiveness of language model size on Reg2RG performance.&lt;/em&gt;&lt;/p&gt;

&lt;h3 id=&quot;report-quality-analysis&quot;&gt;Report Quality Analysis&lt;/h3&gt;
&lt;p&gt;To gain deeper insights into the quality of generated reports, we conducted statistical analysis on report length distributions and compared them with real-world references.&lt;br /&gt;
As shown in Figure 2, the length distribution of reports generated by Reg2RG closely mirrors that of real clinical reports, exhibiting less deviation than the MedVInT baseline. This indicates that Reg2RG produces more complete and information-rich outputs, reducing the likelihood of missing findings.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_fig2.png&quot; alt=&quot;Report Length Distribution&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Figure 2: Report length distribution comparison between Reg2RG and other methods.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A case study in Figure 3 demonstrates that Reg2RG accurately identifies and describes abnormalities across multiple anatomical regions, whereas competing methods frequently miss or misidentify findings. The output shown in Figure 4 reveals clearly delineated report sections, each explicitly referring to a specific region. This structure greatly enhances the clinical interpretability and practical utility of the model output. The region-aware alignment mechanism not only improves the precision of generated reports but also provides clinicians with precise spatial references for validation.&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_fig3.png&quot; alt=&quot;Case Study&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Figure 3: Report comparison between Reg2RG and the second-best model.&lt;/em&gt;&lt;/p&gt;

&lt;div align=&quot;center&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/static/img/news/2025_reg2rg_fig4.png&quot; alt=&quot;Case Study&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Figure 4: Example of a region-aligned diagnostic report generated by Reg2RG.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This paper presents Reg2RG, a region-guided framework for chest CT report generation, which introduces region-aware mechanisms to enhance diagnostic accuracy, fine-grained expression, and interpretability. Unlike traditional approaches that rely solely on global image features, Reg2RG integrates anatomical region features with global contextual information to produce precise multi-region diagnostic reports.&lt;br /&gt;
We propose a decoupled local feature strategy to model both texture and geometric information and further enhance inter-region semantic relationships through a global-local collaboration mechanism. Additionally, a region-report alignment training strategy reinforces the alignment between visual and linguistic modalities, improving the model’s region-specific reasoning and reporting capability.&lt;br /&gt;
Experimental results demonstrate that Reg2RG surpasses state-of-the-art methods in both language generation quality and clinical accuracy. The framework shows strong generalizability and practical potential. In future work, we aim to extend the approach to additional imaging modalities and diagnostic tasks, further advancing the practical deployment of automated medical report generation.&lt;/p&gt;

&lt;p&gt;For more details, code and models are available at:&lt;br /&gt;
&lt;a href=&quot;https://arxiv.org/pdf/2411.15539&quot;&gt;Paper (arXiv)&lt;/a&gt; | &lt;a href=&quot;https://github.com/zhi-xuan-chen/Reg2RG&quot;&gt;GitHub&lt;/a&gt; | &lt;a href=&quot;https://huggingface.co/Trusure/Reg2RG&quot;&gt;HuggingFace&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;p&gt;Z. Chen, Y. Bie, H. Jin, and H. Chen, “Large Language Model with Region-guided Referring and Grounding for CT Report Generation,” &lt;em&gt;IEEE Transactions on Medical Imaging&lt;/em&gt;, doi: 10.1109/TMI.2025.3559923.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/05/19/deepbreath</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/05/19/deepbreath/"/>
    <title>[Call for Papers] Deep-Brea3th 2025: AI and Imaging for Breast Care Workshop to be Held at MICCAI 2025</title>
    <published>2025-05-19T00:00:00+08:00</published>
    <updated>2025-05-19T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/05/19/deepbreath/</uri>
    </author>
    <content type="html">&lt;p&gt;&lt;strong&gt;Daejeon, South Korea – September 23-27, 2025&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The second edition of the Deep-Brea3th Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care will take place during MICCAI 2025 in Daejeon, South Korea. This international event will spotlight the latest advancements in artificial intelligence (AI) and multimodal imaging for breast cancer diagnosis and treatment, uniting leading researchers, clinicians, and industry experts to drive innovation and cross-disciplinary collaboration in breast health.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_deepbreath.png&quot; alt=&quot;Deep-Brea3th 2025&quot; style=&quot;width: 100%; max-width: 800px; margin: 0 auto; display: block;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;breast-cancer-a-global-challenge-ai-as-a-solution&quot;&gt;Breast Cancer: A Global Challenge, AI as a Solution&lt;/h2&gt;

&lt;p&gt;Breast cancer remains the most common malignant tumor among women worldwide, characterized by significant heterogeneity and complex diagnostic needs. Medical imaging—including mammography, ultrasound, MRI, PET/CT, and pathology—plays a crucial role in detection and characterization. With recent breakthroughs in AI, automated breast image interpretation and clinical decision support are rapidly advancing, offering new hope for improved outcomes and personalized care.&lt;/p&gt;

&lt;h2 id=&quot;three-international-challenges-to-advance-the-field&quot;&gt;Three International Challenges to Advance the Field&lt;/h2&gt;

&lt;p&gt;Deep-Brea3th 2025 will feature three associated challenges, fostering open innovation and benchmarking in key areas:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;ODELIA 2025&lt;/strong&gt;: Breast MRI Challenge&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;MAMA-MIA 2025&lt;/strong&gt;: Tumor Segmentation and Treatment Response Prediction in Breast MRI&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;UUSIC 2025&lt;/strong&gt;: Universal Ultrasound Image Challenge for Multi-Organ Classification and Segmentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;broad-scope-focused-impact&quot;&gt;Broad Scope, Focused Impact&lt;/h2&gt;

&lt;p&gt;The workshop welcomes submissions in all areas of AI for breast cancer, including (but not limited to):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Breast imaging (mammography, ultrasound, MRI, PET/CT, pathology, etc.)&lt;/li&gt;
  &lt;li&gt;Detection, segmentation, and classification&lt;/li&gt;
  &lt;li&gt;Breast cancer screening and risk prediction&lt;/li&gt;
  &lt;li&gt;Image registration, synthesis, and reconstruction&lt;/li&gt;
  &lt;li&gt;Multimodal imaging fusion&lt;/li&gt;
  &lt;li&gt;Treatment response and drug selection&lt;/li&gt;
  &lt;li&gt;Lymph node status and molecular subtypes&lt;/li&gt;
  &lt;li&gt;Tumor microenvironment and recurrence prediction&lt;/li&gt;
  &lt;li&gt;Radiomics, reader studies, and federated/swarm learning&lt;/li&gt;
  &lt;li&gt;Natural language processing and large language models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;high-quality-peer-review-and-open-science&quot;&gt;High-Quality Peer Review and Open Science&lt;/h2&gt;

&lt;p&gt;Submissions are invited in two tracks:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Full Papers&lt;/strong&gt;: Up to 8 pages (plus 2 pages for references), double-blind reviewed, to be published in the Springer LNCS MICCAI Satellite Events proceedings. Selected papers may be recommended for high-impact SCI-indexed journals.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Abstracts&lt;/strong&gt;: Up to 2500 characters, including a supporting figure, for both new and recently published work. Accepted abstracts will be presented orally or as posters and made publicly available.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All accepted contributions are eligible for Best Student Paper, Best Workshop Paper, and Best Abstract awards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Submission portal:&lt;/strong&gt; &lt;a href=&quot;https://cmt3.research.microsoft.com/DeepBreath2025&quot;&gt;CMT Submission System&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;key-dates&quot;&gt;Key Dates&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;May 13, 2025:&lt;/strong&gt; Submissions open&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;June 25, 2025:&lt;/strong&gt; Submission deadline&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;July 16, 2025:&lt;/strong&gt; Notification of decisions&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;July 30, 2025:&lt;/strong&gt; Camera-ready deadline&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;August 1, 2025:&lt;/strong&gt; Full program released&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;September 23-27, 2025:&lt;/strong&gt; Workshop at MICCAI 2025&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;international-leadership-and-expertise&quot;&gt;International Leadership and Expertise&lt;/h2&gt;

&lt;p&gt;Deep-Brea3th 2025 is organized by an international committee of experts from the Netherlands, Germany, China, the UK, USA, Spain, Switzerland, Greece, and more, ensuring a high standard of academic rigor and global impact.&lt;/p&gt;

&lt;h2 id=&quot;join-us-shape-the-future-of-breast-health-with-ai&quot;&gt;Join Us: Shape the Future of Breast Health with AI&lt;/h2&gt;

&lt;p&gt;We invite researchers, clinicians, and innovators from around the world to submit their work and participate in this landmark event. For more information, submission guidelines, and updates, please visit our &lt;a href=&quot;https://cmt3.research.microsoft.com/DeepBreath2025&quot;&gt;website&lt;/a&gt; or contact the organizing committee at &lt;strong&gt;miccai.deepbreath@gmail.com&lt;/strong&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;strong&gt;Contact:&lt;/strong&gt;&lt;br /&gt;
Deep-Brea3th 2025 Organizing Committee&lt;br /&gt;
Email: miccai.deepbreath@gmail.com&lt;br /&gt;
Submission Portal: https://cmt3.research.microsoft.com/DeepBreath2025&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;Let’s use AI to change the future of breast care—together.&lt;/em&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/05/06/news</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/05/06/news/"/>
    <title>[Press Release] SmartX Lab Members Shine at 2025 Undergraduate Research Opportunities Program Awards</title>
    <published>2025-05-06T00:00:00+08:00</published>
    <updated>2025-05-06T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/05/06/news/</uri>
    </author>
    <content type="html">&lt;p&gt;&lt;strong&gt;SmartX Lab is proud to announce the outstanding achievements of its students and faculty at the 2025 Kerry Holdings Limited Undergraduate Research Opportunities Program (UROP) Awards.&lt;/strong&gt; The award presentation ceremony, held on April 23, recognized innovative research across multiple disciplines, with HKUST students taking home several top honors.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_press_2.jpg&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;top-honors-for-biomedical-ai-research&quot;&gt;Top Honors for Biomedical AI Research&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LIU Runsheng&lt;/strong&gt; (COMP, Year 3), supervised by Prof. CHEN Hao from the Department of Computer Science and Engineering, clinched both the &lt;strong&gt;Champion&lt;/strong&gt; and &lt;strong&gt;Best Poster Award&lt;/strong&gt; for his groundbreaking project, “GAInS: Gradient Anomaly-aware Biomedical Instance Segmentation.”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_press_4.jpg&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Biomedical instance segmentation is crucial for accurately identifying and outlining structures such as tissues and cells in medical images. Traditional methods often struggle with overlapping or touching cells, treating each scenario separately and missing the connections between them. LIU’s GAInS approach introduces a new way to detect and refine these complex regions by analyzing the local gradient anomalies—essentially, the sharp changes in image intensity that signal different structures. His method uses a novel Gradient Anomaly Mapping Module (GAMM) to create detailed maps of these regions, and an Adaptive Local Refinement Module (ALRM) that fine-tunes the segmentation boundaries. The result is a system that outperforms current state-of-the-art techniques, as demonstrated in extensive experiments across multiple biomedical datasets. The work has also been recognized internationally, having been published at the &lt;a href=&quot;https://www.computer.org/csdl/proceedings-article/bibm/2024/10822187/23oomcXkdfq&quot;&gt;IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2024&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;advancing-medical-image-analysis-with-ai&quot;&gt;Advancing Medical Image Analysis with AI&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_press_3.jpg&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GU Yi&lt;/strong&gt; (COMP, Graduated), also supervised by Prof. CHEN Hao, was named &lt;strong&gt;Second Runner-Up&lt;/strong&gt; for his project, “Deep learning for medical image analysis.” GU’s research tackles the challenge of detecting medical anomalies—critical for early disease diagnosis—using artificial intelligence. Traditional models are trained only on healthy data and must identify when something unusual appears. However, these models often struggle to confidently distinguish between normal and abnormal cases.&lt;/p&gt;

&lt;p&gt;GU’s solution, D2UE (Diversified Dual-space Uncertainty Estimation), introduces two key innovations. First, a technique called Redundancy-Aware Repulsion (RAR) ensures that AI models learn to agree when analyzing healthy images and disagree when faced with potential anomalies. Second, his Dual-Space Uncertainty (DSU) method examines both the outputs of the models and how those outputs change with slight variations in the input, allowing for more sensitive detection of subtle anomalies. Tested across five medical datasets, D2UE demonstrated superior performance compared to existing methods, marking a significant step forward in AI-powered medical diagnostics.&lt;/p&gt;

&lt;h3 id=&quot;looking-ahead&quot;&gt;Looking Ahead&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_press_1.jpg&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The UROP Office encourages both faculty and students to participate in the upcoming Summer 2025 program. Project proposals and student applications are now open. For further details, please visit the &lt;a href=&quot;https://urop.hkust.edu.hk/Awards_and_Sponsorships/awardee_2025&quot;&gt;UROP website&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Congratulations to all awardees for their dedication to advancing research and innovation at HKUST!&lt;/p&gt;

&lt;h3 id=&quot;about-smartx-lab&quot;&gt;About SmartX Lab&lt;/h3&gt;

&lt;p&gt;SmartX Lab, led by Prof. Hao Chen, is committed to pushing the boundaries of trustworthy AI technologies for healthcare and science. Our research directions include large-scale models for healthcare, computer-assisted intervention, AI for science, and bioinformatics. Our ultimate goal is to spearhead a transformative revolution in medical practices and scientific discoveries, paving the way for a brighter and healthier future.&lt;/p&gt;

&lt;h3 id=&quot;about-the-hong-kong-university-of-science-and-technology&quot;&gt;About The Hong Kong University of Science and Technology&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hkust.edu.hk/&quot;&gt;The Hong Kong University of Science and Technology (HKUST)&lt;/a&gt; is a world-class university that excels in driving innovative education, research excellence, and impactful knowledge transfer. With a holistic and interdisciplinary pedagogical approach, HKUST was ranked 3rd in the Times Higher Education’s Young University Rankings 2024, while 12 of its subjects were ranked among the world’s top 50 in the QS World University Rankings by Subject 2024, with “Data Science and Artificial Intelligence” being ranked first in Hong Kong and 10th in the world. Our graduates are highly competitive, consistently ranking among the world’s top 30 most sought-after employees. In terms of research and entrepreneurship, over 80% of our work was rated “internationally excellent” or “world leading” in the latest Research Assessment Exercise 2020 of Hong Kong’s University Grants Committee. As of October 2024, HKUST members have founded over 1,800 active start-ups, including 10 Unicorns and 14 exits (IPO or M&amp;amp;A).&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/04/30/news</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/04/30/news/"/>
    <title>[Press Release] MOME Model: AI Empowers Breast Cancer Diagnosis | Dialogue between SmartX Lab and Shenzhen People’s Hospital</title>
    <published>2025-04-30T00:00:00+08:00</published>
    <updated>2025-04-30T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/04/30/news/</uri>
    </author>
    <content type="html">&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_hetao_talk_cover.png&quot; alt=&quot;HKUST SmartLab&quot; style=&quot;width: 100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;On April 30, 2025, the &lt;strong&gt;SmartX Lab&lt;/strong&gt; team, led by &lt;strong&gt;Prof. Hao Chen&lt;/strong&gt;, had a fruitful dialogue with &lt;strong&gt;Shenzhen People’s Hospital&lt;/strong&gt;. The discussion focused on the &lt;a href=&quot;https://smartx.cse.ust.hk/2024/08/28/mome/&quot;&gt;MOME model&lt;/a&gt;, an AI-powered breast cancer diagnosis model developed by SmartX Lab. This model has shown remarkable performance in assisting medical professionals with breast cancer diagnosis.&lt;/p&gt;

&lt;p&gt;The dialogue was attended by &lt;strong&gt;Prof. Chen&lt;/strong&gt; and &lt;strong&gt;Dr. Wu Mingxiang&lt;/strong&gt; from Shenzhen People’s Hospital. The meeting provided an excellent opportunity for both teams to exchange ideas and explore potential collaborations in the field of AI-driven healthcare.&lt;/p&gt;

&lt;p&gt;The MOME model, which stands for &lt;strong&gt;Mixture of Modality Experts&lt;/strong&gt;, is a state-of-the-art AI model that integrates multiparametric MRI (mpMRI) data. Trained on a large-scale mpMRI dataset, the model has demonstrated its ability to provide reliable diagnostic support to medical professionals.&lt;/p&gt;

&lt;p&gt;The collaboration between SmartX Lab and Shenzhen People’s Hospital aims to further refine the MOME model and explore its potential applications in clinical settings. The dialogue highlighted the importance of interdisciplinary collaboration in advancing AI technologies for healthcare, ultimately benefiting patients and improving medical outcomes.&lt;/p&gt;

&lt;p&gt;The meeting also emphasized the significance of AI in transforming the healthcare landscape, particularly in the early detection and diagnosis of diseases. By leveraging advanced AI technologies, healthcare professionals can enhance their diagnostic capabilities, leading to more accurate and timely interventions for patients.&lt;/p&gt;

&lt;p&gt;Following the meeting, both teams expressed their enthusiasm for future collaborations and the potential impact of AI in healthcare. The SmartX Lab team is committed to continuing its research in AI-driven healthcare solutions, with the goal of improving patient outcomes and revolutionizing medical practices.&lt;/p&gt;

&lt;p&gt;Video of the dialogue is available as follows (in Mandarin Chinese with English subtitles):&lt;/p&gt;
&lt;video controls=&quot;&quot; style=&quot;width: 100%; height: auto;&quot;&gt;
  &lt;source src=&quot;/static/media/Hetao_Talk.mp4&quot; type=&quot;video/mp4&quot; /&gt;
  Your browser does not support the video tag.
&lt;/video&gt;

&lt;h3 id=&quot;about-smartx-lab&quot;&gt;About SmartX Lab&lt;/h3&gt;

&lt;p&gt;SmartX Lab, led by Prof. Hao Chen, is committed to pushing the boundaries of trustworthy AI technologies for healthcare and science. Our research directions include large-scale models for healthcare, computer-assisted intervention, AI for science, and bioinformatics. Our ultimate goal is to spearhead a transformative revolution in medical practices and scientific discoveries, paving the way for a brighter and healthier future.&lt;/p&gt;

&lt;h3 id=&quot;about-the-hong-kong-university-of-science-and-technology&quot;&gt;About The Hong Kong University of Science and Technology&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://hkust.edu.hk/&quot;&gt;The Hong Kong University of Science and Technology (HKUST)&lt;/a&gt; is a world-class university that excels in driving innovative education, research excellence, and impactful knowledge transfer. With a holistic and interdisciplinary pedagogical approach, HKUST was ranked 3rd in the Times Higher Education’s Young University Rankings 2024, while 12 of its subjects were ranked among the world’s top 50 in the QS World University Rankings by Subject 2024, with “Data Science and Artificial Intelligence” being ranked first in Hong Kong and 10th in the world. Our graduates are highly competitive, consistently ranking among the world’s top 30 most sought-after employees. In terms of research and entrepreneurship, over 80% of our work was rated “internationally excellent” or “world leading” in the latest Research Assessment Exercise 2020 of Hong Kong’s University Grants Committee. As of October 2024, HKUST members have founded over 1,800 active start-ups, including 10 Unicorns and 14 exits (IPO or M&amp;amp;A).&lt;/p&gt;

</content>
  </entry>
  
  <entry>
    <id>https://hkustsmartlab.github.io//2025/02/28/focus</id>
    <link type="text/html" rel="alternate" href="https://hkustsmartlab.github.io//2025/02/28/focus/"/>
    <title>[CVPR 2025] FOCUS: Knowledge-Enhanced Adaptive Visual Compression for Few-Shot Whole Slide Image Classification</title>
    <published>2025-02-28T00:00:00+08:00</published>
    <updated>2025-02-28T00:00:00+08:00</updated>
    <author>
      <name>Cheng Jin</name>
      <uri>/2025/02/28/focus/</uri>
    </author>
    <content type="html">&lt;p&gt;Recetnly, SmartX Lab’s new work &lt;strong&gt;FOCUS: Knowledge-Enhanced Adaptive Visual Compression for Few-Shot Whole Slide Image Classification&lt;/strong&gt;, accepted at &lt;strong&gt;CVPR 2025&lt;/strong&gt;, a top-tier conference in computer vision. This study introduces &lt;strong&gt;FOCUS&lt;/strong&gt;, a novel framework designed to tackle the &lt;strong&gt;few-shot weakly supervised learning (FSWL) problem in whole slide image (WSI) classification&lt;/strong&gt;. By leveraging &lt;strong&gt;adaptive visual compression and knowledge enhancement&lt;/strong&gt;, FOCUS effectively &lt;strong&gt;filters out irrelevant regions and prioritizes diagnostically significant areas&lt;/strong&gt;, addressing key challenges in pathology AI under data-scarce conditions.&lt;/p&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;Whole slide image (WSI) classification is a crucial task in &lt;strong&gt;computational pathology (CPath)&lt;/strong&gt;, enabling &lt;strong&gt;automated disease diagnosis&lt;/strong&gt; from high-resolution pathology images. This is particularly important for &lt;strong&gt;cancer detection&lt;/strong&gt;, where accurate classification can significantly impact clinical decision-making. However, WSIs are &lt;strong&gt;extremely high-resolution&lt;/strong&gt; (often exceeding &lt;strong&gt;100,000 × 100,000 pixels&lt;/strong&gt;), and diagnostic information is &lt;strong&gt;sparsely distributed&lt;/strong&gt; across the image. Traditional &lt;strong&gt;fully supervised learning&lt;/strong&gt; approaches require extensive &lt;strong&gt;expert-annotated training data&lt;/strong&gt;, which is costly and limited due to &lt;strong&gt;privacy constraints and expert availability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To address this, &lt;strong&gt;few-shot weakly supervised learning (FSWL)&lt;/strong&gt; has emerged as a promising solution, allowing models to learn effectively from &lt;strong&gt;limited labeled samples&lt;/strong&gt; combined with &lt;strong&gt;unlabeled data&lt;/strong&gt;. A widely used approach in FSWL is &lt;strong&gt;multiple instance learning (MIL)&lt;/strong&gt;, where WSIs are divided into smaller &lt;strong&gt;image patches&lt;/strong&gt;, and features are aggregated to form a slide-level representation. However, when labeled samples are extremely scarce, &lt;strong&gt;MIL struggles to distinguish diagnostically relevant patches from irrelevant ones&lt;/strong&gt;. Additionally, existing methods &lt;strong&gt;underutilize pathology foundation models (FMs) and language priors&lt;/strong&gt;, limiting their effectiveness in few-shot scenarios.&lt;/p&gt;

&lt;p&gt;This raises a key question:&lt;br /&gt;
&lt;strong&gt;How can we efficiently compress visual information and focus on diagnostically relevant regions under few-shot conditions?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FOCUS addresses this by integrating &lt;strong&gt;foundation models (FMs) and language knowledge&lt;/strong&gt; into an &lt;strong&gt;adaptive compression mechanism&lt;/strong&gt;, ensuring that critical diagnostic features are prioritized. Figure 1 illustrates the overall framework.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_focus_fig1.png&quot; alt=&quot;FOCUS Framework&quot; style=&quot;width: 100%; height: auto;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;related-work&quot;&gt;Related Work&lt;/h2&gt;

&lt;h3 id=&quot;a-multiple-instance-learning-mil&quot;&gt;A. Multiple Instance Learning (MIL)&lt;/h3&gt;

&lt;p&gt;MIL is the dominant approach for WSI analysis, treating each WSI as a &lt;strong&gt;bag&lt;/strong&gt; and its patches as &lt;strong&gt;instances&lt;/strong&gt;. Patch features are aggregated to form a &lt;strong&gt;slide-level representation&lt;/strong&gt;, making MIL well-suited for &lt;strong&gt;weakly supervised learning&lt;/strong&gt;. However, traditional MIL &lt;strong&gt;requires large labeled datasets&lt;/strong&gt;, making it ineffective in few-shot settings. FOCUS enhances MIL by introducing &lt;strong&gt;adaptive visual compression&lt;/strong&gt; to improve feature selection.&lt;/p&gt;

&lt;h3 id=&quot;b-pathology-foundation-models-fms&quot;&gt;B. Pathology Foundation Models (FMs)&lt;/h3&gt;

&lt;p&gt;Recent pathology foundation models (e.g., &lt;strong&gt;CONCH, UNI, GPFM, Virchow&lt;/strong&gt;) have demonstrated &lt;strong&gt;strong feature extraction capabilities&lt;/strong&gt; through large-scale pretraining. However, current methods &lt;strong&gt;limit their use to feature extraction&lt;/strong&gt;, failing to &lt;strong&gt;leverage their semantic understanding&lt;/strong&gt;. FOCUS utilizes FMs for &lt;strong&gt;global redundancy removal and semantic relevance assessment&lt;/strong&gt;, improving feature selection.&lt;/p&gt;

&lt;h3 id=&quot;c-few-shot-weakly-supervised-learning-fswl&quot;&gt;C. Few-Shot Weakly Supervised Learning (FSWL)&lt;/h3&gt;

&lt;p&gt;FSWL aims to perform &lt;strong&gt;WSI classification with minimal labeled samples&lt;/strong&gt;. Existing methods incorporate &lt;strong&gt;language guidance or additional visual samples&lt;/strong&gt;, but they often &lt;strong&gt;struggle with multi-resolution requirements or reliance on extra reference samples&lt;/strong&gt;. FOCUS introduces a &lt;strong&gt;three-stage compression strategy&lt;/strong&gt;, offering a &lt;strong&gt;more generalizable and efficient solution&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 id=&quot;methodology&quot;&gt;Methodology&lt;/h2&gt;

&lt;p&gt;FOCUS employs a &lt;strong&gt;three-stage progressive visual compression strategy&lt;/strong&gt;, consisting of the following steps:&lt;/p&gt;

&lt;h3 id=&quot;a-global-redundancy-removal&quot;&gt;A. &lt;strong&gt;Global Redundancy Removal&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Each WSI is divided into &lt;strong&gt;non-overlapping patches&lt;/strong&gt;, and features are extracted using &lt;strong&gt;pretrained pathology foundation models&lt;/strong&gt;. To remove redundant information, FOCUS applies &lt;strong&gt;cosine similarity-based filtering&lt;/strong&gt; within a &lt;strong&gt;local sliding window&lt;/strong&gt;. A &lt;strong&gt;dynamic threshold&lt;/strong&gt; is computed based on &lt;strong&gt;mean similarity and standard deviation&lt;/strong&gt;, ensuring that highly redundant patches are eliminated.&lt;/p&gt;

\[\hat{\mathbf{b}}_i=\frac{\mathbf{b}_i}{\left\|\mathbf{b}_i\right\|_2}, \quad S_{i j}=\hat{\mathbf{b}}_i \cdot \hat{\mathbf{b}}_j, \quad \tau_g=\mu(S)+\sigma(S)\]
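&lt;p&gt;As a concrete illustration, this stage can be sketched in NumPy. The sliding-window size and the rule of comparing each patch only against earlier, already-kept patches in its window are assumptions made for this sketch, not details taken from the paper:&lt;/p&gt;

```python
import numpy as np

def global_redundancy_removal(feats, window=16):
    """Sketch of global redundancy removal: drop patches that are near-
    duplicates of an earlier patch within a local sliding window."""
    # L2-normalise patch features so dot products are cosine similarities
    b_hat = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    n = b_hat.shape[0]
    # first pass: collect all in-window similarities to set the threshold
    sims = []
    for i in range(1, n):
        lo = max(0, i - window)
        sims.append(b_hat[i] @ b_hat[lo:i].T)
    all_s = np.concatenate(sims)
    # dynamic threshold tau_g = mean + std of observed similarities
    tau_g = all_s.mean() + all_s.std()
    # second pass: discard a patch if it is too similar to a kept neighbour
    keep = np.ones(n, dtype=bool)
    for i in range(1, n):
        lo = max(0, i - window)
        s = b_hat[i] @ b_hat[lo:i].T
        if np.any(np.greater(s[keep[lo:i]], tau_g)):
            keep[i] = False
    return feats[keep]
```

&lt;p&gt;Patches whose cosine similarity to an already-kept neighbour exceeds the dynamic threshold are discarded, mirroring the thresholding rule above.&lt;/p&gt;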

&lt;h3 id=&quot;b-language-guided-visual-token-prioritization&quot;&gt;B. &lt;strong&gt;Language-Guided Visual Token Prioritization&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;After redundancy removal, the remaining patch features are &lt;strong&gt;matched with expert-provided textual descriptions&lt;/strong&gt;. A &lt;strong&gt;text encoder&lt;/strong&gt; converts expert descriptions into &lt;strong&gt;text tokens&lt;/strong&gt;, which are then mapped to &lt;strong&gt;visual features&lt;/strong&gt;. A &lt;strong&gt;cross-modal attention mechanism&lt;/strong&gt; calculates similarity scores, ranking visual tokens by &lt;strong&gt;semantic relevance&lt;/strong&gt;. The top-ranked tokens are &lt;strong&gt;selected for further processing&lt;/strong&gt;, ensuring that diagnostically significant regions are prioritized.&lt;/p&gt;

\[\begin{gathered}
\mathbf{A}=\operatorname{softmax}\left(\frac{\left(\mathbf{T} W_q\right)\left(\mathbf{B} W_k\right)^{\top}}{\sqrt{d}}\right), \quad r_i=\frac{1}{t_1+t_2} \sum_{j=1}^{t_1+t_2} \mathbf{A}_{j i}, \\
k=\min \left(M_{\max }, \gamma N^{\prime}\right), \quad \mathbf{B}_s=\left\{\mathbf{b}_i \mid \operatorname{rank}\left(r_i\right) \leq k\right\},
\end{gathered}\]
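&lt;p&gt;A minimal single-head NumPy sketch of this prioritization step; the projection matrices Wq and Wk and the default gamma and M_max values here are hypothetical placeholders rather than the paper’s settings:&lt;/p&gt;

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def language_guided_selection(B, T, Wq, Wk, gamma=0.5, M_max=64):
    """Sketch of language-guided prioritization: rank visual tokens B by
    the attention they receive from text tokens T and keep the top-k."""
    d = Wq.shape[1]
    # cross-modal attention: text queries attend over visual keys
    A = softmax((T @ Wq) @ (B @ Wk).T / np.sqrt(d), axis=-1)
    # relevance of visual token i = mean attention across text tokens
    r = A.mean(axis=0)
    n_prime = B.shape[0]
    k = int(min(M_max, gamma * n_prime))
    # indices of the k most relevant tokens, sorted by descending relevance
    top = np.argsort(-r)[:k]
    return B[np.sort(top)]  # preserve the original spatial order
```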

&lt;h3 id=&quot;c-sequential-visual-token-compression&quot;&gt;C. &lt;strong&gt;Sequential Visual Token Compression&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;To further refine the feature set, FOCUS applies &lt;strong&gt;sequential compression&lt;/strong&gt; to remove &lt;strong&gt;local redundancy&lt;/strong&gt; among selected visual tokens. Cosine similarity is computed between &lt;strong&gt;adjacent tokens&lt;/strong&gt;, and a &lt;strong&gt;dynamic thresholding mechanism&lt;/strong&gt; iteratively filters out redundant tokens. This process preserves &lt;strong&gt;spatial coherence&lt;/strong&gt; while eliminating unnecessary information.&lt;/p&gt;

\[\begin{gathered}
s_{j, j+1}=\frac{\mathbf{b}_j^{\top} \mathbf{b}_{j+1}}{\left\|\mathbf{b}_j\right\|_2 \cdot\left\|\mathbf{b}_{j+1}\right\|_2}, \quad j \in\{1, \ldots, k-1\} \\
\operatorname{mask}_j= \begin{cases}1, &amp;amp; \text { if } \min \left(s_{j-1, j}, s_{j, j+1}\right)&amp;lt;\theta_i \\
0, &amp;amp; \text { otherwise }\end{cases}
\end{gathered}\]
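&lt;p&gt;The masking rule can be sketched as follows; always keeping the first and last tokens (which lack one neighbour) and using a single fixed threshold theta are simplifying assumptions of this sketch:&lt;/p&gt;

```python
import numpy as np

def sequential_compression(tokens, theta=0.9):
    """Sketch of sequential compression: drop an interior token when it
    is highly similar to BOTH of its sequential neighbours."""
    b = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    # cosine similarity between adjacent tokens: s[j] = sim(token j, token j+1)
    s = (b[:-1] * b[1:]).sum(axis=1)
    k = tokens.shape[0]
    keep = np.ones(k, dtype=bool)
    for j in range(1, k - 1):
        # keep token j only if it differs from at least one neighbour
        keep[j] = np.less(min(s[j - 1], s[j]), theta)
    return tokens[keep]
```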

&lt;h3 id=&quot;d-cross-modal-aggregation-and-final-prediction&quot;&gt;D. &lt;strong&gt;Cross-Modal Aggregation and Final Prediction&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;The compressed visual tokens are fused with &lt;strong&gt;text embeddings&lt;/strong&gt; using a &lt;strong&gt;multi-head attention mechanism&lt;/strong&gt;. The attention scores are normalized and passed through a &lt;strong&gt;fully connected layer&lt;/strong&gt; to generate the final classification probabilities. The model is trained &lt;strong&gt;end-to-end&lt;/strong&gt; using a &lt;strong&gt;cross-entropy loss function&lt;/strong&gt;, optimizing feature extraction, cross-modal attention, and classification jointly.&lt;/p&gt;

\[\begin{gathered}
\operatorname{Head}_i=\operatorname{softmax}\left(\frac{\mathbf{Q} W_q^i\left(\mathbf{K} W_k^i\right)^{\top}}{\sqrt{d_k}}\right) \mathbf{V} W_v^i, \\
\mathbf{O}=\operatorname{LayerNorm}\left(\operatorname{Concat}\left(\operatorname{Head}_1, \ldots, \operatorname{Head}_h\right) W_o\right), \\
P\left(Y \mid \mathbf{B}_c, \mathbf{T}\right)=\operatorname{softmax}\left(W_c \mathbf{O}+\beta_c\right),
\end{gathered}\]
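&lt;p&gt;A toy NumPy sketch of the fusion and prediction stage; using the text tokens as queries over the compressed visual tokens, and mean-pooling the fused text positions before the linear classifier, are assumptions made here for concreteness:&lt;/p&gt;

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def cross_modal_head(Q, K, V, Wq, Wk, Wv):
    """One scaled dot-product attention head over a query/key/value triple."""
    d_k = Wk.shape[1]
    A = softmax((Q @ Wq) @ (K @ Wk).T / np.sqrt(d_k), axis=-1)
    return A @ (V @ Wv)

def aggregate_and_classify(B_c, T, heads, Wo, Wc, beta_c):
    """Sketch of cross-modal aggregation: fuse compressed visual tokens
    B_c with text embeddings T via multi-head attention, then classify."""
    # text queries attend over compressed visual tokens (keys and values)
    outs = [cross_modal_head(T, B_c, B_c, Wq, Wk, Wv) for (Wq, Wk, Wv) in heads]
    O = layer_norm(np.concatenate(outs, axis=-1) @ Wo)
    # pool text positions, then apply the linear classifier
    logits = O.mean(axis=0) @ Wc.T + beta_c
    return softmax(logits)
```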

&lt;h2 id=&quot;experimental-results&quot;&gt;Experimental Results&lt;/h2&gt;

&lt;h3 id=&quot;datasets&quot;&gt;Datasets&lt;/h3&gt;

&lt;p&gt;FOCUS was evaluated on &lt;strong&gt;three benchmark pathology datasets&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;TCGA-NSCLC&lt;/strong&gt;: A dataset from &lt;strong&gt;The Cancer Genome Atlas&lt;/strong&gt;, containing &lt;strong&gt;lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC)&lt;/strong&gt; WSIs (~1,000 slides).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;CAMELYON&lt;/strong&gt;: A dataset from the &lt;strong&gt;CAMELYON16 and CAMELYON17 challenges&lt;/strong&gt;, including &lt;strong&gt;normal and metastatic lymph node samples&lt;/strong&gt; (~900 slides).&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;UBC-OCEAN&lt;/strong&gt;: A dataset from the &lt;strong&gt;University of British Columbia&lt;/strong&gt;, containing &lt;strong&gt;five ovarian cancer subtypes&lt;/strong&gt; (~500 slides).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Experiments were conducted under &lt;strong&gt;4-shot, 8-shot, and 16-shot settings&lt;/strong&gt;, where only &lt;strong&gt;4, 8, or 16 labeled WSIs per class&lt;/strong&gt; were used for training.&lt;/p&gt;

&lt;h3 id=&quot;evaluation-metrics&quot;&gt;Evaluation Metrics&lt;/h3&gt;

&lt;p&gt;The model was evaluated using:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Balanced Accuracy (Balanced ACC)&lt;/strong&gt;: Accounts for class imbalance.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Area Under the ROC Curve (AUC)&lt;/strong&gt;: Measures classification performance.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;F1 Score&lt;/strong&gt;: Reflects overall classification quality.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;key-findings&quot;&gt;Key Findings&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_focus_fig2.png&quot; alt=&quot;FOCUS Performance&quot; style=&quot;width: 100%; height: auto;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The above Table summarizes FOCUS’s performance:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;TCGA-NSCLC&lt;/strong&gt;: Achieves &lt;strong&gt;81.9% Balanced ACC and 91.5% AUC&lt;/strong&gt; under &lt;strong&gt;4-shot settings&lt;/strong&gt;, outperforming all baselines.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;CAMELYON&lt;/strong&gt;: Maintains high accuracy across &lt;strong&gt;low-shot settings&lt;/strong&gt;, reaching &lt;strong&gt;70.1% ACC in 4-shot experiments&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;UBC-OCEAN&lt;/strong&gt;: Demonstrates strong performance in &lt;strong&gt;multi-class classification&lt;/strong&gt;, achieving &lt;strong&gt;70.4%, 77.3%, and 86.4% Balanced ACC&lt;/strong&gt; in &lt;strong&gt;4-shot, 8-shot, and 16-shot settings&lt;/strong&gt;, respectively.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;effect-of-foundation-model-selection&quot;&gt;Effect of Foundation Model Selection&lt;/h3&gt;

&lt;p&gt;We evaluated different &lt;strong&gt;foundation models (FMs)&lt;/strong&gt; for feature extraction, including &lt;strong&gt;CONCH, UNI, GPFM, and Virchow&lt;/strong&gt;. &lt;strong&gt;CONCH consistently outperformed others&lt;/strong&gt;, particularly in &lt;strong&gt;low-shot settings&lt;/strong&gt;, highlighting the importance of &lt;strong&gt;vision-language pretrained models&lt;/strong&gt; for pathology tasks.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_focus_fig3.png&quot; alt=&quot;FM Comparison&quot; style=&quot;width: 100%; height: auto;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;impact-of-prompt-generation&quot;&gt;Impact of Prompt Generation&lt;/h3&gt;

&lt;p&gt;We also examined the effect of &lt;strong&gt;different LLM-generated prompts&lt;/strong&gt;. Prompts generated by &lt;strong&gt;Claude-3.5-Sonnet&lt;/strong&gt; led to the highest &lt;strong&gt;Balanced ACC (86.4%) in 16-shot settings&lt;/strong&gt;, outperforming &lt;strong&gt;GPT-3.5-Turbo (84.9%)&lt;/strong&gt;, demonstrating the impact of &lt;strong&gt;high-quality textual guidance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/img/news/2025_focus_fig4.png&quot; alt=&quot;Prompt Comparison&quot; style=&quot;width: 100%; height: auto;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;FOCUS introduces a &lt;strong&gt;knowledge-enhanced adaptive visual compression framework&lt;/strong&gt; for &lt;strong&gt;few-shot WSI classification&lt;/strong&gt;, addressing key challenges in &lt;strong&gt;data-scarce pathology AI&lt;/strong&gt;. By integrating &lt;strong&gt;foundation models and language priors&lt;/strong&gt;, FOCUS significantly improves &lt;strong&gt;diagnostic feature selection&lt;/strong&gt; and &lt;strong&gt;classification accuracy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Future work will explore:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Expanding FOCUS to more computational pathology tasks&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Enhancing multi-resolution feature modeling&lt;/strong&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Applying FOCUS to broader medical AI applications&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For more details, check out the full paper:&lt;br /&gt;
&lt;strong&gt;Guo, Zhengrui, et al. “FOCUS: Knowledge-Enhanced Adaptive Visual Compression for Few-Shot Whole Slide Image Classification.” arXiv preprint arXiv:2411.14743 (2024).&lt;/strong&gt;&lt;/p&gt;
</content>
  </entry>
  
 
</feed>
