Sydney Precision Data Science Centre
Sydney Data Science Insights
Edition 5, December 2025
Message from the Director - Professor Jean Yang
It’s been a dynamic first semester for 2025 at the Sydney Precision Data Science Centre as we continue to build new collaborations, both within the University and with external partners.
One of the highlights has been joining forces on the Snow Vision Accelerator, a $50 million initiative aimed at transforming glaucoma research. Our team will lead the data analytics work to uncover new insights into disease progression, with applications across the field of vision science.
We’re also celebrating the third year of the Winter Data Analysis Challenge, which continues to grow. A special thanks to Westpac Institutional Bank for sponsoring this year’s event and supporting the next generation of data scientists.
We were thrilled to welcome Dr Jie Kang, the first data science researcher in our newly launched Data Science Hub at the Charles Perkins Centre, a joint initiative designed to accelerate data-driven research.
Thank you to everyone who contributes to our Centre’s success. I hope you enjoy reading about the many achievements and activities in this newsletter — from researcher highlights to international workshops and new collaborative programs.
Best wishes,
Jean
Researcher Spotlight: Dr Jie Kang
This year, we were delighted to welcome Dr Jie Kang to the team at the Sydney Precision Data Science Centre. Jie has joined us as the first data science researcher in the newly established Charles Perkins Centre Data Science Hub. The Data Science Hub is a collaborative partnership between the Charles Perkins Centre and the Sydney Precision Data Science Centre, co-funded by the Charles Perkins Centre's Jennie Mackenzie Research Fund, the Faculty of Medicine and Health and the Faculty of Science.
The Data Science Hub advances data-intensive research with a direct translational impact in biomedical, metabolomics and epidemiological research and beyond. Jie’s own research interests focus on applied statistics and quantitative genetics.
What excites you about data science? To me, data science is about turning data into decisions through analytical thinking and technical tools. What excites me most is its potential to tackle complex, real-world challenges and drive meaningful, lasting change.
What is keeping you busy at the moment? Our newly established Data Science Hub at the Charles Perkins Centre is like my foster child. I’m fully committed to making it a success.
How long have you been working with the SPDS? 103 days. I’ve moved from New Zealand and am enjoying getting to know Sydney, the city and the university.
What is next for you at SPDS? Now that the Data Science Hub is on track, the next step for me is to strengthen its connection with SPDS, connecting people, expertise, and ideas to advance collaborative science with impact.
Where could we find you when you’re not working? I’ve got a band called Just Nuts. I fight in the cage and run marathons for charity. I drive ambulances. If there’s noise, sweat, or sirens, chances are I’m probably somewhere in the middle of it.
Feature method: ClassifyR
In this newsletter, we introduce ClassifyR, an R package by Dario Strbenac, Jasmine Gu, Harry Robertson, Jean Yang, and Ellis Patrick, designed to advance precision medicine through robust multi-view classification frameworks.
Precision medicine is a personalised approach to healthcare that tailors prevention, diagnosis, and treatment to an individual’s unique characteristics such as genetics, environment, lifestyle, and molecular data. By integrating diverse data types, the ultimate goal of precision medicine is to deliver the right decision or treatment to the right patient at the right time. While this has driven the development of complex classification strategies, realising this vision relies on robust evaluation of model performance at both cohort and patient levels.
ClassifyR is an R package designed to meet this need. Built within the Bioconductor ecosystem, ClassifyR provides a flexible and modular framework for evaluating classification performance across high-dimensional omics datasets (Figure 1). Unlike general-purpose machine learning libraries, it is purpose-built to handle the structure, complexity, and biological nuance of multi-omics data integration.
Its key capabilities include:
- Bioconductor ecosystem integration: ClassifyR is interoperable with established omics data structures in the Bioconductor Project, ensuring seamless access to single-cell, multi-omics, and spatial technologies.
- Full cross-validation workflow: The package performs comprehensive cross-validation by including feature selection and hyperparameter tuning within the cross-validation procedure, which is essential for handling large-scale omics datasets.
- Cross-dataset and cross-modality validation: Models can be both constructed and evaluated across cohorts and omics platforms, addressing issues of reproducibility and transferability.
- Precision medicine focus: ClassifyR includes frameworks for assessing model appropriateness at an individual patient level, which is crucial for evaluating which model or modality is appropriate for which patient.
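To make the "full cross-validation workflow" idea concrete, here is a minimal sketch of nested cross-validation in which feature selection and hyperparameter tuning are repeated inside every training fold. It is written in plain Python for illustration only and is not ClassifyR's API: the function names, the nearest-centroid classifier, the mean-difference feature ranking, and the toy data are all assumptions made for this example.

```python
import random
from statistics import mean

def select_features(X, y, k):
    """Rank features by absolute difference in class means; keep the top k."""
    def score(j):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        return abs(mean(a) - mean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def fit_centroids(X, y, features):
    """Nearest-centroid classifier: one mean vector per class."""
    return {
        c: [mean(row[j] for row, label in zip(X, y) if label == c) for j in features]
        for c in (0, 1)
    }

def predict(centroids, features, row):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(c):
        return sum((row[j] - m) ** 2 for j, m in zip(features, centroids[c]))
    return min(centroids, key=dist)

def folds(n, n_folds):
    """Yield (train_idx, test_idx) pairs for simple interleaved k-fold splits."""
    for f in range(n_folds):
        test = [i for i in range(n) if i % n_folds == f]
        train = [i for i in range(n) if i % n_folds != f]
        yield train, test

def cv_accuracy(X, y, k, n_folds):
    """Cross-validated accuracy with feature selection redone in every fold."""
    correct = 0
    for train, test in folds(len(X), n_folds):
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        feats = select_features(Xtr, ytr, k)  # uses training samples only
        cents = fit_centroids(Xtr, ytr, feats)
        correct += sum(predict(cents, feats, X[i]) == y[i] for i in test)
    return correct / len(X)

def nested_cv_accuracy(X, y, k_grid, outer=5, inner=3):
    """Outer folds estimate performance; inner folds choose k (the hyperparameter)."""
    correct = 0
    for train, test in folds(len(X), outer):
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        best_k = max(k_grid, key=lambda k: cv_accuracy(Xtr, ytr, k, inner))
        feats = select_features(Xtr, ytr, best_k)
        cents = fit_centroids(Xtr, ytr, feats)
        correct += sum(predict(cents, feats, X[i]) == y[i] for i in test)
    return correct / len(X)

# Hypothetical toy data: 40 samples, 50 features, only the first 3 carry signal.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(0, 1) + (2.0 * label if j < 3 else 0.0) for j in range(50)]
     for label in y]

print("Nested CV accuracy:", nested_cv_accuracy(X, y, k_grid=[1, 3, 10]))
```

The point of the design is that the held-out samples in each outer fold never influence which features are kept or which value of `k` is chosen. Selecting features on the full data set first and cross-validating afterwards would tend to give an optimistic accuracy estimate, which matters most when features greatly outnumber samples, as in omics data.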
ECR Annual development workshop
Chair of the ECR Committee, Farhan Ameen, shares some reflections on the workshop held in February as the 2025 academic year commenced.
The theme of this year’s workshop looked towards the future, dedicated to equipping our ECRs with the skills to supercharge their careers. We began with a series of communication sessions—covering visual storytelling in figures, crafting effective elevator pitches, and improvisation exercises. I particularly enjoyed the elevator pitch session, which offered reminders of the diversity of research interests within our centre and underscored the importance of communicating our work clearly to different audiences.
We then heard from research leaders about emerging trends in computer science, statistics, and bioinformatics. The discussion centred on keeping up with and adapting to rapid changes in the field and how we can leverage new technologies, such as large language models, to improve our research. These sessions served as a clear reminder of the need to stay informed and adaptable as the research landscape continues to evolve.
The workshop closed with a career-focused session featuring a panel of alumni who shared their experiences navigating careers beyond the early-career stage. Their insights into the challenges and decisions faced when navigating academia and industry were invaluable. A personal highlight for me was the mock interview session, which pushed me out of my comfort zone and gave me confidence for future interviews.
Of course, it would be remiss not to mention the camaraderie. Whether it was enjoying the chaos of 15 untrained “chefs” cooking a family meal, a competitive pool tournament, or seeing who could hit a golf ball the farthest, these shared moments created a strong sense of connection regardless of whether you were a summer student or a professor. Workshops like this remind us that while research drives our work, it is the people and shared experiences that make our community.
News & updates
Harry Robertson was awarded the President’s Prize at the Transplantation Society of Australia and New Zealand annual scientific meeting. Congratulations Harry!
SPDS at the Cancer Bioinformatics Australia symposium
On Wednesday 18 June, three members of the centre attended the Cancer Bioinformatics Australia symposium in Sydney. Post-doctoral research associate Yue Cao presented a poster on the epigenomic analysis of CAR-T cells, a promising avenue for cancer treatment. PhD student Cabiria Liang presented a poster about BenchHub, a framework under development for discovering curated data sets for methodology benchmarking. Post-doctoral research associate Dario Strbenac presented a poster about the microbiome of head and neck cancer, titled “But Wait, There’s More! Evaluating the Potential of Human Whole Genome Sequencing for Oral Cancer Microbiome Analysis”, and won Best Early Career Researcher Lightning Talk. The key message was that researchers can extract important microbiome information from their human-focused whole-genome sequencing experiments without extra effort or loss of crucial detail, avoiding the need for an additional 16S rRNA sequencing experiment.
New supplementary scholarships
At the start of the year, we launched a new collaborative research program with the Centre for Drug Discovery Innovation (DDI) to advance decision-making in drug discovery enabled by data science. With established strengths in both data science and drug discovery, the collaboration is ideally placed to explore how context-specific questions from drug discovery can inform methodological developments in data science. Four postgraduate research supplementary scholarships have been made available to support this program, funded by the Faculty of Science and DDI. This initiative grew out of an earlier workshop that aimed to identify the grand challenges in drug discovery and data science. Congratulations to the first round of scholarship recipients, Cabiria Liang and Rojashree Jayakumar from SPDS, along with Tommy Lu and Joshua Mills from DDI.
Transforming glaucoma research
The Sydney Precision Data Science Centre is proud to collaborate on the Snow Vision Accelerator initiative, a $50 million partnership between the Snow Medical Research Foundation and the University of Sydney to fight glaucoma, the world’s leading cause of irreversible blindness.
This is the largest single philanthropic investment in vision science in Australia, bringing together leading Australian and international research groups. Our Centre will use advanced data analytics to create new insights into the progression and mechanisms of glaucoma, with the potential to be applied to other vision research. Professor Jean Yang will play a lead role in guiding a team of data scientists to make use of the new, complex data generated by the accelerator.
Unlocking single cell omics workshop in Hong Kong
PhD student Daniel Kim reports on the recent workshop.
Last month, a small team from the Sydney Precision Data Science Centre travelled to Hong Kong to deliver a workshop at the Chinese University of Hong Kong. The workshop was supported by a collaborative grant between the University of Sydney and the Laboratory of Data Discovery for Health (D24H). The team included Centre Director Professor Jean Yang, Dr Lijia Yu, and PhD students Daniel Kim and Andrew Zhang. Titled "Unlocking Single-Cell Spatial Omics Analyses with SCDNEY", the workshop focused on the challenges and analytical strategies involved in multi-sample spatial transcriptomics data analysis.
Around 20 postgraduate and undergraduate students from life and medical sciences, statistics, and computer science participated. This was the third iteration of the workshop, and it featured a distinctive design. Unlike traditional workshops that follow a fixed pipeline, this session was structured around posing scientific questions upfront, guiding participants to explore the data with these questions in mind. Each analytical section concluded with a group discussion to encourage engagement and critical thinking—an approach that many participants enjoyed.
The post-workshop evaluation was favourable and indicated strong interest in the technical concepts behind spatial transcriptomics. One participant even took a moment to mention the Tim Tams we had brought all the way from Australia. However, we were later informed that Tim Tams could also be purchased in Hong Kong.
The workshop concluded on a high note with a joint group dinner generously hosted by Associate Professor Zhixiang Lin. The evening fostered new connections as the group exchanged knowledge and experiences about cultural differences and the research ecosystems in Hong Kong and Australia. It was a valuable opportunity for students to develop a broader global perspective.
Images: Left - Daniel Kim presenting at the workshop, Right - connecting over dinner post workshop.
Australian Data Science Network Conference
Sydney Precision Data Science Centre is hosting the 4th Australian Data Science Network (ADSN) conference. The ADSN conference aims to connect Australia’s top experts in data science, fostering collaboration, expanding opportunities, and showcasing our collective capabilities. Registrations and the call for abstracts are open. Find out more on the conference website.