Medaffcon’s Machine Learning Algorithm Adapts to Various Research Needs
The machine learning algorithm was originally developed to extract smoking status from patient texts in order to analyze the effects of smoking on postoperative complications. Today, it is also used in lung cancer research.
Medaffcon first explored this issue in a study of half a million patients who underwent surgery at Helsinki University Hospital (HUS). Identifying smoking status from patient records may sound straightforward, but in practice, it is not. Smoking-related data is not structured; instead, it is recorded in free text within broader patient records. The challenge is: how can smoking status be efficiently extracted from a vast amount of text?
Medaffcon and the research group developed a machine learning-based classifier to assist in data analysis. To train the model, clinical experts manually classified a total of 20,000 smoking-related sentences. This task was completed in a single day by two clinical experts, supported by Medaffcon’s pre-processing techniques and specialized tools. Following this, a total of half a million smoking-related sentences were analyzed and categorized using the machine learning algorithm.
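The workflow described above, in which a small set of expert-labelled sentences is used to train a classifier that then labels a much larger set, can be illustrated with a minimal sketch. The article does not say which model or label categories were used, so the Naive Bayes classifier, the labels "current", "former" and "never", and the English toy sentences below are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-zåäö]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial Naive Bayes with add-one smoothing.

    An illustrative stand-in only; not the model described in the article.
    """

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, sentence):
        tokens = tokenize(sentence)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical expert-labelled training sentences (toy data, not from the study).
train_sentences = [
    "Patient smokes a pack of cigarettes daily",
    "Currently smokes ten cigarettes a day",
    "Quit smoking five years ago",
    "Stopped smoking in 2015",
    "Has never smoked",
    "Non-smoker, never used tobacco",
]
train_labels = ["current", "current", "former", "former", "never", "never"]

model = NaiveBayesSentenceClassifier().fit(train_sentences, train_labels)

# Unlabelled sentences standing in for the larger set to be categorized.
new_sentences = ["Smokes cigarettes daily", "Quit smoking years ago", "Never smoked"]
predictions = [model.predict(s) for s in new_sentences]
print(predictions)
```

In practice, a model of this kind would be trained on thousands of annotated sentences in the language of the patient records and evaluated on held-out data before being applied at scale.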
The Potential of Scalability in Machine Learning Models
According to Olivia Hölsä, scalability requires that the algorithm is trained on a sufficiently large and diverse population.
“The algorithm we developed is based on a large and representative patient cohort, which includes a wide variety of patients.”
Hölsä explains that for this reason Medaffcon’s machine learning model is robust enough to analyze both large patient populations and more specific patient subgroups.
“In machine learning, it is crucial to ensure that the training data is consistently annotated and comparable to the real-world data for which the model is intended, allowing the model to correctly interpret and extract relevant information from clinical documentation.”
Hölsä says that it would also be interesting to compare machine learning models developed for a specific patient group at different university hospitals in Finland to assess how well the models scale across regions.
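One common way to check whether two annotators label consistently is Cohen's kappa, which compares their observed agreement with the agreement expected by chance. The article does not say how annotation consistency was assessed, so the sketch below is only an illustration; the label names and example annotations are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two experts for the same six sentences.
expert_1 = ["current", "current", "former", "never", "never", "former"]
expert_2 = ["current", "former", "former", "never", "never", "former"]

kappa = cohens_kappa(expert_1, expert_2)
print(round(kappa, 2))
```

A kappa close to 1 indicates strong agreement between annotators, while a value near 0 indicates agreement no better than chance.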
When scaling the machine learning model to various research needs, it is essential to ensure flexibility and avoid overfitting it to a single specific use case.
“For example, in the case of smoking, we must account for the fact that smoking status can change over time. A person may be a smoker but later quit smoking. Therefore, the model should be able to incorporate time-based restrictions.”
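A time-based restriction of this kind can be sketched as a lookup that returns the most recent smoking status recorded within a given window before a date of interest, such as the date of surgery. The function name, the one-year lookback window and the example records below are hypothetical and are not taken from the study.

```python
from datetime import date, timedelta


def smoking_status_at(records, index_date, lookback_days=365):
    """Return the latest status recorded in the lookback window ending at index_date.

    records is a list of (record_date, status) pairs. Returns None when no
    record falls inside the window.
    """
    window_start = index_date - timedelta(days=lookback_days)
    eligible = [(d, s) for d, s in records if window_start <= d <= index_date]
    if not eligible:
        return None
    # The record with the latest date inside the window wins.
    return max(eligible, key=lambda record: record[0])[1]


# Hypothetical patient: recorded as a smoker in 2015, as having quit in 2019.
records = [
    (date(2015, 3, 1), "current"),
    (date(2019, 6, 15), "former"),
]

status_early = smoking_status_at(records, date(2015, 9, 1))
status_late = smoking_status_at(records, date(2020, 1, 10))
status_none = smoking_status_at(records, date(2017, 1, 1))
print(status_early, status_late, status_none)
```

The same patient is thus treated as a current smoker for an operation in 2015 and as a former smoker for one in 2020, while an operation with no recent record yields no status at all rather than an outdated one.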
Extensive Expertise in Real-World Evidence (RWE) Research
High-quality data is essential for developing an effective machine learning model. Medaffcon has extensive experience conducting Real-World Evidence (RWE) studies. As a result, its experts have a deep understanding of reliable data sources and know how to account for critical factors during data collection.
“We understand where the data is documented, and what information is available. We know the right questions to ask clinicians regarding data entry and can define the necessary specifications for data collection.”
Hölsä also emphasizes the importance of recognizing data limitations. The data of interest may be spread across different healthcare systems; specific laboratory tests, for instance, may be conducted in primary care rather than in specialized healthcare. This must be addressed early in the study design, including in the specifications for data collection, and considered further during model development.
Beyond smoking status, other critical treatment-related factors, such as cancer progression and metastases, are still documented in unstructured formats. According to Hölsä, machine learning could be effectively used to analyze these records and extract valuable insights.
Medaffcon, founded in 2009, is a Nordic research and consulting company specializing in Real-World Evidence, Medical Affairs, and Market Access. With offices in Stockholm, Sweden, and Espoo, Finland, we provide expert services across the Nordic region. Our services combine strong medical and health economic expertise with modern data science.
The company employs some 30 experts. Since 2017, Medaffcon has been a subsidiary of Tamro Oyj and is part of the PHOENIX group, which is a leading provider of healthcare services in Europe.
Olivia joined Medaffcon in 2021 as a trainee to work on her Master’s thesis. She is finishing her studies in bioinformatics and digital health at Aalto University and has also worked there as a teaching assistant alongside her studies for three years. Her main strengths are analytical thinking, problem-solving skills, and a proactive mindset.
She is interested in data analysis in social and health care that supports decision-making on improving services and allocating their resources more effectively.
Iiro joined Medaffcon in March 2017 as a Biostatistician. For the preceding four years, he had worked as a research assistant in an academic study group, analyzing clinical and genetic patient data. Iiro holds a Master of Science degree in Technology in Bioinformation Technology.
Iiro’s strengths include his strong expertise in statistics and data analysis, hands-on experience in working with sensitive patient data, and strong interdisciplinary communication skills with experts from various fields. In the field, he is particularly interested in the large amounts of data made available by the technological revolution and in how the information derived from such data can be used to draw concrete conclusions, both to understand the nature of diseases and to advance the goals of the pharmaceutical industry and patient treatment.
“Machine learning and AI-based solutions will have a major impact on the healthcare sector now and in the future. However, making effective use of the health data that has already been collected and is available will be even more important for improving healthcare.”