INTRODUCTION
Frailty is a health condition characterized by increased vulnerability to adverse health outcomes, and its prevalence increases with age.1,2 It is associated with the decline of multiple organ systems, threatening an individual’s ability to perform activities of daily living independently and to maintain overall quality of life.2,3 Preserving independence in activities of daily living is essential for reducing the burden on healthcare systems, such as hospital admissions and social care services, and ultimately for extending healthy lifespan.4,5 Frailty is therefore a critical issue that requires proactive prevention and management at a societal level.
Older adults experience a decline in physical function, which is strongly linked to increased dependence, the need for care, and low perceived quality of life.6,7 Additionally, sarcopenia, characterized by progressive loss of muscle mass and strength and by slow gait speed, has been recognized as a key contributor to physical function impairment in frailty.8,9 Systematic reviews and meta-analyses have demonstrated that sarcopenia and frailty share core biomarkers, including metabolic, inflammatory, and hematologic markers, as well as nutritional deficiencies, which contribute to decreased physical capability.10,11 Given this strong overlap, assessing sarcopenia can provide valuable insights for frailty classification, particularly when combined with objective physical performance measures.
Notably, physical performance has been identified as a key component in the development of frailty.12,13 The short physical performance battery (SPPB) and single-leg stance (SLS) test are widely used assessments of physical function in older adults.14,15 The SPPB consists of three components (walking speed, standing balance, and a chair sit-to-stand test), with higher scores indicating better physical function.14 Because balance is a fundamental requirement for daily activities such as standing, walking, and climbing stairs, the SLS is a useful test for evaluating postural control and stability. Combining sarcopenia-related assessments with the SPPB and SLS may therefore provide a more comprehensive evaluation of frailty risk.
Given the strong association between physical performance and frailty, objective assessments are essential for early detection and intervention. Machine learning (ML) techniques are particularly suited for this purpose, as they can integrate multiple variables and capture complex patterns to improve classification accuracy.16 In this study, we selected five commonly used ML models (logistic regression, support vector machine [SVM], K-nearest neighbors [KNN], decision tree, and random forest) to cover linear and non-linear, parametric and non-parametric, and interpretable and ensemble approaches. The aims were to (1) evaluate the ability of the SPPB, SLS, and sarcopenia-related indicators to predict frailty using ML approaches, and (2) identify the most suitable ML model for classifying frailty status in community-dwelling older adults and determine the features that contribute most to model performance. This approach may contribute to more effective screening and personalized management of frailty.
MATERIALS AND METHODS
Participants aged 65 years and older who lived in South Korea were recruited from the general community. All participants resided in urban areas and were able to live independently without caregivers or family. Inclusion criteria were as follows: 1) no neurocognitive impairment; 2) able to walk with or without assistive walking aids; 3) residents of the community. Exclusion criteria were as follows: 1) fracture within the past three months; 2) severe cardiovascular disease (myocardial infarction, unstable angina pectoris, heart failure); 3) hearing or language impairment that made communication difficult. All participants were informed about the purpose and procedures of the study and voluntarily signed the informed consent form to participate. Ethical approval for this study was obtained from the Institutional Review Board of Sangji University.
Participants completed a demographic questionnaire, the strength, assistance with walking, rising from a chair, climbing stairs, and falls questionnaire (SARC-F), and the mini-mental state examination (MMSE). Following this, frailty and physical performance tests (SPPB and SLS) were administered. Examiner A assessed participants’ eligibility, obtained written informed consent, and collected data related to the questionnaires. Examiner B assessed frailty, while Examiner C performed the physical performance tests.
Frailty was assessed using the Fried frailty phenotype, which consists of five components: unintentional weight loss, fatigue, low physical activity, weak grip strength, and slow walking speed.1,17,18 Unintentional weight loss was determined by asking participants whether they had unintentionally lost more than 5% of their body weight or 4.5 kg over the past year. Fatigue was evaluated based on self-reported exhaustion: participants were classified as fatigued if they reported feeling exhausted on at least 3 to 4 days per week or most of the time in response to either of the following questions: “Have you felt that everything was difficult?” or “Have you felt that you couldn’t get anything done?” Low physical activity was defined as no self-reported engagement in physical activities such as walking, recreational activities, or sports.18 Grip strength was measured using a Jamar dynamometer, with participants seated on a chair without armrests, shoulders adducted, elbows flexed at 90 degrees, and wrists in a neutral position. The test was performed three times, and the average of the three trials was used for analysis. Weak grip strength was defined using body mass index (BMI)-specific cutoffs.1 In men, weak grip strength was classified as 29 kg or less for those with a BMI of 24 kg/m2 or lower, 30 kg or less for those with a BMI between 24.1 and 26 kg/m2, 31 kg or less for those with a BMI between 26.1 and 28 kg/m2, and 32 kg or less for those with a BMI greater than 28 kg/m2. In women, weak grip strength was classified as 17 kg or less for those with a BMI of 23 kg/m2 or lower, 17.3 kg or less for those with a BMI between 23.1 and 26 kg/m2, 18 kg or less for those with a BMI between 26.1 and 29 kg/m2, and 21 kg or less for those with a BMI greater than 29 kg/m2. Walking speed was measured using a standardized 4-meter walk test,17 and one point was assigned for this component if walking speed was < 1.0 m/s.
Based on these five components, participants who met one or two criteria were classified as pre-frail, while those who met three or more were classified as frail.1
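To make the scoring rule concrete, the following Python sketch applies the five Fried criteria and the BMI-specific grip cutoffs listed above to a single participant. The class and function names (`FriedInputs`, `is_weak_grip`, `fried_status`) and the `"M"`/`"F"` sex encoding are hypothetical, and labeling a score of 0 as "robust" follows the usual Fried convention rather than anything stated in this section.

```python
from dataclasses import dataclass

# BMI-specific grip cutoffs from the text: (upper BMI bound of band, cutoff in kg).
# Grip strength at or below the cutoff counts as "weak".
MEN_GRIP_CUTOFFS = [(24.0, 29.0), (26.0, 30.0), (28.0, 31.0), (float("inf"), 32.0)]
WOMEN_GRIP_CUTOFFS = [(23.0, 17.0), (26.0, 17.3), (29.0, 18.0), (float("inf"), 21.0)]


def is_weak_grip(grip_kg: float, bmi: float, sex: str) -> bool:
    """Compare mean grip strength with the cutoff for the participant's BMI band."""
    if sex not in ("M", "F"):  # hypothetical encoding used only in this sketch
        raise ValueError(f"unknown sex code: {sex!r}")
    bands = MEN_GRIP_CUTOFFS if sex == "M" else WOMEN_GRIP_CUTOFFS
    for bmi_upper, cutoff in bands:
        if bmi <= bmi_upper:
            return grip_kg <= cutoff
    raise ValueError(f"invalid BMI: {bmi}")


@dataclass
class FriedInputs:  # hypothetical container for one participant's measurements
    weight_loss: bool      # >5% or 4.5 kg unintentional loss in the past year
    exhaustion: bool       # exhausted on 3-4 days/week or most of the time
    low_activity: bool     # no self-reported walking, recreation, or sports
    grip_kg: float         # mean of three Jamar trials
    bmi: float             # kg/m2
    sex: str               # "M" or "F"
    gait_speed_m_s: float  # from the 4-meter walk


def fried_status(x: FriedInputs) -> tuple[int, str]:
    """Count criteria met: 0 robust (assumed label), 1-2 pre-frail, >=3 frail."""
    criteria = [
        x.weight_loss,
        x.exhaustion,
        x.low_activity,
        is_weak_grip(x.grip_kg, x.bmi, x.sex),
        x.gait_speed_m_s < 1.0,
    ]
    score = sum(criteria)
    if score >= 3:
        return score, "frail"
    if score >= 1:
        return score, "pre-frail"
    return score, "robust"


example = FriedInputs(weight_loss=False, exhaustion=True, low_activity=False,
                      grip_kg=16.5, bmi=24.5, sex="F", gait_speed_m_s=0.92)
print(fried_status(example))  # (3, 'frail')
```

In the example, exhaustion, weak grip (16.5 kg against the 17.3 kg cutoff for a woman with a BMI of 24.5 kg/m2), and slow gait give three criteria, so the participant is classified as frail.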
The SARC-F questionnaire consists of five components: muscle strength, assistance with walking, rising from a chair, climbing stairs, and falls.19 Muscle strength is assessed by asking participants how much difficulty they have lifting or carrying a 10-pound object, scored as 0 (no difficulty), 1 (some difficulty), or 2 (a lot of difficulty or unable to do). Assistance with walking is evaluated based on the difficulty of walking across a room and the use of aids or personal assistance, scored as 0 (no difficulty), 1 (some difficulty), or 2 (significant difficulty, requiring aids, or unable to walk without help). The ability to rise from a chair is assessed based on the difficulty of transferring from a chair or bed and whether aids or assistance are needed, scored as 0 (no difficulty), 1 (some difficulty), or 2 (significant difficulty, requiring aids, or unable to rise without help). Climbing stairs is evaluated by asking about the difficulty of climbing a flight of 10 steps, scored as 0 (no difficulty), 1 (some difficulty), or 2 (a lot of difficulty or unable to do so). Falls are scored as 0 for no falls, 1 for one to three falls, and 2 for four or more falls in the past year. Each item is thus scored from 0 to 2, yielding a total score from 0 to 10, with higher scores indicating a higher risk of sarcopenia.
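The SARC-F total is a simple sum of the five items. A minimal sketch, assuming the four difficulty items are already coded 0 to 2 and that falls are recorded as a raw count for the past year; the function names are illustrative only.

```python
def falls_score(n_falls_past_year: int) -> int:
    """Map the number of falls in the past year to the SARC-F falls item (0-2)."""
    if n_falls_past_year < 0:
        raise ValueError("number of falls cannot be negative")
    if n_falls_past_year == 0:
        return 0
    return 1 if n_falls_past_year <= 3 else 2


def sarc_f_total(strength: int, walking: int, chair: int, stairs: int, n_falls: int) -> int:
    """Sum the five SARC-F items (each 0-2) into a total from 0 to 10."""
    items = [strength, walking, chair, stairs]
    if any(v not in (0, 1, 2) for v in items):
        raise ValueError("each SARC-F difficulty item must be scored 0, 1, or 2")
    return sum(items) + falls_score(n_falls)


print(sarc_f_total(strength=1, walking=0, chair=1, stairs=2, n_falls=2))  # 5
```

Keeping the falls mapping in its own function makes the 1-to-3 versus 4-or-more boundary explicit and easy to check.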
Physical performance was evaluated using the SPPB and SLS tests. The SPPB consists of three components: walking speed, a standing balance test, and a chair sit-to-stand test. For the walking speed test, participants walked 4 meters at their usual pace. A score of 4 was given if they completed the walk in less than 4.82 seconds, 3 if they took between 4.82 and 6.20 seconds, 2 if they took between 6.21 and 8.70 seconds, and 1 if they took more than 8.70 seconds. A score of 0 was assigned if the participant was unable to complete the walk. For the standing balance test, participants were asked to maintain three standing positions (side-by-side, semi-tandem, and full-tandem) with their arms crossed over their chest. For the side-by-side and semi-tandem stances, participants received 1 point if they maintained the position for 10 seconds and 0 points if they could not. For the full-tandem stance, participants received 2 points if they maintained the position for 10 seconds, 1 point if they maintained it for at least 3 but less than 10 seconds, and 0 points if they could not maintain it for at least 3 seconds. For the chair sit-to-stand test, participants were instructed to stand up from a chair five times as quickly as possible without using their arms. A score of 4 was given if they completed the task in 11.19 seconds or less, 3 if they took between 11.20 and 13.69 seconds, 2 if they took between 13.70 and 16.69 seconds, and 1 if they required 16.70 seconds or more. A score of 0 was assigned if the participant was unable to complete five repetitions within 60 seconds or could not perform the test. The total SPPB score was the sum of the three component scores, ranging from 0 (poorest physical performance) to 12 (best physical performance); lower scores indicate greater impairment in mobility and balance.
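The SPPB cutoffs above translate directly into three scoring functions. This is a sketch under stated assumptions: `None` marks a test the participant could not complete, and each balance stance is scored from its own hold time independently, whereas some SPPB protocols stop the balance sequence after the first failed stance. Function names are hypothetical.

```python
from typing import Optional


def gait_score(time_4m_s: Optional[float]) -> int:
    """4-m usual-pace walk; None (assumed encoding) means the walk was not completed."""
    if time_4m_s is None:
        return 0
    if time_4m_s < 4.82:
        return 4
    if time_4m_s <= 6.20:
        return 3
    if time_4m_s <= 8.70:
        return 2
    return 1


def balance_score(side_by_side_s: float, semi_tandem_s: float, tandem_s: float) -> int:
    """Hold times (s) for the three stances; each stance is scored independently here."""
    score = int(side_by_side_s >= 10) + int(semi_tandem_s >= 10)
    if tandem_s >= 10:
        score += 2
    elif tandem_s >= 3:
        score += 1
    return score


def chair_stand_score(time_5x_s: Optional[float]) -> int:
    """Five chair stands without arms; None or > 60 s scores 0."""
    if time_5x_s is None or time_5x_s > 60:
        return 0
    if time_5x_s <= 11.19:
        return 4
    if time_5x_s <= 13.69:
        return 3
    if time_5x_s <= 16.69:
        return 2
    return 1


def sppb_total(time_4m_s, side_by_side_s, semi_tandem_s, tandem_s, time_5x_s) -> int:
    """Total SPPB score from 0 (poorest) to 12 (best)."""
    return (gait_score(time_4m_s)
            + balance_score(side_by_side_s, semi_tandem_s, tandem_s)
            + chair_stand_score(time_5x_s))


print(sppb_total(5.10, 10, 10, 6.5, 12.4))  # 9
```

In the example, a 5.10 s walk scores 3, the balance stances score 1 + 1 + 1 = 3, and 12.4 s for five chair stands scores 3, for a total of 9.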
To assess SLS, participants were instructed to stand on their preferred leg with their arms crossed over their chest.15 Each trial was terminated when the participant lost balance or placed the raised foot on the ground, or after a maximum of 60 seconds. The time (in seconds) for which balance was maintained was recorded, and the better of two trials was used for analysis. Longer durations indicate better balance performance, whereas shorter durations suggest impaired balance.15
Normality of continuous variables was assessed using the Shapiro–Wilk test. Differences between frail and pre-frail older adults were tested using the independent-samples t-test for normally distributed variables, the Mann–Whitney U test for non-normally distributed variables, and the chi-squared test for categorical variables. To classify frailty status (frail vs. pre-frail), five ML models (logistic regression, SVM, KNN, decision tree, and random forest) were applied. The input features were BMI, MMSE, SPPB, SLS, and SARC-F scores. The dataset was split into 70% for training and 30% for testing using stratified sampling to preserve the original class distribution. Hyperparameters were tuned by grid search with 5-fold cross-validation, and models were optimized for the F1-score to balance precision and sensitivity (recall). Model performance on the test dataset was evaluated using accuracy, sensitivity, specificity, precision, and F1-score. Additionally, receiver operating characteristic (ROC) curve analysis was performed for each model, and the area under the curve (AUC) was calculated as a measure of discriminative performance. Permutation feature importance was used to interpret model predictions and identify key predictors of frailty. All analyses were implemented in Python (ver. 3.11).
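The modeling workflow described above can be sketched with scikit-learn. The data here are synthetic (`make_classification`) stand-ins for the five features, frail is coded as the positive class 1, and the hyperparameter grids and the `StandardScaler` step are assumptions, since the grids and preprocessing used in the study are not reported in this section.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for BMI, MMSE, SPPB, SLS, SARC-F; 1 = frail (minority), 0 = pre-frail.
X, y = make_classification(n_samples=150, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.7, 0.3], random_state=0)

# 70/30 split, stratified on the label to preserve class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Hypothetical search grids (the study's grids are not reported here).
candidates = {
    "logistic": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "svm": (SVC(probability=True, random_state=0),
            {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7, 9]}),
    "tree": (DecisionTreeClassifier(random_state=0), {"clf__max_depth": [2, 3, 4, None]}),
    "forest": (RandomForestClassifier(random_state=0),
               {"clf__n_estimators": [100, 300], "clf__max_depth": [3, None]}),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (model, grid) in candidates.items():
    # Assumed preprocessing: scaling inside the pipeline is refit on each training fold.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", model)])
    search = GridSearchCV(pipe, grid, scoring="f1", cv=cv)  # tune for F1 on class 1
    search.fit(X_tr, y_tr)
    y_pred = search.predict(X_te)
    y_prob = search.predict_proba(X_te)[:, 1]  # probability of the frail class
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred, labels=[0, 1]).ravel()
    results[name] = {
        "accuracy": accuracy_score(y_te, y_pred),
        "sensitivity": recall_score(y_te, y_pred, zero_division=0),
        "specificity": tn / (tn + fp),
        "precision": precision_score(y_te, y_pred, zero_division=0),
        "f1": f1_score(y_te, y_pred, zero_division=0),
        "auc": roc_auc_score(y_te, y_prob),
    }

for name, m in results.items():
    print(name, {k: round(float(v), 2) for k, v in m.items()})
```

scikit-learn has no `specificity_score` function, so specificity is derived here from the confusion matrix as TN / (TN + FP). AUC is computed from predicted probabilities rather than hard labels, which is why it can rank models differently from accuracy or F1.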
RESULTS
Table 1 shows the characteristics of frail and pre-frail older adults. Compared with the pre-frail group, the frail group was shorter and had lower MMSE scores (p<0.05). The frail group also had lower SPPB scores and higher SARC-F scores (p<0.05).
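The between-group comparisons in Table 1 rely on the test-selection rule given in the methods. A minimal SciPy sketch, assuming normality is checked separately in each group at alpha = 0.05 and that the classical equal-variance t-test is used; the function names are hypothetical, and the values in the usage lines are randomly generated, not study data.

```python
import numpy as np
from scipy import stats


def compare_continuous(frail, prefrail, alpha=0.05):
    """Shapiro-Wilk in each group (assumed) decides between a t-test and Mann-Whitney U."""
    frail = np.asarray(frail, dtype=float)
    prefrail = np.asarray(prefrail, dtype=float)
    _, p_frail = stats.shapiro(frail)
    _, p_prefrail = stats.shapiro(prefrail)
    if p_frail > alpha and p_prefrail > alpha:
        # Both groups compatible with normality: independent-samples (Student) t-test.
        return "t-test", stats.ttest_ind(frail, prefrail).pvalue
    # Otherwise fall back to the rank-based Mann-Whitney U test.
    return "Mann-Whitney U", stats.mannwhitneyu(frail, prefrail, alternative="two-sided").pvalue


def compare_categorical(counts):
    """Chi-squared test of independence (Yates-corrected by default for 2x2 tables)."""
    chi2, p, dof, expected = stats.chi2_contingency(counts)
    return "chi-squared", p


rng = np.random.default_rng(0)
mmse_frail = rng.normal(24, 3, size=30)      # synthetic values, not study data
mmse_prefrail = rng.normal(27, 2, size=70)
print(compare_continuous(mmse_frail, mmse_prefrail))
print(compare_categorical([[12, 18], [40, 30]]))  # e.g. sex (M, F) by group
```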
Table 2 displays the results of the five prediction models in terms of accuracy, sensitivity, specificity, precision, and F1-score, and Figure 1 shows their receiver operating characteristic curves and AUCs. Classification performance varied across models. The KNN model achieved the highest accuracy (0.93) and F1-score (0.95), with an AUC of 0.86 for classifying frail older adults. Logistic regression also showed strong classification ability, with an accuracy of 0.86, an F1-score of 0.89, and the highest AUC among the models (0.98). The random forest model achieved an accuracy of 0.86 and an F1-score of 0.88, with an AUC of 0.96. The SVM model showed an accuracy of 0.79, an F1-score of 0.84, and an AUC of 0.80. The decision tree model demonstrated the lowest performance, with an accuracy of 0.71, an F1-score of 0.78, and an AUC of 0.64. According to permutation importance analysis, MMSE and SARC-F were the most important predictors for frailty classification (Figure 2).


DISCUSSION
The primary objective of this study was to develop and evaluate an ML-based frailty classification model using physical performance assessments. Among the five supervised models, KNN achieved the highest accuracy and F1-score for frailty classification when integrating physical performance measures (SPPB and SLS) and the SARC-F sarcopenia screening questionnaire with individual characteristics (BMI and MMSE score). This data-driven ML approach has the potential to facilitate early detection and enable targeted interventions for older adults at risk of frailty.
This study focused on distinguishing frail older adults from those in the pre-frail stage, rather than from robust individuals. Pre-frail individuals, although not yet frail, are a high-risk population that exhibits early signs of physiological vulnerability. Identifying frailty within this group is essential, as timely interventions at this stage can prevent or delay progression to frailty.20 Several studies have shown that individuals transitioning from pre-frailty to frailty experience significant declines in gait speed, balance, strength, and physical activity, often accompanied by worsening disability and decreased resilience to stressors.2,21 These changes lead to loss of independence and a steep rise in healthcare needs. By training an ML model to differentiate frailty from pre-frailty, this study aimed to develop a clinical decision-support tool that aids in the early detection of advanced frailty stages in at-risk populations.
Previous studies have investigated the performance of ML models for frailty classification using sociopsychological factors, physical function, and physical activity.22-25 Leme and de Oliveira compared six ML models (logistic regression, random forest, SVM, neural network, KNN, and naive Bayes classifier) using social, clinical, and psychosocial factors and found that random forest performed best (accuracy of 85.5% and area under the precision-recall curve of 0.97).25 Elsa et al. reported an accuracy of 0.86 and a sensitivity of 0.67 for predicting frailty from grip strength using a shallow neural network.22 Park et al. used fourteen sensor-derived features, such as time standing, percentage of time walking, and walking cadence, recorded by a pendant sensor attached at the sternum to identify physical frailty, reporting an AUC of 0.80 with logistic regression and accuracies ranging from 0.71 to 0.93 across ML models.23 The findings of this study, along with prior research, support the notion that early assessment of physical function can facilitate timely interventions and potentially improve patient outcomes.
The KNN algorithm achieved the highest accuracy and F1-score in this study, although logistic regression and random forest attained higher AUCs, so its suitability for frailty classification should be interpreted in light of the metric considered. One potential advantage of KNN in frailty classification is its ability to adapt to nonlinear relationships between physical performance measures and frailty status. Because frailty is a multidimensional syndrome influenced by various physiological and functional factors,2 logistic regression may have limitations in capturing these complex interactions. KNN, a non-parametric, instance-based algorithm, classifies each case by its distances to labeled training cases, which allows it to distinguish subtle differences in frailty-related features and contributes to its classification performance.26 Moreover, KNN’s integration of physical and cognitive parameters, including MMSE, highlights its potential to enhance the precision of frailty risk assessment. From a clinical perspective, the performance of KNN in this study suggests its potential utility for screening and early detection of frailty.
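To illustrate the distance-based, instance-based mechanism described above, here is a from-scratch KNN classifier in NumPy. It is an illustrative sketch, not the study's implementation: features are z-scored with training-set statistics (an assumption, but one that matters for KNN because SLS in seconds would otherwise dominate SARC-F points), the function name `knn_predict` is hypothetical, and the toy MMSE/SARC-F values are invented.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Label each query row by majority vote among its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Assumed preprocessing: z-score with training statistics so no feature dominates.
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    Z_train, Z_query = (X_train - mu) / sd, (X_query - mu) / sd
    preds = []
    for z in Z_query:
        dist = np.sqrt(((Z_train - z) ** 2).sum(axis=1))  # Euclidean distance to every case
        nearest = np.argsort(dist)[:k]
        votes = Counter(y_train[nearest].tolist())
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


# Toy [MMSE, SARC-F] rows (invented); 0 = pre-frail, 1 = frail.
X_train = [[28, 1], [27, 2], [29, 0], [22, 6], [21, 7], [23, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [[28, 1], [22, 6]], k=3))  # [0 1]
```

An odd `k` avoids tied votes in a binary problem; with an even `k`, `Counter.most_common` would break ties by the order in which labels were first encountered among the neighbors.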
Logistic regression is widely used in the medical field for binary classification and is commonly applied to assess disease risk or analyze factors predicting patient prognosis.27,28 Unlike more complex ML models, logistic regression provides clear insight into the contribution of each variable to the classification outcome, making it particularly useful for clinical decision-making. In this study, logistic regression achieved an accuracy of 0.86, an F1-score of 0.89, and the highest AUC (0.98), highlighting its effectiveness in distinguishing frail individuals based on physical performance and sarcopenia-related measures, including the SPPB, SLS, and SARC-F. Given its simplicity, efficiency, and interpretability, logistic regression remains a practical and valuable tool for frailty assessment, particularly in settings where explainability is crucial.
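The interpretability argument rests on the fact that exponentiated logistic regression coefficients are odds ratios. A sketch on synthetic data: the generating model and both feature distributions are invented for illustration, features are standardized so that each odds ratio refers to a one-standard-deviation increase, and scikit-learn's default L2 penalty shrinks the estimates slightly toward zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only; columns mimic two of the study's features.
rng = np.random.default_rng(0)
n = 200
mmse = rng.normal(26, 3, n)
sarc_f = rng.integers(0, 11, n)  # SARC-F totals 0-10
# Hypothetical generating model: lower MMSE and higher SARC-F raise the odds of frailty.
logit = -0.5 - 0.4 * (mmse - 26) + 0.5 * (sarc_f - 5)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = frail

X = np.column_stack([mmse, sarc_f])
Xz = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(Xz, y)

# exp(coefficient) = multiplicative change in the odds of frailty per 1-SD increase.
for name, coef in zip(["MMSE", "SARC-F"], model.coef_[0]):
    print(f"{name}: coef={coef:+.2f}, odds ratio per SD={np.exp(coef):.2f}")
```

An odds ratio below 1 (as expected for MMSE here) means the odds of frailty fall as the feature increases; above 1 (as expected for SARC-F) means they rise.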
Permutation feature importance analysis of the KNN model identified MMSE as the most important feature, suggesting that lower cognitive function may be associated with a higher risk of frailty. SARC-F was the second most influential variable in the KNN model, highlighting its role in frailty classification. This finding aligns with recent studies demonstrating the strong predictive validity of SARC-F for functional decline, adverse health outcomes, and frailty in older adults.29-31 The importance of both cognitive status and sarcopenia-related measures supports their potential for integration into AI-based frailty screening tools.
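Permutation importance measures how much a fitted model's score falls when one feature's values are shuffled, which breaks that feature's relationship with the label while leaving its distribution intact. The sketch below computes it by hand with F1 as the score; evaluating on held-out data, using 30 repeats, and the function name `permutation_importance_f1` are assumptions. scikit-learn offers the same idea as `sklearn.inspection.permutation_importance`.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier


def permutation_importance_f1(model, X_test, y_test, n_repeats=30, seed=0):
    """Mean and SD of the drop in F1 when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    X_test = np.asarray(X_test, dtype=float)
    baseline = f1_score(y_test, model.predict(X_test))
    drops = np.zeros((X_test.shape[1], n_repeats))
    for j in range(X_test.shape[1]):
        for r in range(n_repeats):
            X_perm = X_test.copy()
            # Shuffling column j breaks its link to the label but keeps its distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j, r] = baseline - f1_score(y_test, model.predict(X_perm))
    return drops.mean(axis=1), drops.std(axis=1)


# Synthetic demo: only feature 0 carries signal, feature 1 is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] > 0).astype(int)
knn = KNeighborsClassifier(n_neighbors=5).fit(X[:200], y[:200])
mean_drop, sd_drop = permutation_importance_f1(knn, X[200:], y[200:])
print(mean_drop, sd_drop)
```

In this demo the informative feature produces a large F1 drop when shuffled and the noise feature a small one; a ranking such as MMSE above SARC-F is read off the mean drops in the same way.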
Despite these promising results, several limitations must be considered. First, the sample size was relatively small, and the class distribution was imbalanced, with fewer frail than pre-frail participants. This may reduce the generalizability and statistical power of the findings. To mitigate this issue, this study applied cross-validation with grid search, a widely used technique for improving model robustness in small and imbalanced datasets,32 and reported multiple evaluation metrics, including sensitivity and specificity, to provide a more comprehensive assessment of model performance. Nevertheless, further studies with larger and more balanced samples are required to confirm and improve the robustness of the proposed classification model. Second, while physical performance measures serve as valuable objective indicators of frailty, incorporating biological markers, such as inflammatory or metabolic markers, could improve the model’s predictive accuracy and provide a more comprehensive understanding of frailty pathophysiology.
CONCLUSION
In conclusion, this study demonstrates that supervised ML techniques, particularly KNN, can effectively classify frailty based on physical performance measures. These findings suggest that ML-based frailty classification has the potential to be integrated into clinical practice, enabling early identification of at-risk individuals and facilitating targeted interventions to prevent the progression of frailty. Future research should focus on expanding the dataset, incorporating additional risk factors, and validating the model in more diverse populations of older adults, including robust individuals, to enhance its clinical applicability.