INTRODUCTION
Recent evidence indicates that older adults with diabetes have an elevated risk of developing musculoskeletal disorders and that the interaction between diabetes and obesity further exacerbates this risk.1,2 These findings highlight the need to consider both body mass index (BMI) and overall adiposity when managing diabetes and musculoskeletal pain in older populations.3,4 Accumulating evidence also suggests that abdominal obesity, measured by waist circumference (WC) and waist-to-hip ratio (WHR), is more strongly associated with metabolic and musculoskeletal disorders than BMI, underscoring the clinical importance of assessing central adiposity.5,6 However, current approaches often rely on manual tape measurements or specialized equipment, which limits their feasibility in routine community or primary care settings because the data are not digitally recorded or integrated into healthcare systems.6-8
Previous research has employed conventional anthropometric indices such as BMI, WC, and WHR, or imaging modalities including MRI, CT, and DEXA to assess obesity-related risks.9-11 While informative, these methods require trained personnel or expensive equipment, which reduces accessibility in rural areas and small healthcare centers.12,13 Machine learning approaches have also been explored for chronic disease classification; however, many models rely on data sources that are not easily obtainable in real-world environments, which has contributed to their limited adoption in clinical practice.14,15 Although WC measured by tape is relatively simple to collect, such data are rarely stored in a systematic or digitalized format, making them less practical for large-scale or longitudinal, data-driven applications.16 As highlighted by recent critiques in digital health research, developing classification models based on features that are both realistically collectable and easily storable is essential for achieving clinical applicability and real-world impact.14
Because smartphones are already the most widely available and user-friendly digital devices, Inertial Measurement Unit (IMU)-based measurements can be performed with minimal burden, while data are automatically collected, stored, and reused for repeated assessments.17 Recent studies have demonstrated that smartphone cameras combined with machine learning can objectively estimate total and abdominal fat mass from 2D body silhouette images in adults.8,16,18 However, these approaches require participants to wear form-fitting clothing and maintain specific standing postures, often with assistance or a tripod to ensure stable imaging conditions, which may pose practical challenges for older adults with limited digital literacy in real-world settings.19 IMU sensors embedded in smartphones have shown potential as an alternative for obesity assessment, but a previous study mainly analyzed gait patterns using devices worn at the waist, making it difficult to directly measure abdominal obesity.20 Prior research has demonstrated that smartphone inclinometer applications, which calculate tilt angles from the device’s built-in IMU sensor data, can reliably capture spinal alignment and deformity, as well as range of motion in major joints.21 Extending this evidence, the present study examines whether abdominal inclination features measured using a smartphone inclinometer can serve as novel digital features for classifying diabetes and musculoskeletal pain in older adults.
This study examined whether smartphone inclinometer–measured abdominal inclination features can classify diabetes and musculoskeletal pain in older adults, and explored key contributing features using explainable artificial intelligence (XAI). We hypothesized that (1) simple inclination features extracted from a single smartphone placement would provide sufficient discriminatory power, and (2) XAI would reveal clinically interpretable patterns that align with known biomechanical and metabolic characteristics. By validating a low-cost, easy-to-measure, and scalable assessment method, this study aims to bridge the gap between rapidly evolving machine-learning techniques and their practical deployment in community and primary care settings, thereby supporting early screening and personalized management in aging populations.
METHODS
This cross-sectional observational study included 105 community-dwelling older adults aged 60 years or older, who were recruited from local community health programs in Korea. All participants provided written informed consent prior to enrollment. Participants were excluded if they met any of the following criteria: (1) underweight (BMI<18.5 kg/m2) or obesity class III (BMI≥40 kg/m2); (2) medical conditions affecting fat distribution (e.g., Cushing’s syndrome, metabolic syndrome); (3) inability to maintain an upright standing position for abdominal measurements due to postural imbalance or congenital spinal deformities.9,22 The study protocol was reviewed and approved by the Institutional Review Board of Yonsei (202507-HR-3966-05), and all procedures were conducted in accordance with the principles of the Declaration of Helsinki.
Demographic characteristics, including age, sex, height and weight were collected. For anthropometric data, conventional indices were collected, including BMI and tape-based measurements. BMI was calculated as weight (kg) divided by height squared (m2). WC was measured at the midpoint between the 12th rib and the iliac crest, and hip circumference at the widest part of the buttocks, both recorded to the nearest 0.1 cm following WHO guidelines.9 The WHR was subsequently derived. Group comparisons were conducted to examine differences between participants with and without diabetes or musculoskeletal pain. Continuous variables were summarized as mean±standard deviation and compared using independent t-tests or Mann–Whitney U tests following assessment of normality with the Shapiro–Wilk test.
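The BMI and WHR definitions given above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the example participant are hypothetical and are not taken from the study data.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist-to-hip ratio; both circumferences must use the same unit."""
    if hip_cm <= 0:
        raise ValueError("hip circumference must be positive")
    return waist_cm / hip_cm


# Hypothetical participant, for illustration only (not study data).
example_bmi = bmi(weight_kg=64.0, height_m=1.60)
example_whr = waist_to_hip_ratio(waist_cm=90.0, hip_cm=100.0)
```

Storing the raw measurements and deriving the indices in code, rather than recording only the derived values, keeps the calculation reproducible if a measurement is later corrected.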
For outcome labeling, participants with diabetes were determined based on a documented physician diagnosis in medical records. Musculoskeletal pain status was labeled using a self-report questionnaire with a 10-cm visual analogue scale (VAS), where any score greater than zero was considered indicative of pain. Participants with a VAS score greater than 3 cm were classified into the pain group, whereas those with a score below 2 cm were classified into the non-pain group. These outcome labels were treated as binary target variables for subsequent model development.
Abdominal inclination measurements were performed using a smartphone (iPhone; Apple Inc., Cupertino, CA, USA) equipped with a single built-in IMU sensor and an inclinometer application (Angle Finder; JRSoftWorx, Berlin, Germany). All measurements were conducted by a trained examiner who was a health professional experienced in smartphone-based health monitoring. To ensure measurement consistency, the examiner underwent standardized training and followed a predefined protocol throughout the study. During the assessment, participants stood in a relaxed upright position with steady breathing, avoiding trunk bending or limb movement. For the upper abdomen (UA) angle, the examiner positioned the upper edge of the smartphone at the xiphoid process, maintaining full contact of the device’s bottom surface with the abdominal wall (Figure 1). For the lower abdomen (LA) angle, the device was placed at the midpoint between both anterior superior iliac spines under the same conditions (Figure 1). The smartphone was held steady for approximately 2 seconds, after which the inclinometer application automatically recorded the angle. Consistent with real-world clinical practice, where obesity-related variables such as WC and body weight are typically assessed only once, each abdominal inclination (UA and LA) was measured a single time for this study to enhance usability and reflect practical conditions in digital healthcare implementation. A pilot study involving 40 participants demonstrated good-to-excellent within-day intra-rater reliability for the smartphone-derived inclination measures, with intraclass correlation coefficients of 0.95 for UA, 0.90 for LA, and 0.94 for the summed angle of UA and LA (TA).
Raw abdominal inclination data consisted of two primary angular features: UA and LA. To capture nonlinear and interaction effects, additional features (Figure 1) were derived, including the summated angle of UA and LA (TA), squared terms (UA2, LA2, TA2), and a multiplicative interaction term (UA×LA). Furthermore, ratio-based features were generated to reflect the relative contribution of local abdominal regions: UA/TA, LA/TA, UA/TA2, LA/TA2, and the normalized interaction term (UA×LA)/TA2. These nonlinear features were included not only for mathematical completeness but also because squared terms (e.g., UA2 and LA2) may capture the disproportionate anterior protrusion that occurs as visceral fat increases, reflecting curvature-related changes in abdominal shape. Ratio-based features were added to express the relative contribution of upper and lower abdominal segments, which may vary depending on regional fat distribution.
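The derived features listed above can be written out as a short Python sketch. Two points are assumptions rather than details confirmed by the text: the angles are taken to be in degrees, and ratio names such as UA/TA2 are read as UA divided by the square of TA. The function name and dictionary keys are hypothetical.

```python
def derive_features(ua: float, la: float) -> dict[str, float]:
    """Build the derived feature set from the two measured angles.

    Assumption: 'UA/TA2' means UA / (TA ** 2), and likewise for the
    other TA2 ratios; the manuscript does not spell this out.
    """
    ta = ua + la  # summed angle of upper and lower abdomen
    if ta == 0:
        raise ValueError("ratio features are undefined when UA + LA is zero")
    ta2 = ta ** 2
    return {
        "UA": ua,
        "LA": la,
        "TA": ta,
        "UA2": ua ** 2,
        "LA2": la ** 2,
        "TA2": ta2,
        "UAxLA": ua * la,
        "UA/TA": ua / ta,
        "LA/TA": la / ta,
        "UA/TA2": ua / ta2,
        "LA/TA2": la / ta2,
        "UAxLA/TA2": (ua * la) / ta2,
    }


# Hypothetical angles, for illustration only (not study data).
features = derive_features(ua=10.0, la=30.0)
```

Note that UA/TA and LA/TA always sum to one, so these two ratios carry the same information; a redundancy-aware selection step would be expected to keep at most one of them.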
Stratified five-fold cross-validation was applied to partition the dataset into training and test subsets while preserving class balance. Because the number of diabetic or pain cases was limited, this approach maximized data efficiency. All preprocessing steps and the minimum redundancy maximum relevance (mRMR) feature selection were performed only on the training portion of each fold and then applied to the corresponding test set to prevent data leakage. Keeping these steps inside each fold limits optimistic bias in the estimated model performance, while mRMR identifies features that are most relevant to the target outcome and least redundant with one another.23 For each classification task (diabetes and musculoskeletal pain), the top five features from the mRMR ranking were retained for model training. The selected features for musculoskeletal pain were UA2, LA2, UA×LA, TA2, and UA, while those for diabetes were TA2, LA2, UA×LA, UA2, and TA. To mitigate variance due to the limited sample size, the mRMR selection was repeated across folds, and the consistency of feature rankings was reviewed.
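The fold-wise selection procedure described above can be illustrated with a dependency-free Python sketch. It is a simplified stand-in, not the study's implementation: absolute Pearson correlation replaces mutual information as the relevance and redundancy measure, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k test folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(columns, target, n_select):
    """Greedy selection: relevance minus mean redundancy with chosen features.

    Assumption: |Pearson r| stands in for mutual information here.
    """
    relevance = {name: abs(pearson(values, target))
                 for name, values in columns.items()}
    selected, remaining = [], list(columns)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(columns[name], columns[s]))
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def select_per_fold(columns, labels, n_select=5, k=5, seed=0):
    """Run the selection on the training rows of each fold only."""
    results = []
    for test_indices in stratified_folds(labels, k=k, seed=seed):
        held_out = set(test_indices)
        train = [i for i in range(len(labels)) if i not in held_out]
        train_columns = {name: [values[i] for i in train]
                         for name, values in columns.items()}
        train_labels = [labels[i] for i in train]
        results.append(mrmr_select(train_columns, train_labels, n_select))
    return results
```

Because the held-out rows never enter `mrmr_select`, the feature list for a fold cannot be influenced by the test labels of that fold. Comparing the five returned lists is one simple way to review ranking consistency across folds.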
To capture both nonlinear and interpretable decision boundaries for classifying diabetes and musculoskeletal pain, we used two tree-based algorithms (Light Gradient Boosting Machine (LightGBM) and Balanced Random Forest (BalancedRF)) along with one linear margin-based algorithm (Linear Support Vector Classifier with probability calibration (LSVC_PC)). To address class imbalance, algorithm-specific balancing strategies were applied (e.g., class_weight adjustment in LightGBM and random undersampling in BalancedRF). These models were deliberately selected because they are well-suited for imbalanced and small-to-moderate clinical datasets, providing complementary strengths in handling nonlinear patterns, ensemble-based resampling, and interpretable linear boundaries.
1) LightGBM is a type of gradient boosting model that builds multiple small decision trees to make classifications. Unlike traditional tree models that grow evenly, LightGBM grows trees in a leaf-wise direction, meaning it expands only where improvement is most needed. This approach helps the model learn complex nonlinear relationships in the data while remaining fast and efficient, even with small sample sizes. In this study, class weights were adjusted so that the model could fairly learn from both common and rare cases, improving performance on imbalanced datasets.24
2) BalancedRF is an improved version of the traditional random forest model that is designed to handle imbalanced datasets. It builds many decision trees, but unlike the standard approach, each tree is trained on a randomly balanced subset of the data. This helps prevent the model from being biased toward the more frequent class and allows it to better detect rare cases, such as participants with diabetes or musculoskeletal pain. Because of its built-in balancing mechanism and strong generalization ability, BalancedRF is particularly useful for small or unevenly distributed clinical datasets.25
3) LSVC_PC is a machine learning model that separates two groups (e.g., presence or absence of disease) by finding the best linear boundary between them. It works well when the number of features is relatively small and the dataset is limited, as it focuses on maximizing the margin between classes rather than fitting every data point. However, a standard support vector machine does not provide probability estimates, which are necessary for evaluating model performance with metrics such as area under the curve. To address this, a probability calibration step (Platt scaling) was applied after training. This allows the model to output calibrated probabilities, making the results easier to interpret and compare with other models.25
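The probability calibration step mentioned for LSVC_PC can be sketched as follows. This is a simplified illustration of the idea behind Platt scaling, fitting a sigmoid to raw decision scores by plain gradient descent; it omits the target smoothing of the original method, and the function names, learning rate, and example scores are hypothetical. In scikit-learn, sigmoid calibration is available through CalibratedClassifierCV with method="sigmoid", although the text does not state which implementation was used.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            error = _sigmoid(a * s + b) - y
            grad_a += error * s
            grad_b += error
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_probability(score, a, b):
    """Map a raw decision score to a probability with the fitted sigmoid."""
    return _sigmoid(a * score + b)


# Hypothetical decision scores and labels, for illustration only.
raw_scores = [-2.0, -1.5, -1.0, -0.5, -0.2, 0.1, 0.3, 0.8, 1.2, 2.0]
true_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
a, b = fit_platt(raw_scores, true_labels)
probabilities = [calibrated_probability(s, a, b) for s in raw_scores]
```

To avoid leakage, the sigmoid should be fitted on scores from data that the classifier itself was not trained on, for example via an inner cross-validation loop within each training fold.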
Each model was trained and evaluated using stratified five-fold cross-validation to preserve class proportions across folds and to minimize optimistic bias arising from the small number of diabetic participants. This procedure ensured that minority cases were distributed across all folds rather than repeatedly sampled within a single training subset, thereby reducing the risk of overfitting and providing a more stable estimate of generalizability. Additional resampling methods, such as SMOTE-Tomek, were also tested but did not yield further improvement and were therefore not used. All models were implemented in Python (v3.10) using the scikit-learn, imbalanced-learn, and LightGBM packages.
Evaluation metrics included accuracy, sensitivity, and specificity, representing overall, positive, and negative classification performance, respectively. The Matthews correlation coefficient (MCC) was also calculated, as it provides a balanced summary of performance even under class imbalance. Discriminative ability was quantified using the area under the receiver operating characteristic curve (ROC-AUC), while robustness to imbalance was assessed using the area under the precision–recall curve (PR-AUC), computed using the positive class as the target of interest. Because the dataset exhibited substantial class imbalance, PR-AUC was included as a metric, as it emphasizes performance on the minority (positive) class and is less influenced by the large number of true negatives. Confusion matrices were generated to visualize classification outcomes. Final model selection for each target was based on mean ROC-AUC and PR-AUC across five folds, with preference given to models demonstrating superior discrimination and consistent precision-recall performance.
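The threshold-based metrics described above follow directly from the four cells of a confusion matrix, as the short Python sketch below shows. The counts are hypothetical: they were chosen only to match a cohort of 105 with 21 positive cases and do not reproduce the study's confusion matrices.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, and MCC from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one positive and one negative case")
    # MCC denominator is zero when any row or column total is zero.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),  # recall on the negative class
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not study results).
metrics = confusion_metrics(tp=18, fp=15, tn=69, fn=3)
```

With only 21 positive cases, each additional missed case shifts sensitivity by almost five percentage points, which is one reason fold-averaged estimates from small cohorts should be interpreted cautiously.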
To enhance interpretability of the final models, SHapley Additive exPlanations (SHAP) were applied to quantify the contribution of each feature to the model classifications. TreeSHAP or KernelSHAP analysis was performed only for the best-performing classifiers. For both models, SHAP values were computed on the test subsets of each cross-validation fold using the models trained on their corresponding training subsets, ensuring no information leakage. Global feature importance was visualized using SHAP beeswarm plots, which display the relative ranking and direction of feature influence across participants. This approach enabled identification of the most influential features and clarified whether higher or lower feature values increased the likelihood of classification into the positive outcome group.
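The attribution principle behind SHAP can be illustrated with an exact, brute-force Shapley computation on a toy model. This sketch is not TreeSHAP or KernelSHAP: it replaces "absent" features with a single reference point instead of averaging over background data, and it enumerates all feature subsets, which is only practical for a handful of features. The toy model, function names, and values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Assumption: a feature outside the coalition takes its baseline value.
    """
    n = len(instance)

    def value(subset):
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(total)
    return contributions


def toy_model(x):
    """Hypothetical model with one additive term and one interaction."""
    return 2.0 * x[0] + x[1] * x[2]


phi = shapley_values(toy_model, instance=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

In this toy case the additive term is credited entirely to the first feature, and the interaction term is split equally between the two features that form it. The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that beeswarm plots rely on.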
RESULTS
Table 1 summarizes the demographic and anthropometric characteristics of the participants. Among all participants (n=105), 21 (20.0%) had diabetes, and 27 (25.7%) reported musculoskeletal pain. Participants with diabetes showed significantly higher weight (70.4±7.2 vs. 61.3±9.0 kg, p<0.01), BMI (28.5±2.7 vs. 24.9±3.0 kg/m2, p<0.01), WC (98.2±7.0 vs. 87.3±9.3 cm, p<0.01), and WHR (0.98±0.05 vs. 0.91±0.06, p<0.01) than those without diabetes. No significant differences were found in age or height between groups. In contrast, participants with musculoskeletal pain were significantly older than those without pain (67.7±1.6 vs. 64.6±3.2 years, p<0.01). However, no significant group differences were observed in weight, BMI, WC, or WHR (p>0.05).
The three machine learning models (LightGBM, BalancedRF, and LSVC_PC) exhibited distinct performance patterns for classifying diabetes and musculoskeletal pain based on mean results across five-fold cross-validation (Table 2). For diabetes classification, BalancedRF exhibited the most stable precision–recall performance and superior sensitivity (87%) while maintaining 82% specificity (Figures 2 and 3). Although LightGBM achieved slightly higher accuracy (0.89 vs. 0.83), its recall performance was less consistent across folds. BalancedRF achieved higher ROC-AUC and PR-AUC (ROC-AUC=0.93, PR-AUC=0.84, MCC=0.61) than LightGBM (ROC-AUC=0.91, PR-AUC=0.73, MCC=0.65) and LSVC_PC (ROC-AUC=0.83, PR-AUC=0.71, MCC=0.36), although its MCC was slightly lower than that of LightGBM. Therefore, BalancedRF was selected as the final model for diabetes classification.
For musculoskeletal pain classification, LightGBM demonstrated the most balanced results across folds (accuracy=0.78, PR-AUC=0.56, MCC=0.42), outperforming BalancedRF (accuracy=0.73, ROC-AUC=0.80, PR-AUC=0.56, MCC=0.39) and LSVC_PC (accuracy=0.70, PR-AUC=0.43, MCC=0.28) (Table 2 and Figure 2). LightGBM correctly identified only 53% of musculoskeletal pain cases, showing limited sensitivity but high specificity (87%) (Figure 3). Despite this limited sensitivity, which restricts its applicability for precise pain classification, LightGBM was selected as the final model for musculoskeletal pain classification based on its overall discrimination and precision–recall consistency.
For diabetes classification using the BalancedRF model (Figure 4), SHAP analysis identified TA2 and LA2 as the most influential features, followed by UA×LA, UA2, and TA. Higher values of TA2 and LA2 were associated with an increased probability of diabetes classification, whereas UA2 and UA×LA provided complementary contributions.
For musculoskeletal pain classification using the LightGBM model (Figure 4), SHAP analysis identified UA2 as the most influential feature, followed by LA2, UA×LA, TA2, and UA. Higher values of UA2 and LA2 were generally associated with an increased probability of pain classification, whereas UA×LA, TA2, and UA made smaller and less uniform contributions to classification outcomes.
DISCUSSION
This study demonstrated that smartphone-derived abdominal tilt angles can serve as potential digital features for classifying both diabetes and musculoskeletal pain. The principal finding was that the LightGBM model for musculoskeletal pain achieved acceptable accuracy and specificity but suffered from low sensitivity, indicating that musculoskeletal pain could not be reliably distinguished using this feature set alone. In contrast, the BalancedRF model for diabetes showed robust discrimination with both high sensitivity and specificity. These results suggest that smartphone-based biomechanical features may have limited value as a stand-alone screening tool for musculoskeletal pain, but they hold promising potential for diabetes risk stratification in both community and clinical settings.
Previous studies in digital health have primarily relied on conventional anthropometric indices such as body fat percentage, WC, or imaging-based measurements obtained through radiographic devices and 3D body scanners.26 While these measures are clinically meaningful, they typically require specialized equipment and cannot be easily collected in daily life, limiting the real-world applicability of machine learning models developed from such data.26 In contrast, the smartphone-measured abdominal tilt angles used in the present study enable convenient and cost-effective data collection outside of clinical facilities. The feature-level interpretation further provides insight into how these smartphone-derived abdominal tilt patterns may be associated with distinct metabolic and biomechanical mechanisms. Consistent with previous studies identifying abdominal obesity as a major correlate of insulin resistance and diabetes risk, particularly among older adults,27,28 the SHAP analysis identified the squared total and lower abdominal tilt angles (TA2 and LA2) as the strongest features of diabetes, supporting their potential use as digital indicators of central obesity (Figure 4). The greater contribution of squared terms compared with linear counterparts suggests a nonlinear relationship, where larger abdominal tilt angles capture disproportionately greater anterior protrusion that becomes more relevant at higher levels of central adiposity. This alignment with established biomechanical and metabolic evidence suggests the potential feasibility of posture-based features for noninvasive diabetes risk screening. In contrast, the musculoskeletal pain model showed weaker and less consistent associations, with higher upper (UA2) and lower abdominal tilt (LA2) values only modestly increasing pain probability (Figure 4).
These findings align with prior evidence showing that central adiposity and body fat distribution have minimal influence on musculoskeletal pain thresholds, suggesting that obesity does not inherently enhance nociceptive sensitivity or increase chronic pain susceptibility.29 Overall, smartphone-based abdominal inclination metrics appear to be condition-specific and more informative for metabolic disorders such as diabetes than for musculoskeletal pain.
From a clinical and digital health perspective, the findings highlight the potential of smartphone-based postural metrics as scalable tools for chronic disease monitoring. Unlike conventional imaging or wearable systems, abdominal tilt can be directly measured using a smartphone inclinometer application, providing broad accessibility without additional devices or cost. Although standardization currently requires proper positioning and guidance, this measurement can be readily implemented in community or home-based settings. This approach aligns with the growing demand for self-assessable, noninvasive monitoring methods that integrate seamlessly into daily life. Incorporating abdominal tilt measurement into mobile applications may further enhance user engagement and adherence by delivering immediate visual feedback on abdominal obesity and metabolic risk, particularly for diabetes. Such user-centered design within diabetes care can strengthen perceived usefulness and trust in digital health technologies, thus promoting sustained engagement and supporting equitable access to preventive healthcare.
This study has several limitations that should be acknowledged. First, the study sample was relatively small, and the number of participants with diabetes was particularly limited. To address this, stratified cross-validation with a fully leakage-free pipeline was used to maximize data efficiency, although larger external cohorts will be needed to confirm generalizability. Second, the models were trained using static abdominal tilt angles, without accounting for temporal or contextual variations such as postprandial abdominal expansion or transient posture changes during daily activities. Third, although the smartphone inclinometer is less precise than research-grade IMU sensors, its measurements showed reasonable validity through moderate associations with WC and WHR. Lastly, this study did not include age or sex as features. Although both are important factors related to diabetes and musculoskeletal pain, they were intentionally excluded from the models. Including these variables could have improved accuracy, but it would have prevented us from isolating the independent value of abdominal inclination features. In addition, achieving accurate classification without requiring demographic inputs may enhance usability in real-world digital healthcare settings. Future research should extend beyond data acquisition to evaluate the feasibility of smartphone-based abdominal tilt monitoring as a digital therapeutic tool for individuals diagnosed with diabetes. In particular, periodic measurements during physiologically dynamic states, such as fasting, preprandial, and postprandial phases, may provide meaningful insights into real-time glycemic fluctuations and lifestyle-related abdominal changes. Additionally, assessing the usability and engagement of such self-monitoring from a user experience perspective will be essential to determine its practicality and long-term effectiveness in diabetes self-management.
CONCLUSION
This study demonstrated that smartphone-derived abdominal tilt angles can serve as practical digital features for classifying diabetes more effectively than musculoskeletal pain in older adults. Among the evaluated models, the BalancedRF achieved the best performance for diabetes classification, showing high sensitivity and specificity, whereas the LightGBM model for musculoskeletal pain exhibited limited sensitivity. These findings suggest that biomechanical features extracted from simple smartphone measurements are more robust for detecting metabolic risk than for identifying musculoskeletal pain. SHAP analysis revealed that squared lower and total abdominal tilt angles were key features of diabetes, reflecting mechanisms related to central obesity. Future research should explore the potential of smartphone-based abdominal tilt monitoring as a digital therapeutic tool for individuals with diabetes, particularly through periodic measurements during fasting and postprandial states, to assess dynamic glycemic responses and enhance user engagement in real-world settings.






