INTRODUCTION
As the global population continues to age, musculoskeletal disorders such as sarcopenia, muscle atrophy, and chronic muscle pain have become increasingly important public health concerns.1,2 These conditions contribute to functional decline, frailty, reduced independence, and decreased quality of life, particularly among older adults.3–5 Growing clinical and societal demand for the prevention, monitoring, and treatment of muscle-related conditions has consequently accelerated interest in technologies aimed at improving musculoskeletal health.6,7 In particular, advances in biomedical engineering, wearable sensing systems, and digital healthcare have expanded the technological approaches available for muscle assessment, rehabilitation, and functional support.8–10
Building on these advances, interest has grown in developing and implementing technologies that assess muscle function, facilitate early detection, and support preventive and therapeutic strategies.6,7 In particular, the convergence of biomedical engineering, information technology, and materials science has spurred the emergence of advanced solutions for musculoskeletal health.8 Notable among these are wearable devices capable of real-time biomechanical monitoring, biological signal sensors that capture electrophysiological data such as electromyography (EMG), and biologically active materials designed to stimulate muscle regeneration or mitigate degeneration.8–10 As technological innovation in physical health accelerates, the importance of intellectual property protection and strategic utilization has become increasingly evident.11 In rapidly evolving fields such as muscle function assessment, regenerative therapies, and digital health, patent filings serve as key indicators of not only technological advancement but also commercialization potential.12–14
Patent data in this context provide valuable insight into the translational trajectory of innovations beyond academic research, offering a practical view of how technologies are being developed for real-world applications.15 With recent advances in natural language processing (NLP) and machine learning (ML), it has become possible to systematically extract meaningful information from large volumes of patent documents, enabling the classification, clustering, and temporal visualization of technology trends.16–18 Ultimately, analyzing technologies from an intellectual property perspective goes beyond simply tracking innovation trends.15,18 It plays an important role in identifying emerging competitive technologies at an early stage and in supporting policy decisions, investment planning, and industrial strategy development.
Despite this progress, few studies have quantitatively analyzed domestic patents related to muscle health, especially from the perspective of clinical and industrial applicability. Most existing patent analyses either target general healthcare innovations or lack a specific emphasis on muscle-related technologies, thereby overlooking important developments in the field. Therefore, this study aimed to collect and analyze muscle-related patent and utility model filings. By applying NLP- and ML-based clustering algorithms, this study categorized core technology domains, extracted representative keywords, and visualized temporal trends in patent filings over the past two decades. Through this approach, the study aimed to map the evolving landscape of muscle health technologies and uncover patterns that are clinically and commercially significant.
METHODS
To categorize the topics of muscle-related patents and utility models and to analyze the current state and direction of technological development in the muscle field, this study used NLP- and ML-based analyses. The research process consisted of five main steps: data collection, text preprocessing, text embedding and dimensionality reduction, clustering and keyword extraction, and visual analysis (Figure 1).
South Korea was selected as the analytical setting because it represents a rapidly aging society with an increasing demand for musculoskeletal health technologies. South Korea is projected to become a super-aged society by 2025, with adults aged 65 years or older accounting for more than 20% of the population.19 In addition, Korea has strong technological capabilities in medical devices, wearable sensors, and digital health, providing an appropriate environment for the development and patenting of both physical and biological muscle-related innovations.20 The Korean Intellectual Property Rights Information Service (KIPRIS) further offers comprehensive and publicly accessible patent data, including both applications and registered utility models, enabling reproducible large-scale text-mining analyses. Accordingly, Korean patent data provide a relevant context for examining how aging-related healthcare needs are translated into applied muscle health technologies.
Publicly available data from KIPRIS were used to analyze trends in patents related to muscle technologies. A total of 2,836 patents and utility models, including both published and registered documents, were analyzed. Each record included the application number, filing date, title of the invention, abstract, and claims. Titles and abstracts were used to construct the text dataset, reflecting established practice in patent text mining, as these sections adequately capture the core technological themes and vocabulary required for large-scale clustering analyses.21 The data collection and preprocessing procedures were conducted on March 28, 2025.
The title and abstract were concatenated to form a unified text field for each document. Text cleaning was performed using regular expressions to remove special characters, punctuation, and extra spaces. Sentence segmentation was performed using the Korean Sentence Splitter (KSS). Next, nouns were extracted from each sentence using the Open Korean Text (Okt) morphological analyzer. To reflect the specific language of patent documents, a custom stopword list was applied to filter out legal expressions, common nouns, measurement units, and domain-general terms from the data. Each document was then represented as a list of filtered noun phrases, which were joined to form a condensed summary text for further embedding.
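The preprocessing steps above can be illustrated with a minimal sketch. It is not the study's code: the stopword entries, the function names, and the whitespace tokenizer standing in for the Okt noun extractor are all assumptions made for illustration, and the KSS sentence-splitting step is omitted for brevity.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical stopword entries; the study's actual custom list is not given in the text.
STOPWORDS = {"발명", "상기", "방법", "단계"}


def clean_text(text: str) -> str:
    """Replace punctuation and special characters with spaces, then collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so Hangul is kept
    text = text.replace("_", " ")         # underscore counts as \w, so drop it explicitly
    return re.sub(r"\s+", " ", text).strip()


def whitespace_nouns(sentence: str) -> List[str]:
    """Stand-in for a morphological analyzer: simply splits on whitespace."""
    return sentence.split()


def summarize(
    title: str,
    abstract: str,
    noun_extractor: Callable[[str], List[str]] = whitespace_nouns,
    stopwords: Iterable[str] = STOPWORDS,
) -> str:
    """Concatenate title and abstract, clean, extract tokens, and filter stopwords."""
    stop = set(stopwords)
    text = clean_text(f"{title} {abstract}")
    return " ".join(tok for tok in noun_extractor(text) if tok not in stop)
```

With KoNLPy installed, `Okt().nouns` from `konlpy.tag` could be passed as `noun_extractor` so that only nouns are retained, which is closer to the procedure described above.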
To capture the semantic features of the summarized patent texts, Sentence-BERT was applied using the pretrained multilingual model paraphrase-multilingual-MiniLM-L12-v2. Each document was converted into a 384-dimensional embedding vector. Dimensionality reduction was conducted using Uniform Manifold Approximation and Projection (UMAP), tested at output dimensions of 5, 10, and 15.
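As a rough illustration of this step, the sketch below embeds the texts and reduces them to each tested dimensionality. The model name and output dimensions come from the text. The cosine metric, the random seed, the default neighborhood size, and the function name are assumptions, since the corresponding settings are not reported.

```python
MODEL_NAME = "paraphrase-multilingual-MiniLM-L12-v2"  # model named in the text
UMAP_DIMS = (5, 10, 15)                               # output dimensions tested in the text


def embed_and_reduce(texts, dims=UMAP_DIMS, seed=42):
    """Return {n_components: reduced array} for each requested UMAP dimensionality.

    Heavy dependencies are imported lazily so that this module can be loaded
    without them. The metric and seed are illustrative assumptions.
    """
    from sentence_transformers import SentenceTransformer
    import umap

    model = SentenceTransformer(MODEL_NAME)
    # encode() returns a NumPy array of shape (n_docs, 384) for this model
    embeddings = model.encode(list(texts), show_progress_bar=False)

    reduced = {}
    for n_components in dims:
        reducer = umap.UMAP(n_components=n_components, metric="cosine", random_state=seed)
        reduced[n_components] = reducer.fit_transform(embeddings)
    return reduced
```

Fixing `random_state` makes the projection repeatable across runs, which matters when comparing cluster validity across the three candidate dimensionalities.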
Two density-based clustering algorithms were applied to the UMAP-reduced embeddings. Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) was selected as the primary clustering method, with the minimum cluster size parameter explored across a range of 20 to 100. A second-level HDBSCAN analysis was subsequently performed within each top-level cluster to identify sub-domains. For comparison, DBSCAN was also applied as a supplementary clustering method, with the epsilon parameter evaluated across values ranging from 0.3 to 1.0. To identify representative themes in each cluster, keywords were extracted using the Term Frequency-Inverse Document Frequency (TF-IDF) method. For each cluster, the top 10 terms with the highest average TF-IDF scores were selected, excluding domain-irrelevant stopwords. These keywords were used to describe the technical focus of each group. Cluster labels were assigned by reviewing the dominant TF-IDF keywords and representative patents within each cluster, with the aim of identifying the primary technological themes represented in the patent corpus.
To enable intuitive interpretation, three-dimensional UMAP embeddings were visualized using 3D scatter plots, in which each point represents a patent and is colored by cluster assignment. Cluster centroids were calculated and used as visual guides. In addition, a trend analysis was conducted by extracting the filing year from each patent record. The number of patents per cluster was aggregated annually to examine shifts in thematic focus over time.
The study was conducted in the Google Colab environment using Python with a Tesla T4 GPU and a high-memory runtime to improve analysis efficiency. The tools used for preprocessing included pandas, KoNLPy (Okt), KSS, and tqdm. Sentence-BERT (sentence-transformers 3.3.1) was used for embedding, UMAP for dimensionality reduction, HDBSCAN and DBSCAN for clustering, and scikit-learn 1.6.0 (TfidfVectorizer) for the keyword analysis. Visualizations were created using matplotlib (3.10.0).
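The annual aggregation used for the trend analysis can be sketched with the standard library alone. The date format (strings beginning with a four-digit year, such as "2021-03-01") and the function name are assumptions, as the KIPRIS export format is not described in the text.

```python
from collections import Counter, defaultdict


def annual_counts(filing_dates, labels):
    """Count filings per cluster and year.

    Dates are assumed to be strings that start with a four-digit year.
    Noise points (label -1) are kept as their own group.
    """
    counts = defaultdict(Counter)
    for date, label in zip(filing_dates, labels):
        year = int(str(date)[:4])
        counts[label][year] += 1
    # Sort years so each series is ready for plotting
    return {label: dict(sorted(by_year.items())) for label, by_year in counts.items()}
```

For example, `annual_counts(["2021-03-01", "2022-01-10"], [0, 0])` yields one series for cluster 0 with a single filing in each of 2021 and 2022.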
RESULTS
From the initial 3,047 muscle-related patent records, duplicates were removed, resulting in 2,836 unique patents for further analysis. The embedded text vectors (384 dimensions) were reduced using UMAP to 10 dimensions, which provided the clearest cluster separation for downstream clustering analyses. For HDBSCAN, a minimum cluster size of 80 with the excess-of-mass selection rule yielded the highest silhouette coefficient (0.610) and the lowest Davies–Bouldin index (0.525), and was therefore selected as the primary configuration. In the supplementary DBSCAN analysis, the selected configuration showed lower cluster validity than HDBSCAN, with a silhouette coefficient of –0.441 and a Davies–Bouldin index of 1.669 (Figure 2). Across the full dataset, HDBSCAN classified 238 patents (8.4%) as noise, which were excluded from the cluster-level interpretation but retained in the temporal trend analysis. Per-sub-cluster silhouette coefficients ranged from 0.80 for the Massage and Vibration Devices sub-domain to –0.32 for the heterogeneous biological residual group, with smaller specialized sub-domains generally exhibiting higher cluster coherence than larger heterogeneous groups. Within each top-level macro-cluster, a second-level HDBSCAN analysis with a minimum cluster size of 15 was applied to identify finer-grained sub-domains. This parameter was selected to be sufficiently small to detect specialized technological niches while remaining large enough to preserve per-cluster statistical robustness, with named sub-domain sizes ranging from 18 to 163 patents. The full hierarchical partition (10 sub-domains) yielded a lower aggregate silhouette coefficient (0.074) than the top-level two-cluster solution (0.610), indicating that sub-domain boundaries were less distinct than the macro-level separation.
Accordingly, the primary validated structure of the dataset was the two-domain macro-cluster solution, whereas the sub-domain assignments provided an exploratory, finer-grained thematic organization supported by per-cluster silhouette values.
The clustering results of the muscle-related patents and utility models are summarized in Table 1 and Figure 3. HDBSCAN identified two top-level technological macro-domains and multiple finer-grained sub-domains, excluding noise. The cluster validity metrics for the HDBSCAN top-level partition (silhouette=0.610; Davies–Bouldin=0.525) exceeded those of the supplementary DBSCAN partition (silhouette=–0.441; Davies–Bouldin=1.669). The top-level domains were Wearable and Sensor Technology (n=1,695) and Biological and Preventive Technology (n=903). Within the Wearable domain, a Massage and Vibration Devices sub-domain (n=81) was identified. Within the Biological domain, eight sub-domains were identified: Functional Food and Bioactive Extracts (n=163), Stem Cells and Regenerative Therapy (n=122), Antibody and Antiviral Therapeutics (n=49), Atrophy and Animal Models (n=38), Pharmaceuticals and Chemical Compounds (n=25), Aging and Sarcopenia Mechanisms (n=24), Protein Biomarkers and Diagnostics (n=18), and Other Biological/Preventive Technologies (n=464). A further 238 patents (8.4%) were classified as noise.
Patent filings increased steadily from 2004 to 2023, peaking in 2022 (Figure 4). The Wearable and Sensor domain maintained the highest annual filing volume throughout the study period, increasing from fewer than 20 filings per year before 2010 to a peak of 199 filings in 2022. The Biological and Preventive domain showed increased filing activity after 2015, peaking at 62 filings in 2021. The Functional Food and Bioactive Extracts sub-domain increased gradually after 2017 and peaked at 35 filings in 2022. Stem Cells and Regenerative Therapy and Massage and Vibration Devices showed lower but persistent annual activity, with peak filings of 29 in 2022 and 16 in 2021. The remaining biological sub-domains maintained relatively low annual filing volumes throughout the study period.
DISCUSSION
This study identified two dominant technological macro-domains and several finer-grained biological sub-domains from 2,836 unique muscle-related patents using NLP and unsupervised machine learning techniques. UMAP-based dimensionality reduction (10 dimensions) followed by HDBSCAN clustering produced two well-validated macro-domains and several coherent sub-domains, with cluster validity metrics substantially better than those of the supplementary DBSCAN analysis. The largest group was centered on measurement devices, wearable technologies, and sensor applications, showing consistent growth over the years. Another major group was characterized by technologies related to disease prevention, functional improvements, and bioactive substances. Trend analysis revealed a steady increase in patent filings, with measurement- and biohealth-related technologies leading overall growth.
The analysis of muscle-related patent data revealed two dominant clusters representing distinct technological domains. The Wearable and Sensor domain, the largest with 1,695 patents, is centered on wearable sensors and body measurement technology. Frequent keywords such as signal, measurement, wearable, and sensor indicate a strong focus on physiological monitoring systems. These technologies involve smart devices that can track muscle activity, motion, and biometric data in real time. This trend aligns with the global rise of digital health, personalized fitness, and remote rehabilitation.22–24 The growing availability of compact, non-invasive sensors and the integration of AI into healthcare have supported continuous growth in this area.25 Previous studies have emphasized the importance of wearables in enabling real-time and personalized health monitoring.23,25 From a clinical perspective, technologies in the Wearable and Sensor domain hold immediate relevance for sarcopenia screening, post-stroke and post-operative rehabilitation monitoring, gait-based fall-risk assessment in older adults, and real-time biofeedback during physical therapy, where surface electromyography and inertial sensing enable individualized progress tracking.26 Within the Wearable and Sensor macro-domain, hierarchical sub-clustering additionally identified a Massage and Vibration Devices sub-domain (n=81) characterized by terms such as massage, vibration, acupressure, and roller.
The Biological and Preventive domain, consisting of 903 patents, encompasses a wide range of biological and biochemical technologies aimed at improving physiological function, preventing diseases, and modulating internal processes. Keywords such as prevention, disease, cell, and protein suggest that this cluster includes bioactive compounds, therapeutic formulations, and regenerative treatments. The increase in filings in this area since the mid-2010s reflects a convergence of biotechnology, preventive medicine, and functional enhancement, particularly in response to aging populations and chronic health conditions.27,28 Hierarchical sub-clustering identified seven internally coherent sub-domains within this macro-domain: Functional Food and Bioactive Extracts (n=163), Stem Cells and Regenerative Therapy (n=122), Antibody and Antiviral Therapeutics (n=49), Atrophy and Animal Models (n=38), Pharmaceuticals and Chemical Compounds (n=25), Aging and Sarcopenia Mechanisms (n=24), and Protein Biomarkers and Diagnostics (n=18). The remaining 464 patents formed a heterogeneous residual that did not consolidate into dense sub-clusters, indicating substantial diffuse innovation alongside the well-defined sub-domains. The Protein Biomarkers and Diagnostics sub-domain also included patents related to livestock muscle proteomics and meat-quality discrimination, which may explain the occurrence of terms such as “Korean beef” among the representative keywords.
The Functional Food and Bioactive Extracts sub-domain, characterized by terms related to protein extracts and bioactive food ingredients, reflects active patenting of nutritional and functional products targeted at musculoskeletal health. Although these everyday food terms may appear unconventional in a patent-clustering context, they correspond to a substantive functional food industry recognized under Korean health-food regulations, and therefore reflect a legitimate technological domain rather than artifacts of incomplete preprocessing. Studies on dietary protein and bioactive peptides have demonstrated their positive effects on muscle mass maintenance and recovery, especially in populations vulnerable to sarcopenia.29 From a clinical perspective, these biological sub-domains offer direct translation pathways, including bioactive peptides and protein-rich functional formulations for muscle mass preservation in sarcopenia and cachexia, cell-based and regenerative approaches for muscle repair after acute injury, and myostatin pathway pharmacological candidates for muscular dystrophy and age-related atrophy.30–32 Linking the Biological and Preventive domain with the Wearable and Sensor domain through digital monitoring and biological intervention suggests an emerging clinical pipeline in which wearable sensors quantify therapeutic response while biological products provide the therapy itself.
Beyond these well-defined sub-domains, 238 patents (8.4%) constituted a heterogeneous long tail of specialized topics that did not form sufficiently dense clusters under HDBSCAN, consistent with possible over-fragmentation in the original DBSCAN partition. The themes included veterinary rehabilitation technologies, diabetes-related muscle preservation, respiratory and sleep monitoring systems, compression devices, and vibration-based therapies. Previous studies have reported increasing application of rehabilitation technologies in veterinary medicine,33,34 metabolic interventions targeting diabetes-associated muscle atrophy,35,36 and respiratory monitoring approaches related to physical recovery and sleep disorders in older adults.37 Compression garments and vibration-based therapies have likewise been associated with muscle recovery and fatigue reduction,38–40 and neuromuscular electrical stimulation has been applied for pelvic floor and post-operative rehabilitation.41,42 Collectively, the diversity and low density of these themes suggest that muscle-related innovation extends beyond the two dominant macro-domains into a broader landscape of lower-volume specialized technologies.
Patent filing trends showed a steady increase, with a significant acceleration after 2015. This increase reflects technological advances in wearables, IoT-based healthcare, and biotechnology, along with growing societal and policy support. Notably, filings in the Wearable and Sensor and the Biological and Preventive macro-domains have driven much of this growth. The impact of the COVID-19 pandemic further heightened the demand for health technologies, likely contributing to the filing peak in 2022. The apparent decline in filings observed in the most recent years (2024–2025) should be interpreted with caution, as KIPRIS records are subject to publication delays, typically about 18 months from filing for unexamined applications, as well as an additional lag between application and registration. Therefore, the most recent annual counts represent partial rather than complete yearly cohorts. Meanwhile, the smaller sub-domains and long-tail specialized topics showed modest but persistent activity, underscoring the expanding scope of muscle-related innovations across health, therapy, and wellness applications.
The dimensionality reduction and clustering parameters used in the analysis were determined based on objective and reproducible criteria, allowing the formation of meaningful clusters. Nevertheless, this study has some limitations. First, the analysis was based only on patent titles and abstracts, which may not fully capture the technical depth or novelty of each invention. Important details found in the full text or claims were not included, potentially affecting the accuracy of the clustering results. Second, this study did not examine the commercialization status or real-world application of patents. Without linking to market data or product development outcomes, it is difficult to assess the practical impact or economic value of these technologies. Third, the scope was limited to domestic (Korean) patents, excluding international patents. This national focus may not fully reflect Korea’s position in the global innovation landscape or capture cross-border technology trends. Fourth, while clustering and keyword extraction were based on objective algorithms, the interpretation of clusters involved a degree of subjectivity. Manual labeling and thematic grouping may vary depending on the analyst’s perspective. Fifth, this study analyzed only the volume of patent filings and did not incorporate qualitative impact indicators. Future studies should integrate forward and backward citation network analyses to identify high-impact and foundational inventions, examine patent family size as a proxy for international commercial reach, and link patent records to commercialization indicators, such as product launches, regulatory approvals, and licensing activity, to assess which technological domains carry the greatest real-world influence. Finally, smaller clusters with niche or emerging technologies were not explored in detail, although they may represent important directions for future research.
Future research should integrate full patent texts, citation networks, and global data sources to provide a more comprehensive perspective. Furthermore, combining patent analysis with clinical, regulatory, or market information could provide greater insights into the real-world relevance of muscle-related technologies.
CONCLUSIONS
This study analyzed domestic muscle-related patent data using NLP and machine learning techniques to identify technological themes and trends over the past 20 years. The analysis revealed two dominant categories, wearable sensors and measurement technology and biological disease prevention and improvement technology, alongside several finer-grained biological sub-domains, including stem cell and regenerative therapy, functional food and bioactive extracts, antibody and antiviral therapeutics, and aging mechanism research. This patent-based analysis provides insights into the structure and evolution of muscle-related technologies and suggests potential clinical applications. These findings may serve as a valuable foundation for the future development of diagnostic and therapeutic strategies in the musculoskeletal field.







