Author: Denis Avetisyan
A new study rigorously evaluates how well machine learning models trained in one home can accurately detect occupancy in others, revealing crucial insights for truly smart building technology.

Researchers compare the generalization performance of Logistic Regression, Support Vector Machines, and Long Short-Term Memory networks for occupancy detection using time series data from multiple residential units and a digital twin.
Accurate and reliable occupancy detection is crucial for optimizing energy usage and enhancing comfort in smart building environments, yet models trained on data from one residential unit often struggle to perform consistently in others. This paper, ‘Generalizability of Learning-based Occupancy Detection in Residential Buildings’, investigates the cross-apartment performance of three machine learning approaches – logistic regression, support vector machines, and long short-term memory networks – using data from a real-world testbed and a calibrated digital model. Results demonstrate that while all models achieve comparable in-apartment accuracy, LSTM networks exhibit superior generalization capabilities, offering a promising pathway toward robust and scalable occupancy detection systems. Will these findings pave the way for truly adaptive and personalized smart home experiences?
Unveiling Occupancy: Patterns in Environmental Data
Precisely determining how many people occupy a building is fundamental to realizing the potential of smart building technologies and optimizing energy usage, but current detection methods frequently fall short of delivering reliable data. Existing systems often rely on simplistic approaches – like counting individuals through doorways or using basic motion sensors – which struggle to account for the nuanced reality of building usage. These limitations lead to inaccuracies when estimating true occupancy, hindering the ability to effectively adjust heating, ventilation, and air conditioning (HVAC) systems, or to understand how spaces are actually utilized. Consequently, buildings may waste energy heating or cooling unoccupied areas, or fail to provide adequate comfort when spaces are crowded, demonstrating a clear need for more sophisticated and accurate detection techniques.
Conventional occupancy detection systems frequently falter when faced with the nuances of authentic building environments. Simple motion sensors, for instance, cannot differentiate between a person actively using a space and one merely passing through, leading to inflated occupancy counts and wasted energy. Furthermore, human behavior is inherently unpredictable; individuals may remain stationary for extended periods while working, cluster in impromptu meetings, or utilize spaces in ways not anticipated by the system’s programming. These variations, combined with factors like furniture arrangements, lighting conditions, and the presence of pets or large objects, introduce significant errors. Consequently, traditional methods often prove unreliable in delivering the precise, granular data needed for truly effective building management and optimization.
Effective occupancy detection transcends the limitations of conventional motion sensors by integrating a diverse array of environmental data. Researchers are discovering that a comprehensive understanding of space utilization requires analyzing factors such as carbon dioxide levels, soundscapes, and even Wi-Fi signal strengths – each providing unique insights into human presence and activity. This multi-sensor approach allows for differentiation between mere movement and actual occupancy, as well as the potential to estimate the number of occupants, rather than simply detecting if anyone is present. By correlating these indicators, systems can achieve a far more nuanced and accurate assessment of building usage, leading to optimized energy consumption and improved building management strategies.
The KTH Live-In Lab functions as a fully-functional apartment meticulously designed for research into future living solutions, offering an unparalleled platform for refining occupancy detection technologies. This unique environment allows researchers to gather comprehensive datasets encompassing not only conventional sensor readings, but also nuanced behavioral patterns, environmental factors like temperature and lighting, and even appliance usage, all within a realistic domestic setting. Unlike simulated or limited-scope testing grounds, the Lab facilitates longitudinal studies capturing the complexities of daily life, enabling the development and validation of algorithms capable of distinguishing between mere presence and genuine occupancy – crucial for optimizing energy consumption and creating truly intelligent buildings. The detailed data streams generated within the Lab are instrumental in moving beyond simple motion detection toward a richer, more accurate understanding of how spaces are actually used.

Environmental Signatures: Decoding Human Presence
The correlation between indoor environmental parameters and occupancy levels was investigated based on the premise that human metabolic activity directly influences these factors. Specifically, increased occupancy leads to a measurable rise in indoor temperature due to body heat, an increase in relative humidity from respiration, and a corresponding elevation in carbon dioxide (CO2) concentration as a byproduct of breathing. This hypothesis posits that a combined analysis of temperature, humidity, and CO2 levels provides a statistically reliable indicator of the number of occupants within a defined space, offering a non-invasive method for assessing building usage.
Traditional methods of determining building occupancy, such as door counters or badge swipes, provide limited data regarding how spaces are utilized and for how long. Conversely, monitoring environmental features like temperature, humidity, and carbon dioxide (CO2) concentration offers a more detailed profile of building usage patterns. These features reflect metabolic activity – a proxy for the number of occupants and their activity level – providing insight beyond simple presence or absence. For example, a consistently elevated CO2 level suggests prolonged occupancy, while fluctuations correlate with intermittent use. This granular data allows for a more nuanced understanding of space utilization, enabling optimized resource allocation and improved building management strategies compared to binary occupancy detection.
The rate of change of carbon dioxide concentration, termed CO2 Slope, functions as a dynamic indicator of occupancy by reflecting ventilation and metabolic activity over time. Unlike static CO2 readings which only indicate the presence of occupants, CO2 Slope – calculated as the change in CO2 concentration per unit time – reveals information about the number of occupants and their activity level. A rapidly increasing CO2 Slope suggests recent or ongoing occupancy, while a stable or decreasing slope indicates either vacancy or sufficient ventilation. Incorporating CO2 Slope as a feature in occupancy detection models significantly improves accuracy, particularly in distinguishing between occupied and unoccupied states and in estimating occupancy density, as it captures temporal dynamics not present in single-point measurements.
Feature engineering is critical for maximizing the predictive power of indoor environmental features. Raw data from temperature, humidity, and CO2 sensors often requires transformation to improve model performance; this includes scaling, normalization, and the creation of interaction terms. Specifically, calculating ΔCO2/Δt – the CO2 Slope – yields a derived feature that captures the rate of CO2 change, providing temporal information absent in static CO2 readings. Careful feature selection, informed by domain expertise and statistical analysis, reduces dimensionality and mitigates overfitting, ultimately leading to more robust and accurate occupancy detection models. The process may also involve handling missing data and outlier removal to ensure data quality and model stability.
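The CO2 Slope described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the function name, sampling interval, and example values are assumptions chosen for clarity.

```python
from datetime import datetime, timedelta

def co2_slope(timestamps, co2_ppm):
    """Rate of CO2 change (ppm per minute) between consecutive readings.

    Returns a list aligned with the input; the first entry is 0.0
    because no earlier reading exists to difference against.
    """
    slopes = [0.0]
    for prev_t, t, prev_c, c in zip(timestamps, timestamps[1:],
                                    co2_ppm, co2_ppm[1:]):
        minutes = (t - prev_t).total_seconds() / 60.0
        slopes.append((c - prev_c) / minutes if minutes > 0 else 0.0)
    return slopes

# Hypothetical readings every 10 minutes: occupants arrive and CO2 rises.
t0 = datetime(2024, 1, 1, 8, 0)
times = [t0 + timedelta(minutes=10 * i) for i in range(4)]
co2 = [420.0, 430.0, 470.0, 530.0]  # ppm
print(co2_slope(times, co2))  # [0.0, 1.0, 4.0, 6.0]
```

A steepening positive slope (1.0 → 4.0 → 6.0 ppm/min here) is the kind of temporal signal a static CO2 reading cannot convey on its own.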

Model Validation: A Rigorous Examination of Predictive Power
Three machine learning models – Logistic Regression, Support Vector Machine (SVM), and Long Short-Term Memory (LSTM) – were implemented to predict occupancy levels utilizing environmental feature data. Logistic Regression served as a baseline model due to its computational efficiency and interpretability. The SVM model was selected for its effectiveness in high-dimensional spaces and capacity for non-linear relationships. The LSTM model, a recurrent neural network, was included to leverage potential temporal dependencies within the environmental data streams, allowing it to consider sequential patterns for improved prediction accuracy. All models were trained and evaluated using the same dataset of environmental features collected from the KTH Live-In Lab.
Bayesian Optimization was utilized to automate the process of hyperparameter tuning for both the Support Vector Machine (SVM) and Long Short-Term Memory (LSTM) models. This technique employs a probabilistic model to efficiently explore the hyperparameter space, balancing exploration of novel configurations with exploitation of promising regions. The optimization process sought to maximize model performance, specifically measured by the F1 Score on the validation dataset. Unlike grid or random search, Bayesian Optimization leverages past evaluation results to intelligently suggest subsequent hyperparameter settings, resulting in a more efficient convergence to optimal or near-optimal values. The hyperparameters tuned included parameters governing regularization strength, kernel selection for the SVM, and the number of layers and neurons within the LSTM network.
Evaluation of the Logistic Regression, Support Vector Machine, and Long Short-Term Memory models on the Apartment 2 test dataset yielded a consistent F1 Score ranging from approximately 0.88 to 0.90. This indicates a high degree of similarity in predictive capability across all three algorithms when applied to this specific dataset. The F1 Score, calculated as the harmonic mean of precision and recall, provides a balanced measure of the models’ accuracy in predicting occupancy, demonstrating that none of the models significantly outperformed the others in terms of overall performance on this data.
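For readers unfamiliar with the metric, the F1 Score referenced above is straightforward to compute. This is a self-contained sketch with made-up labels, not the paper's evaluation code:

```python
def f1_score(y_true, y_pred):
    """F1 for binary occupancy labels (1 = occupied, 0 = vacant):
    the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative predictions: one missed occupancy, one false alarm.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(f1_score(y_true, y_pred), 3))  # 0.8
```

Because F1 balances false alarms against missed detections, it is better suited than raw accuracy for occupancy data, where vacant periods can dominate the labels.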
The Long Short-Term Memory (LSTM) model exhibited enhanced performance due to its capacity to process sequential data, effectively capturing temporal dependencies present within the environmental datasets used for occupancy prediction. This capability distinguishes it from models like Logistic Regression, which does not inherently account for time-series information. However, this increased complexity translated to a substantially longer training duration; the LSTM model required over 50 minutes to complete training, compared to the 11.3 seconds required for Logistic Regression. This difference in training time represents a trade-off between predictive accuracy and computational cost.
A calibrated Digital Twin of the KTH Live-In Lab was constructed utilizing the IDA ICE platform to facilitate model evaluation and validation. This virtual representation of the physical environment allowed for repeatable and controlled experimentation, mitigating the challenges associated with real-world testing. Calibration involved aligning the Digital Twin’s behavior with historical data from the KTH Live-In Lab, ensuring its predictive accuracy reflected actual building performance. The Digital Twin served as the sole environment for assessing the Logistic Regression, Support Vector Machine, and Long Short-Term Memory models, enabling a consistent and comparable evaluation of their ability to predict occupancy based on environmental features without disrupting the live laboratory environment.

Extending the Vision: Generalization and Implications for Sustainable Buildings
The capacity of these predictive models to perform accurately in previously unseen environments was a central focus of evaluation. Researchers rigorously tested the algorithms’ ability to generalize across multiple apartment units, moving beyond the limitations of single-apartment training. Through the implementation of Cross-Apartment Generalization techniques – a methodology designed to enhance adaptability – a demonstrable improvement in predictive performance was achieved. This approach effectively mitigates the risk of overfitting to specific apartment characteristics, fostering models capable of reliably forecasting occupancy patterns in diverse residential settings and ultimately paving the way for broader deployment in smart building initiatives.
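One common way to measure cross-apartment generalization is a leave-one-apartment-out evaluation: train on all units but one, then test on the held-out unit. The sketch below illustrates that splitting logic only; the tuple layout and apartment names are assumptions, not the study's data format.

```python
def leave_one_apartment_out(samples):
    """Yield (held_out_id, train, test) splits where each apartment
    in turn is excluded from training and used only for testing.

    `samples` is a list of (apartment_id, features, label) tuples.
    """
    apartments = sorted({apt for apt, _, _ in samples})
    for held_out in apartments:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

# Hypothetical samples: (apartment, [temperature C, CO2 ppm], occupied?)
data = [
    ("apt1", [21.5, 480], 1), ("apt1", [20.1, 410], 0),
    ("apt2", [22.0, 520], 1), ("apt3", [19.8, 400], 0),
]
for apt, train, test in leave_one_apartment_out(data):
    print(apt, len(train), len(test))
```

Scoring each held-out apartment separately exposes exactly the overfitting risk described above: a model that memorizes one unit's quirks will show a visible drop on the units it never saw.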
Despite a noted performance decrease – achieving an F1 Score between 0.75 and 0.85 when applied to data from Apartment 3 – the predictive models demonstrated sustained acceptable accuracy levels. This suggests a degree of robustness even when confronted with novel environments, highlighting the model’s ability to generalize, albeit with some performance loss. Notably, the Long Short-Term Memory (LSTM) network consistently outperformed other architectures throughout testing, reinforcing its suitability for capturing the temporal dependencies inherent in occupancy patterns and maintaining reliable predictions even with shifting data characteristics. This continued efficacy of LSTM is crucial for real-world application where complete data consistency is unlikely.
The predictive power of occupancy models was notably strengthened by incorporating seasonal trends. Analyses revealed that human behavior within buildings isn’t static; occupancy patterns demonstrably shift with the time of year – for example, increased activity during school semesters or reduced usage over summer breaks. By explicitly accounting for these cyclical variations, the models moved beyond simply recognizing when spaces were occupied, and began to anticipate how occupancy would change based on the calendar. This proactive approach significantly improved prediction accuracy and reliability, reducing the impact of unpredictable fluctuations and enabling more robust forecasts of building energy needs and resource allocation. The integration of seasonality, therefore, proved crucial in developing a truly responsive and adaptable system for smart building management.
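A standard way to hand such cyclical calendar information to a model is a sine/cosine encoding of the day of year, so that late December and early January sit close together in feature space. This is a generic technique sketch, not the paper's specific feature set:

```python
import math

def seasonal_features(day_of_year, days_in_year=365):
    """Encode day-of-year as a point on the unit circle so the
    year-end wrap-around does not create an artificial discontinuity."""
    angle = 2 * math.pi * (day_of_year - 1) / days_in_year
    return math.sin(angle), math.cos(angle)

print(seasonal_features(1))  # Jan 1 -> (0.0, 1.0)
```

A raw day-of-year feature would place day 365 and day 1 maximally far apart; on the circle they are neighbors, which matches how seasonal occupancy behavior actually varies.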
Data-driven methodologies are proving increasingly vital for addressing the complex challenges of building energy consumption and fostering genuinely sustainable infrastructure. This research demonstrates how sophisticated modeling, leveraging occupancy data, can move beyond static energy management towards dynamically responsive systems. By accurately predicting building usage patterns, these approaches enable proactive adjustments to heating, cooling, and lighting – minimizing waste and maximizing efficiency. The implications extend beyond simple cost savings; optimized energy use directly contributes to reduced carbon emissions and a smaller environmental footprint, paving the way for smarter, more resilient, and ecologically sound built environments capable of adapting to future needs and contributing to global sustainability goals.
The core methodology established in this research extends beyond the specific context of apartment occupancy prediction, offering a versatile framework for a range of smart building applications. The techniques for data integration, feature engineering, and model training – particularly the Cross-Apartment Generalization and Seasonality adjustments – are readily adaptable to forecasting energy demands, optimizing HVAC systems, or even predicting equipment failures in diverse building types. Furthermore, the study’s success in mitigating performance degradation across different environments suggests the robustness of the approach, facilitating its deployment in geographically distinct locations with varying climate conditions and occupant behaviors. This adaptability positions the developed methodology as a valuable tool for enhancing building intelligence and promoting sustainable practices on a global scale, contributing to more efficient and responsive built environments worldwide.

The study meticulously dissects the performance of various machine learning models – Logistic Regression, Support Vector Machines, and Long Short-Term Memory networks – much like a scientist calibrating a microscope to observe a specimen. This methodical approach to occupancy detection, particularly the emphasis on generalization across diverse residential settings, echoes John Locke’s sentiment: “All mankind… being all equal and independent, no one ought to harm another in his life, health, liberty, or possessions.” Just as Locke advocated for protecting individual rights, this research seeks to establish a robust and reliable system – a ‘model’ – that consistently ‘detects’ occupancy without infringing upon the privacy or security of residents, ensuring a consistently accurate assessment across different ‘apartments’ and even a digital twin. The focus on generalization isn’t merely about technical accuracy; it’s about building a system that respects the unique characteristics of each environment.
Where to From Here?
The demonstrated success of LSTM networks in generalizing occupancy detection across diverse residential settings is notable, yet hardly conclusive. The relative simplicity of Logistic Regression, achieving comparable performance with significantly reduced computational demand, hints at a fundamental principle: complex models aren’t inherently superior. The observed performance gap between physical apartments and the digital twin suggests an over-reliance on data fidelity, overlooking the subtle, unquantifiable aspects of human behavior that shape occupancy patterns. This raises the question: are these models truly learning occupancy, or merely memorizing correlates?
Future investigations should move beyond simply optimizing model architecture and focus on feature engineering that captures the why behind occupancy: factors like time of day, day of week, seasonal changes, and even probabilistic estimations of resident activity. Furthermore, the development of robust anomaly detection techniques will be crucial for identifying deviations from established patterns, potentially indicating unusual events or system malfunctions. The digital twin, rather than being a direct replication of physical space, could become a platform for controlled experimentation, testing hypotheses about occupancy behavior under various simulated conditions.
Ultimately, the true measure of any model lies in its ability to predict beyond the training data. If a pattern cannot be reproduced or explained, it doesn’t exist.
Original article: https://arxiv.org/pdf/2604.14841.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/