Author: Denis Avetisyan
A new system leverages the power of distributed learning to accurately estimate crowd size using readily available Wi-Fi signals, even with diverse data sources.

FedAPA adaptively aggregates prototypes and utilizes contrastive learning to address data and model heterogeneity in federated Wi-Fi CSI-based crowd counting.
While widespread deployment of Wi-Fi-based sensing for tasks like crowd counting is hampered by the need for site-specific data, this work introduces FedAPA: Federated Learning with Adaptive Prototype Aggregation Toward Heterogeneous Wi-Fi CSI-based Crowd Counting, a novel federated learning approach to overcome these limitations. By adaptively weighting class prototypes and leveraging contrastive learning, FedAPA effectively addresses data and device heterogeneity, yielding improved accuracy and reduced communication overhead in distributed Wi-Fi deployments. Our results demonstrate significant performance gains over existing methods, including a 9.65% increase in accuracy, but can this adaptive aggregation strategy be extended to other Wi-Fi-based sensing applications beyond crowd counting?
The Inevitable Shift: Beyond Direct Observation
Conventional methods of determining crowd size, such as those employing camera systems or requiring individuals to wear tracking devices, present significant hurdles beyond mere technical implementation. The pervasive nature of camera surveillance raises legitimate privacy concerns for citizens, while the logistical demands of distributing and maintaining wearable sensors across large populations – ensuring battery life, data collection, and participant compliance – often prove impractical and costly. Furthermore, these direct sensing approaches necessitate the collection of personally identifiable information, creating potential vulnerabilities regarding data security and misuse. Consequently, a growing need exists for alternative solutions that can accurately assess crowd density while simultaneously safeguarding individual privacy and reducing the complexities associated with widespread deployment.
A paradigm shift in crowd monitoring is occurring with the development of device-free counting techniques, which estimate the number of people present by analyzing naturally occurring ambient signals rather than relying on direct observation via cameras or sensors. These innovative systems capitalize on disruptions to existing signals – such as radio frequency (RF), WiFi, or even subtle changes in the electromagnetic field – caused by the movement and presence of individuals within a space. By analyzing the patterns and variations in these ambient signals, algorithms can infer crowd density and distribution without needing to track or identify specific people, thereby addressing significant privacy concerns and logistical hurdles associated with traditional methods. This approach offers a compelling advantage in scenarios where discreet monitoring is crucial, or when covering large areas with minimal infrastructure becomes essential, paving the way for more efficient and privacy-respecting public space management.
The utility of device-free counting methods becomes exceptionally pronounced within environments where traditional surveillance poses practical or ethical dilemmas. Utilizing radio frequency (RF) signals, these techniques offer discreet density estimation in spaces like hospitals, places of worship, or large public gatherings – scenarios where camera deployment is intrusive or simply unfeasible. Unlike systems requiring individual tracking, RF-based counting operates by analyzing changes in signal characteristics – such as signal strength or phase – caused by the presence of people, effectively treating the crowd as a dynamic radio-frequency scattering medium. This approach not only bypasses privacy concerns associated with visual monitoring but also scales readily to expansive areas, offering a cost-effective solution for monitoring foot traffic in shopping malls, transportation hubs, or even agricultural settings where precise personnel location isn’t required, merely the overall count.
Interpreting the ambient signals used in device-free counting presents a significant analytical hurdle. Radio frequency (RF) signals, for instance, are susceptible to multipath fading, interference from other devices, and attenuation caused by building materials and even the human body. Sophisticated algorithms are therefore crucial to disentangle the subtle variations caused by the presence of individuals from this pervasive environmental noise. Researchers employ techniques like signal filtering, advanced statistical modeling, and machine learning to identify patterns indicative of crowd density, but achieving robust and reliable performance remains a core challenge. The accuracy of these systems is fundamentally limited by the ability to effectively subtract the noise floor and reliably correlate signal changes with actual person counts, requiring ongoing innovation in signal processing and data analysis.
Harnessing the Wireless Realm: Wi-Fi as a Sensing Modality
Wi-Fi sensing leverages the changes in signal characteristics – specifically, Received Signal Strength Indicator (RSSI), Channel State Information (CSI), and Time of Flight (ToF) – to determine the location and quantity of devices, and by extension, people, within a defined space. These technologies operate by analyzing how Wi-Fi signals are reflected, scattered, and absorbed by objects and people. Algorithms then process these signal variations to create a ‘fingerprint’ of the environment, enabling the system to estimate the number of individuals present based on the detected signal disturbances. This differs from traditional Wi-Fi usage, which prioritizes data transmission; here, the signals themselves are the data source for counting and positioning.
The economic viability of utilizing Wi-Fi for people counting stems from two primary factors: existing infrastructure and signal propagation characteristics. Most buildings already possess a Wi-Fi network deployed for data communication, eliminating the need for dedicated sensor hardware and cabling installation. Furthermore, radio frequency signals within the 2.4 GHz and 5 GHz bands commonly used by Wi-Fi are capable of penetrating common building materials like walls, floors, and furniture. This penetration allows for signal detection across a wide area without requiring line-of-sight, reducing the number of access points needed for coverage and minimizing deployment costs compared to technologies like cameras or infrared sensors. This combination of leveraging existing assets and robust signal behavior positions Wi-Fi as a scalable and cost-effective solution for widespread deployment in various indoor environments.
Raw Wi-Fi signals, while readily available, are inherently noisy and contain contributions from multiple sources beyond people counting, including signal reflections, interference from other wireless devices, and environmental factors. Consequently, specialized signal processing techniques are required to isolate the signal variations attributable to human presence. These techniques often involve applying machine learning algorithms trained on datasets that correlate specific Wi-Fi signal characteristics – such as Received Signal Strength Indicator (RSSI), Channel State Information (CSI), and angle of arrival – with known occupancy levels. Filtering algorithms are also employed to reduce the impact of static and dynamic noise, and advanced methods like Kalman filtering can track changes in signal strength over time to improve the reliability of people count estimates.
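To make the tracking step above concrete, the sketch below smooths a noisy RSSI stream with a minimal one-dimensional Kalman filter. The constant-level state model and the process/measurement noise values are illustrative assumptions for this example, not parameters from the paper.

```python
# A minimal 1-D Kalman filter that smooths a noisy RSSI stream (dBm).
# Noise variances are illustrative assumptions, not tuned values.

def kalman_smooth(measurements, process_var=1e-3, meas_var=4.0):
    """Return a smoothed copy of `measurements`."""
    estimate = measurements[0]   # initialize the state with the first reading
    error = 1.0                  # initial estimate uncertainty
    smoothed = []
    for z in measurements:
        # Predict: a constant signal level plus small process noise.
        error += process_var
        # Update: blend prediction and measurement via the Kalman gain.
        gain = error / (error + meas_var)
        estimate += gain * (z - estimate)
        error *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed

noisy = [-60, -58, -63, -59, -61, -57, -62, -60]
smooth = kalman_smooth(noisy)
```

In a real pipeline the smoothed signal would feed the downstream count estimator; here the point is only that the filter suppresses measurement jitter while following the underlying level.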
The performance of Wi-Fi-based people counting systems is significantly impacted by the characteristics of the deployment environment. Factors such as building materials, furniture arrangement, and the presence of moving objects – including people not part of the counted population – introduce signal reflections, multipath effects, and non-line-of-sight interference. These phenomena distort the received signal strength indicators (RSSI) and channel state information (CSI) used for estimation, reducing accuracy. Furthermore, dynamic environments with frequent changes in layout or the introduction of new obstacles require continuous recalibration and adaptation of the algorithms to maintain reliable counting. The complexity increases substantially in open-plan spaces or areas with high foot traffic, where distinguishing individual signals becomes more difficult.

Decentralized Intelligence: The Promise of Federated Learning
Federated Learning (FL) is a distributed machine learning approach designed to train algorithms on decentralized datasets residing on individual devices – such as smartphones or edge servers – without the explicit exchange of data samples. This is achieved by training local models on each device using its own data, and then aggregating these models – typically through averaging or more sophisticated techniques – to create a global model. The core principle is to share model updates, rather than raw data, thereby preserving data privacy and addressing concerns related to data security and regulatory compliance. This approach is particularly relevant in scenarios where data is sensitive, voluminous, or subject to legal restrictions, such as healthcare, finance, and personalized services.
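The "share model updates, not raw data" principle can be sketched in a few lines. Below, two clients each take one gradient step on their private data and only the resulting weight vectors are aggregated, weighted by dataset size (the standard FedAvg rule). The model, learning rate, and client data are toy assumptions for illustration.

```python
# A toy federated-averaging round: clients train locally on private
# data; the server averages weight vectors, never seeing the data.

def local_update(weights, data, lr=0.1):
    """One gradient step of least-squares y ~ w.x on a client's own data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        for i, xi in enumerate(x):
            grad[i] += 2 * (pred - y) * xi / len(data)
    return [w - lr * g for w, g in zip(weights, grad)]

def fed_avg(client_models, client_sizes):
    """Aggregate local models, weighting each by its dataset size."""
    total = sum(client_sizes)
    dim = len(client_models[0])
    return [sum(m[i] * n for m, n in zip(client_models, client_sizes)) / total
            for i in range(dim)]

global_model = [0.0, 0.0]
clients = [
    [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0)],   # client A's private data
    [([1.0, 1.0], 3.0)],                       # client B's private data
]
locals_ = [local_update(global_model, d) for d in clients]
global_model = fed_avg(locals_, [len(d) for d in clients])
```

Only `locals_` (the model updates) cross the network; the tuples inside `clients` stay on-device, which is the privacy argument made above.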
Existing Federated Learning (FL) methodologies, including MOON, CARING, WiFederated, FedRep, FedBN, and FedProto, have been investigated for diverse machine learning tasks. However, these approaches commonly encounter performance degradation when applied to Non-IID (Non-Independent and Identically Distributed) data. Non-IID data refers to scenarios where each participating device possesses a unique data distribution, differing in both feature and label spaces. This heterogeneity introduces statistical variance during the model aggregation phase, causing the global model to diverge from optimal performance. Specifically, variations in data quantity, feature distributions, and label distributions across devices contribute to the challenges faced by these FL algorithms, requiring mitigation strategies to ensure robust and accurate model training in realistic deployment scenarios.
Data heterogeneity, or Non-IID (Non-Independent and Identically Distributed) data, presents a significant challenge to the efficacy of standard Federated Learning (FL) algorithms. In typical FL deployments, data is often distributed unevenly across participating devices or locations; this means each device possesses a unique data distribution that differs from the global distribution and from other devices. This statistical variance can lead to model divergence during training, as local model updates are biased towards the specific characteristics of each device’s data. Consequently, aggregating these biased local models into a global model results in reduced generalization performance and overall accuracy. The severity of this degradation is directly proportional to the degree of data heterogeneity present in the distributed dataset.
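One facet of the heterogeneity described above, label-distribution skew, can be made measurable: each client's label histogram is compared against the pooled global histogram. The clients, labels, and L1-style distance below are made-up illustrations, not the paper's methodology.

```python
# Quantifying label-distribution skew across clients: a larger
# divergence from the pooled distribution means more Non-IID data.

from collections import Counter

def label_hist(labels):
    """Normalized label distribution for one client."""
    counts = Counter(labels)
    n = len(labels)
    return {k: v / n for k, v in counts.items()}

def l1_divergence(p, q):
    """Total-variation-style distance between two label distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys) / 2

client_a = [0, 0, 0, 0, 1]   # mostly "low crowd" labels
client_b = [2, 2, 1, 2, 2]   # mostly "high crowd" labels
global_dist = label_hist(client_a + client_b)
skew_a = l1_divergence(label_hist(client_a), global_dist)
skew_b = l1_divergence(label_hist(client_b), global_dist)
```

Under IID sampling both `skew` values would be near zero; here each client is visibly biased toward its own crowd regime, which is exactly the condition that biases local updates during aggregation.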
FedAPA addresses performance degradation in Federated Learning caused by Non-IID data by integrating global model knowledge and employing adaptive prototype aggregation. This approach utilizes global information to guide local model updates, mitigating the impact of data distribution differences across devices. Experimental results demonstrate a 15.19% improvement in accuracy when dealing with data heterogeneity and a 33.5% increase in accuracy in scenarios characterized by model heterogeneity, indicating a significant enhancement in robustness and generalization capability compared to standard Federated Learning algorithms.

Harmonizing Local and Global Insights: The Mechanics of FedAPA
FedAPA employs Prototype Contrastive Loss (PCL) as a key mechanism for knowledge transfer in federated learning. PCL operates by minimizing the distance between local client prototypes and the global model prototype, thereby encouraging consistency across distributed nodes. Specifically, the loss function calculates the contrastive loss between these prototypes, penalizing significant discrepancies and promoting alignment. This alignment process ensures that local updates contribute to a more cohesive global model, effectively reducing the variance introduced by heterogeneous data distributions and improving the overall accuracy of the federated model. The implementation of PCL facilitates the distillation of knowledge from individual clients into a unified global representation, enhancing generalization performance and model robustness.
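A contrastive loss over prototypes can be sketched as follows: the local class prototype is pulled toward the matching global prototype and pushed away from the other classes' prototypes via a softmax over cosine similarities. The softmax form and the temperature value are assumptions for illustration; the paper's exact loss may differ.

```python
# A hedged sketch of a prototype contrastive loss: low loss when the
# local prototype aligns with the correct global class prototype.

import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def prototype_contrastive_loss(local_proto, label, global_protos, temp=0.5):
    """-log softmax similarity of the matching global prototype."""
    sims = [cosine(local_proto, g) / temp for g in global_protos]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]   # numerically stable softmax
    return -math.log(exps[label] / sum(exps))

global_protos = [[1.0, 0.0], [0.0, 1.0]]     # one prototype per class
aligned = prototype_contrastive_loss([0.9, 0.1], 0, global_protos)
misaligned = prototype_contrastive_loss([0.1, 0.9], 0, global_protos)
```

Minimizing this term during local training penalizes prototypes that drift away from the global representation of their class, which is the alignment mechanism described above.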
The Adaptive Prototype Aggregation (APA) module functions by evaluating the similarity between prototypes generated by individual clients and a global model representation. This evaluation utilizes a similarity metric to assign weights to each client prototype; higher similarity scores result in greater weight during the aggregation process. The weighted aggregation effectively prioritizes prototypes that align with the global model, reducing the influence of noisy or irrelevant local data. This dynamic weighting scheme allows the model to focus on the most representative features, improving the accuracy and robustness of the federated learning process by intelligently combining local insights with global knowledge.
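The weighting scheme described above can be illustrated with a softmax over prototype similarities: client prototypes closer to the current global prototype receive larger aggregation weights. The softmax choice is an assumption about how such adaptive weights could be computed, not the paper's exact rule.

```python
# A minimal sketch of similarity-weighted prototype aggregation:
# aligned client prototypes dominate; outliers are down-weighted.

import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def adaptive_aggregate(client_protos, global_proto):
    """Softmax-weight client prototypes by similarity to the global one."""
    sims = [cosine(p, global_proto) for p in client_protos]
    m = max(sims)
    weights = [math.exp(s - m) for s in sims]
    z = sum(weights)
    weights = [w / z for w in weights]
    dim = len(global_proto)
    agg = [sum(w * p[i] for w, p in zip(weights, client_protos))
           for i in range(dim)]
    return agg, weights

global_proto = [1.0, 0.0]
client_protos = [[0.95, 0.05], [0.2, 0.98]]   # one aligned, one noisy
agg, weights = adaptive_aggregate(client_protos, global_proto)
```

The aligned client ends up with the larger weight, so the aggregated prototype stays close to the global representation rather than being dragged toward the noisy contribution.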
The FedAPA framework demonstrably reduces the negative effects of data and model heterogeneity on overall performance. Testing indicates a 14.59% improvement in F1-Score when addressing data heterogeneity, signifying increased accuracy in scenarios with varying data distributions across clients. Furthermore, the framework achieves a 39.49% increase in F1-Score in the presence of model heterogeneity, demonstrating its ability to effectively aggregate knowledge from diverse model architectures or initializations. These gains indicate improved generalization capabilities and robustness across varied deployment conditions.
FedAPA addresses challenges in crowd counting across diverse environments by integrating global model knowledge with locally sourced data. Performance evaluations demonstrate a reduction in Mean Absolute Error (MAE) of 0.46 in scenarios exhibiting data heterogeneity and 0.78 in environments characterized by model heterogeneity. This improvement indicates that the system effectively leverages both overarching learned patterns and specific local data characteristics to enhance counting accuracy, providing a robust and scalable solution for complex crowd analysis applications.

Beyond Numbers: Implications for Smart Cities and Beyond
The evolution of device-free crowd counting, and notably the implementation of federated learning (FL) techniques like FedAPA, presents a transformative opportunity for the development of truly responsive smart cities. By accurately estimating crowd density without requiring individual tracking – thus preserving privacy – urban planners and emergency responders gain access to real-time data crucial for optimizing resource allocation. This includes dynamically adjusting public transportation schedules, strategically deploying security personnel, and efficiently managing public spaces like parks and event venues. Furthermore, the decentralized nature of FL, where data processing occurs on local devices, reduces bandwidth requirements and enhances data security, making widespread implementation across complex urban environments both feasible and sustainable. This technology doesn’t simply count people; it empowers cities to proactively adapt to the needs of their populations, fostering safer, more efficient, and more livable urban centers.
The potential for device-free crowd counting extends significantly into practical applications for public spaces and event management. Accurate, real-time population data allows for dynamic resource allocation – adjusting public transportation schedules, deploying security personnel, and ensuring adequate sanitation facilities respond directly to need. Beyond logistical improvements, this technology dramatically enhances safety and security; anomalous crowd behavior or rapidly increasing densities can trigger alerts, enabling proactive interventions to prevent dangerous situations like trampling or overcrowding. Large-scale events, from concerts to sporting competitions, benefit from optimized emergency response planning, while public spaces like parks and plazas can be monitored to ensure comfortable and safe conditions for all visitors, fostering a more responsive and secure urban environment.
The true potential of device-free crowd counting extends beyond simply quantifying numbers; future studies aim to fuse this technology with data from diverse sensor networks to unlock nuanced understandings of human behavior. Integrating signals from sources like environmental sensors – monitoring temperature, air quality, or noise levels – alongside communication data from mobile devices, and even video analytics focused on individual actions, could reveal patterns beyond population density. This multi-sensor approach promises to differentiate between normal and anomalous crowd activity, predict potential bottlenecks or safety hazards, and ultimately create “smart spaces” that proactively respond to the needs of occupants. Such integrated systems could, for example, correlate crowd density with environmental factors to optimize ventilation in public transport hubs, or identify unusual movement patterns indicative of distress or security breaches, moving beyond simple counting towards truly intelligent environmental awareness.
The principles underpinning device-free crowd counting, refined through techniques like FedAPA, extend beyond simply enumerating people. This technology demonstrates considerable potential for broader applications centered around understanding spatial dynamics. Researchers envision systems capable of accurately estimating occupancy levels in buildings – optimizing energy usage and resource allocation – and even recognizing basic human activities within a space. By analyzing subtle shifts in the wireless signal environment, these systems could discern patterns indicative of movement, such as walking, standing, or gathering, without relying on cameras or personal devices. This capability unlocks possibilities for improved building management, enhanced security protocols, and a deeper understanding of how people interact with their surroundings, ultimately paving the way for truly intelligent environments.

The pursuit of robust systems, as demonstrated by FedAPA, inherently acknowledges the inevitability of decay. This research, focusing on federated learning for Wi-Fi CSI-based crowd counting, actively mitigates the effects of heterogeneous data, a form of systemic entropy. Donald Davies observed, “You reason through version history: every commit is a record in the annals, and every version a chapter.” FedAPA embodies this principle; adaptive prototype aggregation and contrastive learning represent iterative ‘commits,’ refining the model against the challenges of real-world variability. The system doesn’t attempt to prevent decay, but rather to manage it through continuous adaptation, ensuring graceful aging even amidst diverse and evolving data landscapes.
What Lies Ahead?
The presented work, like any logging of system state, provides a snapshot, a momentary calibration against the inevitable drift toward entropy. FedAPA addresses the immediate challenges of heterogeneous data in federated learning, but the chronicle isn’t complete. The aggregation of prototypes, while demonstrably effective, remains a localized solution; future iterations must consider the long-term evolution of these prototypes across a distributed network, acknowledging that even ‘adaptive’ systems eventually succumb to the pressures of differing timescales.
A significant, and perhaps unavoidable, limitation lies in the very premise of ‘federated’ learning – the assumption that data distribution, while disparate, remains relatively stable. Real-world deployments, however, present a constantly shifting landscape. The system’s timeline will be marked by node failures, network disruptions, and the introduction of entirely new data modalities. Research should therefore investigate methods for gracefully degrading performance under these conditions, prioritizing resilience over peak accuracy.
Ultimately, the true test of FedAPA, and of federated learning itself, won’t be its performance on benchmark datasets, but its ability to function as a durable component within a larger, more complex ecosystem. The focus must shift from achieving incremental improvements to building systems that accept, even embrace, the inevitability of decay – systems that age, not gracefully perhaps, but persistently.
Original article: https://arxiv.org/pdf/2511.21048.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/