A main challenge of FL is that the training data are usually non-Independent and Identically Distributed (non-IID) across clients, which may bias model training and degrade accuracy. To address this issue, this paper proposes a novel FL algorithm to alleviate the accuracy degradation caused by non-IID data at clients. First, we observe that clients with different degrees of non-IID data exhibit different weight divergence from clients owning IID data. Inspired by this, we use weight divergence to recognize the non-IID degree of each client's data.
Key Words: Federated Learning, Client Selection
‘‘Federated learning with non-IID data’’ proposes that the public data could be distributed to clients such that the clients’ data become IID.
The work in ‘‘Hybrid-FL for wireless networks: Cooperative learning mechanism using non-IID data’’ proposes a Hybrid-FL scheme, which allows a small number of clients to upload their data to an FL server.
The work in ‘‘Astraea: Self-balancing federated learning for improving classification accuracy of mobile deep learning applications’’ assumes that the clients are willing to send their data distribution/label information to the server.
However, few works study how to identify the non-IID degree of the data at clients.
Similar to ‘‘Federated learning with non-IID data’’, we assume that there exists a limited amount of approximately uniform data, which could be gathered from clients (e.g., 1% of the collection of the data from clients) or from publicly available data.
- We define weight divergence between clients, and observe that a larger weight divergence indicates a higher non-IID degree of the data at a client.
- Based on weight divergence, we propose an efficient client selection algorithm (CSFedAvg), in which clients with lower non-IID degrees of data are chosen more often in training.
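The selection idea in the second bullet can be sketched in a few lines. This is an illustrative choice, not the paper's exact rule: we assume the probability of selecting a client is a softmax of its negative weight divergence, so low-divergence (closer-to-IID) clients are favored. The paper's actual rule is governed by a selection factor that we do not model here.

```python
import numpy as np

def select_clients(divergences, k, rng=None):
    """Pick k client indices, favoring clients with smaller weight divergence.

    The softmax-of-negative-divergence probabilities below are our
    illustrative assumption, not the paper's exact selection rule.
    """
    rng = rng or np.random.default_rng(0)
    d = np.asarray(divergences, dtype=float)
    p = np.exp(-d)
    p /= p.sum()  # normalize to a probability distribution
    return rng.choice(len(d), size=k, replace=False, p=p)
```

For example, with divergences `[0.0, 5.0, 0.0, 5.0]`, clients 0 and 2 end up selected far more often than clients 1 and 3.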
There are the following two questions:
How can we differentiate clients according to the degree of their non-IID data, and choose them for model training?
If the clients with a lower degree of non-IID data participate in training more often than the clients with a higher degree, can the accuracy of FL be improved?
0x03 THE DESIGN OF CSFedAvg
A. OBSERVATION ON WEIGHT DIVERGENCE BETWEEN CLIENT MODELS
An intuition is that the divergence of models between clients with all IID data should be smaller than that between clients with non-IID data.
It is observed that the average weight divergence of the clients with non-IID data is always higher than that of the clients with IID data, which confirms our intuition.
Furthermore, the weight divergence of clients with IID data is very close to each other.
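The observation above can be made concrete with a small sketch. Here we assume weight divergence is measured as the relative L2 distance ||w_client − w_ref|| / ||w_ref|| between a client's model weights and a reference model trained on IID data; the paper's exact formula may differ.

```python
import numpy as np

def weight_divergence(w_client, w_ref):
    """Relative L2 distance between two lists of weight arrays.

    A larger value is taken to indicate a higher non-IID degree of the
    client's data, following the paper's observation. The exact metric
    here (relative L2 norm) is an assumption.
    """
    wc = np.concatenate([p.ravel() for p in w_client])
    wr = np.concatenate([p.ravel() for p in w_ref])
    return np.linalg.norm(wc - wr) / np.linalg.norm(wr)
```

Identical weights give divergence 0, and the value grows as the client model drifts from the reference.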
B. THE FRAMEWORK OF CSFEDAVG
The main procedure of CSFedAvg is as follows:
1. Global Model Downloading
2. Client Model Update
3. Client Set Update and Model Aggregation
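The three steps above can be sketched as one training round. Function names (`select_fn`, `local_update_fn`) are ours, not the paper's, and the aggregation shown is plain unweighted averaging for brevity:

```python
import numpy as np

def csfedavg_round(global_w, clients, select_fn, local_update_fn):
    """One CSFedAvg-style communication round (illustrative sketch).

    1. Global model downloading: selected clients receive global_w.
    2. Client model update: each selected client trains locally.
    3. Client set update and model aggregation: the server updates its
       preferred client set (inside select_fn, e.g., via weight
       divergence) and averages the returned client models.
    """
    selected = select_fn(clients, global_w)
    local_models = [local_update_fn(c, global_w.copy()) for c in selected]
    return np.mean(local_models, axis=0)
```

For instance, with two clients whose local updates shift the weights by 0 and 1 respectively, the aggregated model is the midpoint.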
C. THE ALGORITHM DESIGN OF CSFedAvg
0x04 SIMULATION RESULTS
These two datasets will be distributed to clients, with each client holding 500 samples of CIFAR-10 and 600 samples of Fashion-MNIST.
For both the Hybrid I and Hybrid II settings, only a small number of clients are assigned IID data, while the other clients are randomly assigned images from a few limited classes.
Specifically, we set four 3 × 3 convolution layers (32, 32, 64, 64 channels, each of which was activated by ReLU and batch normalized, and every two of which were followed by 2 × 2 max pooling), two fully connected layers (384 and 192 units with ReLU activation), and a final output layer with 10 units for training on both the CIFAR-10 and Fashion MNIST datasets.
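A PyTorch sketch of that architecture for CIFAR-10 input (3 × 32 × 32). The padding of 1 on each 3 × 3 convolution and the BatchNorm-before-ReLU ordering are our assumptions, since the text does not specify them; the flattened size 64·8·8 follows from that padding choice.

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    """Four 3x3 convs (32, 32, 64, 64), BN + ReLU each, 2x2 max pool
    after every two convs, then FC 384 -> FC 192 -> 10 outputs."""

    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 384), nn.ReLU(),
            nn.Linear(384, 192), nn.ReLU(),
            nn.Linear(192, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

For Fashion-MNIST (1 × 28 × 28), `in_channels` would be 1 and the flattened size 64·7·7 instead.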
B. PERFORMANCE COMPARISON ON TRAINING ACCURACY
There are two reasons why CSFedAvg can mitigate more accuracy loss on Hybrid I setting.
Firstly, clients with a higher degree of non-IID data lead to larger weight divergence when the numbers of clients with IID and non-IID data are held constant.
Secondly, clients with higher degree of non-IID data can accelerate the degradation of the global model performance.
C. PERFORMANCE COMPARISON ON THE NUMBER OF COMMUNICATION ROUNDS
We record the Time of Arrival at a desired accuracy, denoted ToA@x, where x is the target accuracy. Similarly, we measure the accuracy of each algorithm after running 300 rounds, as reported in the table.
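The ToA@x metric can be computed directly from a per-round accuracy curve; this small helper is our own illustration of the definition:

```python
def toa_at(accuracies, target):
    """Time of Arrival (ToA@x): the first communication round (1-indexed)
    at which test accuracy reaches the target x, or None if never reached."""
    for round_idx, acc in enumerate(accuracies, start=1):
        if acc >= target:
            return round_idx
    return None
```

For example, an algorithm whose accuracy curve is [0.10, 0.50, 0.70, 0.72] has ToA@0.70 = 3.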
D. THE IMPACT OF SELECTION FACTOR α ON ACCURACY AND COMMUNICATION ROUNDS
We record the number of communication rounds required to reach a target accuracy on both the CIFAR-10 and Fashion MNIST datasets.
E. THE IMPACT OF THE NUMBER OF CLIENTS
F. THE IMPACT OF GLOBAL IMBALANCED DISTRIBUTION