While most prior works assume uniform and unbiased client selection, recent work on biased client selection has shown that selecting clients with higher local losses can improve error convergence speed.
In this paper, we present UCB-CS, a bandit-based, communication-efficient client selection strategy that achieves faster convergence with lower communication overhead.
Key Words: Federated Learning, Client Selection, Multi-armed Bandits
Due to the inherent data heterogeneity across clients, judicious use of bias in client selection presents an untapped opportunity to improve error convergence.
However, the client selection strategies proposed in previous works either require the server to additionally communicate with clients to retrieve accurate local loss values, or they reuse stale loss values received from selected clients in previous communication rounds. Communication is expensive in FL, and furthermore, we show that using stale loss values can lead to slower error convergence or even divergence.
In this paper, we propose UCB-CS, a bandit-based client selection strategy that is communication-efficient and uses the observed clients’ local loss values more appropriately instead of relying on stale values.
To the best of our knowledge, there has been no work that proposes or evaluates biased client selection strategies in the context of fairness.
0x02 PROBLEM FORMULATION
A. Federated Learning with Partial Device Participation
A central aggregating server optimizes the model parameters by selecting a subset of clients, comprising some fraction of the total, in each communication round (partial device participation).
The set of active clients at iteration t is denoted by S(t). Since each active client performs τ steps of local updates, the active set also remains constant for every τ iterations.
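As a rough illustration, the following is a minimal sketch of one such communication round; the interface (`local_update`, `select_fn`) and the plain averaging rule are assumptions for illustration, not the exact aggregation used in the paper.

```python
import random
import numpy as np

def average(models):
    """Element-wise average of a list of parameter vectors."""
    return np.mean(np.stack(models), axis=0)

def run_round(global_model, clients, fraction, num_local_steps, select_fn=None):
    """One communication round with partial device participation.

    clients: objects exposing local_update(params, steps) -> updated params.
    fraction: fraction of clients activated in this round.
    select_fn: optional biased selection strategy; defaults to uniform sampling.
    """
    m = max(1, int(fraction * len(clients)))
    active = random.sample(clients, m) if select_fn is None else select_fn(clients, m)

    # The active set stays fixed while each selected client performs
    # `num_local_steps` local updates starting from the current global model.
    local_models = [c.local_update(global_model, num_local_steps) for c in active]

    # The server aggregates the returned parameters (plain averaging here;
    # FedAvg typically weights each client by its local dataset size).
    return average(local_models)
```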
B. Biased Client Selection for Faster Convergence
It has been noted in [1], [2] that selecting clients with higher local loss at each communication round leads to faster convergence, but incurs an error floor.
Under the power-of-choice scheme of [1], the central server polls a candidate set of clients to obtain their local losses for the current global model, and then selects the clients with the largest losses. A drawback of this scheme is that it requires additional communication, as the central server has to poll the candidate clients before selecting the participants for the next communication round.
To reduce this additional communication, it is desirable to have a proxy for each client’s local loss available at the server. Motivated by this, a variant scheme approximates each client’s local loss by the loss it reported when it was last selected. However, this approximation can be misleading, since the reported losses are both noisy and stale. For these reasons, it was observed that in certain cases the variant scheme does not have desirable convergence.
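The sketch below illustrates this loss-based selection; the polling size `d` and the `evaluate_loss` interface are illustrative assumptions, and the extra polling step is exactly the communication overhead discussed above.

```python
import random

def loss_based_selection(clients, m, global_model, d):
    """Select m clients by polling d candidates (d >= m) for their current local loss.

    Polling is an extra round of communication: each candidate must evaluate the
    current global model on its local data and report the resulting loss.
    """
    candidates = random.sample(clients, d)
    losses = {c: c.evaluate_loss(global_model) for c in candidates}
    # Bias selection toward the clients with the highest local loss.
    return sorted(candidates, key=lambda c: losses[c], reverse=True)[:m]
```

The stale-value variant would instead reuse the loss each client last reported when it was previously selected, which avoids the polling step but feeds outdated information into the selection.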
- [1] Y. J. Cho, J. Wang, and G. Joshi, “Client selection in federated learning: Convergence analysis and power-of-choice selection strategies,” arXiv preprint arXiv:2010.01243, 2020.
- [2] J. Goetz, K. Malik, D. Bui, S. Moon, H. Liu, and A. Kumar, “Active federated learning,” arXiv preprint, 2019.
C. Fairness in Client Selection
According to the global objective in (1), clients with a larger weight p_k (i.e., more local data) will intuitively attain lower local loss, and vice versa.
However, if the clients with small p_k perform significantly worse than the other clients, this can be unfair.
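Equation (1) is not reproduced in this excerpt; it presumably refers to the standard federated objective, in which each client’s weight is proportional to its share of the data:

```latex
% Assumed form of the global objective in (1): K clients, client k holds
% n_k samples, has weight p_k, and local empirical loss F_k.
F(w) = \sum_{k=1}^{K} p_k F_k(w), \qquad p_k = \frac{n_k}{\sum_{j=1}^{K} n_j}
```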
We show that client fairness can be improved by incorporating the estimated local loss values and each client’s selection frequency into the client selection scheme.
We measure fairness by Jain’s index, which equals 1 when all clients have the same performance.
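For concreteness, Jain’s index over per-client performance values can be computed as follows (a small illustrative helper, not code from the paper):

```python
def jains_index(values):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2).

    Ranges from 1/n (maximally unfair) to 1 (all clients perform equally).
    """
    n = len(values)
    total = sum(values)
    return (total * total) / (n * sum(v * v for v in values))

# Identical per-client accuracies give an index of 1.0.
print(jains_index([0.8, 0.8, 0.8, 0.8]))  # 1.0
print(jains_index([0.9, 0.5, 0.7, 0.3]))  # < 1.0, less fair
```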
Why can such client selection be unfair?
0x03 CLIENT SELECTION WITH DISCOUNTED UCB
To achieve faster convergence with a low error floor, it is important to select clients with larger local loss (i.e., exploitation), since that speeds up convergence, while also ensuring diversity (i.e., exploration) in selection to reach a lower error floor.
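The sketch below illustrates this exploration-exploitation trade-off with a discounted-UCB-style selection rule; the discount factor `gamma`, the exploration weight, and the exact form of the index are illustrative assumptions rather than the paper’s precise UCB-CS index.

```python
import math

class DiscountedUCBSelector:
    """Illustrative discounted-UCB style client selection.

    For each client we keep a discounted sum of observed local losses and a
    discounted count of selections; the selection index adds an exploration
    bonus that grows for rarely selected clients.
    """

    def __init__(self, num_clients, gamma=0.9, explore_weight=1.0):
        self.gamma = gamma                    # discount factor in (0, 1]
        self.explore_weight = explore_weight
        self.loss_sum = [0.0] * num_clients   # discounted sum of observed losses
        self.count = [0.0] * num_clients      # discounted selection counts

    def select(self, m):
        total = sum(self.count)
        scores = []
        for k in range(len(self.count)):
            if self.count[k] == 0.0:
                scores.append(float("inf"))   # force exploration of unseen clients
                continue
            mean_loss = self.loss_sum[k] / self.count[k]              # exploitation
            bonus = self.explore_weight * math.sqrt(
                math.log(total + 1.0) / self.count[k])                # exploration
            scores.append(mean_loss + bonus)
        # Pick the m clients with the largest upper confidence bounds.
        return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)[:m]

    def update(self, selected, observed_losses):
        """Discount all statistics, then fold in the losses reported this round."""
        self.loss_sum = [self.gamma * s for s in self.loss_sum]
        self.count = [self.gamma * c for c in self.count]
        for k, loss in zip(selected, observed_losses):
            self.loss_sum[k] += loss
            self.count[k] += 1.0
```

Since the index is updated only with losses observed from the clients that actually participated, no extra polling round is required, which is what makes this family of strategies communication-efficient.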
0x04 EXPERIMENT RESULTS
All experiments are run on an NVIDIA TitanX GPU.
Hence, we show that UCB-CS performs well along the three important factors in FL: loss performance, fairness, and communication efficiency.