AAAI Conference on Artificial Intelligence

Paper link

0x00 Abstract

Though the FL paradigm has received significant interest recently from the research community, the problem of selecting the clients relevant to the central server's learning objective is under-explored. We refer to these problems as Federated Relevant Client Selection (FRCS). Because the server does not have explicit control over the nature of the data possessed by each client, the problem of selecting relevant clients is significantly more complex in FL settings.

The FRCS problems that need to be resolved:

  • selecting clients with relevant data
  • detecting clients that possess data relevant to a particular target label
  • rectifying corrupted data samples of individual clients

We follow a principled approach to address the above FRCS problems and develop a new federated learning method using the Shapley value concept from cooperative game theory. Towards this end, we propose a cooperative game involving the gradients shared by the clients. Using this game, we compute the Shapley values of the clients and then present the Shapley value-based Federated Averaging (S-FedAvg) algorithm, which empowers the server to select relevant clients with high probability.

Keywords: relevant data, Shapley value, federated learning

0x01 Introduction

The FL setting assumes a synchronous update of the central model, and the following steps proceed in each round of the learning process (McMahan et al. 2017).

Note that only a fraction of clients is selected in each round as adding more clients would lead to diminishing returns beyond a certain point.

Though the emphasis was initially on mobile-centric FL applications involving thousands of clients, there has recently been significant interest in enterprise-driven FL applications that involve only a few tens of clients.

Motivation:

We apply the standard FedAvg algorithm to two cases:

(a) where all clients possess relevant data

(b) where some clients possess irrelevant data.

A model is trained on MNIST to learn the even digits.

NOTE: odd labels are flipped to even labels!

To simulate irrelevance, we use the open-set label noise strategy (Wang et al. 2018), wherein we randomly flip each odd label of the 4 irrelevant clients to one of the even labels.

Contributions of Our Work:

  • the Shapley value is used as a measure of the relevance of a client's data
  • FedAvg is modified into a Shapley value-based variant (S-FedAvg)
  • two FRCS problems are solved:
    • selecting clients with high-quality data
    • detecting and rectifying clients' anomalous data

0x03 Problem Statement

The presence of irrelevant data degrades both the accuracy and the stability of the model!

0x04 Proposed Solution

Game-theoretic preliminaries

The Shapley value is defined as follows:

$$SV_i(v)=\frac{1}{|N|!}\sum_{\pi \in \Pi}\left[v(C_{\pi}(i)\cup\{i\})-v(C_{\pi}(i))\right]$$

where $v(\cdot)$ is the characteristic function (contribution function); $\Pi$ is the set of all permutations (arrival orders) of the player set $N$; and $C_{\pi}(i)=\{j\in\pi:\pi(j)<\pi(i)\}$, where $\pi(j)$ denotes the position of $j$ in the permutation $\pi$.

Intuitively, $C_{\pi}(i)$ is the coalition already formed before $i$ arrives.

Since every player eventually joins (we sum over all permutations), the formula can be rewritten as:

$$SV_i(v)=\sum_{C \subseteq N\setminus\{i\}}\frac{|C|!\,(|N|-|C|-1)!}{|N|!}\left[v(C\cup\{i\})-v(C)\right]$$

Notes on the formula:

Here $C$ denotes a coalition rather than a permutation.

Each coalition corresponds to many arrival orders, so this form enumerates the permutations coalition by coalition.

Suppose $N=\{1,2,3,4,5,6\}$ and take $i=1$, $C=\{2,3\}$; the following permutations arise:

$$\{2,3,1,4,5,6\},\ \{2,3,1,4,6,5\},\ \dots,\ \{3,2,1,4,5,6\},\ \{3,2,1,4,6,5\},\ \dots$$

Since $i$ arrives third in every one of these permutations, the marginal contribution $v(C\cup\{i\})-v(C)$ is the same for all of them.

The number of such permutations is $|C|!\,(|N|-|C|-1)!$: the number of ways to order $\{2,3\}$ before $i$, times the number of ways to order the remaining players after $i$.
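To make the definition concrete, here is a minimal Python sketch (not from the paper) that computes exact Shapley values by enumerating all arrival orders; the characteristic function passed in is a toy stand-in.

```python
import itertools
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating all |N|! arrival orders.

    players: list of player ids
    v: characteristic function mapping a frozenset of players to a number
    """
    n = len(players)
    sv = {p: 0.0 for p in players}
    for perm in itertools.permutations(players):
        coalition = frozenset()
        for p in perm:
            # marginal contribution of p over the players that arrived before it
            sv[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / factorial(n) for p, total in sv.items()}

# Toy characteristic function: the value of a coalition is its squared size,
# so by symmetry every player should get v(N)/|N| = 9/3 = 3.
print(shapley_values([1, 2, 3], lambda c: len(c) ** 2))
```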

Computing the relevance scores:

  • Client $k$ sends its parameter update $\delta^{t+1}_k=\theta^{t+1}_k-\theta^t$ to the server;

  • Define the cooperative game $(\delta^{t+1},v)$, where $\delta^{t+1}=\{\delta^{t+1}_s\}_{s\in S^t}$ and $v(\cdot)$ is the characteristic function;

    $$\theta^{t+1}_{X}=\theta^t+\frac{1}{|X|}\sum_{s \in X}\delta^{t+1}_s$$
    $$v(X,D_V)=\mathcal{P}(f_{\theta^{t+1}_X},D_V)$$

    $\mathcal{P}(\cdot,\cdot)$ denotes the performance of the central model with parameters $\theta^{t+1}_X$ on the validation set $D_V$.

  • $\varphi=(\varphi_1,\varphi_2,...,\varphi_K)$ is the relevance vector, where $\varphi_k$ is the relevance of $Client_k$. ($\varphi$ is kept only on the server and is initialized to $\varphi_k=\frac{1}{K}$.)

    We posit that the higher a client's relevance value, the more likely it is to be relevant, and thus the greater its contribution to the objective of the central model.

  • At each aggregation round, obtain the Shapley values $sv(s),\forall s\in S^t$ from the cooperative game and update the relevance vector:

    $$\varphi_s=\alpha\cdot\varphi_s+\beta\cdot sv(s);\quad \forall s \in S^t$$

  • Compute the relative relevance $P_k$ using $softmax(\cdot)$ and select the set of participating clients $S^t$ according to $P_k$. (A sketch of these steps follows.)
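A minimal sketch of this per-round bookkeeping, assuming NumPy parameter vectors; `evaluate` stands in for $\mathcal{P}(\cdot,D_V)$, and the $\alpha$, $\beta$ defaults and example Shapley values are placeholders, not the paper's settings:

```python
import numpy as np

def characteristic(theta, deltas, coalition, evaluate):
    """v(X, D_V): build theta^{t+1}_X by averaging the coalition's updates
    on top of theta^t, then score it on D_V via `evaluate`."""
    coalition = list(coalition)
    if not coalition:
        return evaluate(theta)
    return evaluate(theta + sum(deltas[s] for s in coalition) / len(coalition))

def update_relevance(phi, shapley, alpha=0.5, beta=0.5):
    """phi_s <- alpha * phi_s + beta * sv(s) for this round's participants."""
    for s, sv in shapley.items():
        phi[s] = alpha * phi[s] + beta * sv
    return phi

def sample_clients(phi, m, rng):
    """Draw m clients without replacement with probabilities softmax(phi)."""
    p = np.exp(phi - phi.max())
    p /= p.sum()
    return rng.choice(len(phi), size=m, replace=False, p=p)

rng = np.random.default_rng(0)
K = 10
phi = np.full(K, 1.0 / K)        # initialization: phi_k = 1/K
phi = update_relevance(phi, {0: 0.08, 3: -0.02, 7: 0.05})  # example values
print(sample_clients(phi, m=5, rng=rng))
```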

The full algorithm:

In the experiments, $T=100$ and $m=5$; $\varphi_k$ can be computed via a multi-order Markov-style recursion (each round's value depends on the previous one). A sketch of one round is given below.
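Putting the pieces together, one possible shape of a single S-FedAvg round, reusing `shapley_values`, `characteristic`, `update_relevance`, and `sample_clients` from the sketches above; `local_update` is a hypothetical stand-in for a client's local training:

```python
import numpy as np

def s_fedavg_round(theta, phi, m, local_update, evaluate, rng):
    """One round: select S^t, collect updates, score them with Shapley
    values on D_V, update phi, and take a FedAvg-style step."""
    chosen = sample_clients(phi, m, rng)                          # pick S^t
    deltas = {k: local_update(theta, k) - theta for k in chosen}
    sv = shapley_values(list(deltas),
                        lambda c: characteristic(theta, deltas, c, evaluate))
    phi = update_relevance(phi, sv)                               # phi update
    theta = theta + sum(deltas.values()) / len(deltas)            # aggregation
    return theta, phi

# Training loop with the paper's T = 100 rounds and m = 5 clients per round:
# for t in range(T):
#     theta, phi = s_fedavg_round(theta, phi, 5, local_update, evaluate, rng)
```

Note that with $m=5$ there are only $2^5=32$ distinct coalitions per round, which is why computing the exact Shapley values stays tractable at this enterprise scale.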

0x05 Experiments

1. Setup

  • Extreme 1-class-non-iid

As is typical in the federated learning setting, we assume that the data is distributed non-iid across the clients. We follow the extreme 1-class-non-iid approach of (Zhao et al. 2018b) when distributing the data to clients.

  • $|D_V|=1000$, $|D_{test}|=4000$

  • To verify the interference caused by irrelevant data, the odd labels are changed according to the following rule (see the sketch after this list):

    $k=\{1\to 0,\ 5\to 2,\ 3\to 4,\ 9\to 6,\ 7\to 8\}$

  • Hyper-parameter Values:
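As referenced above, a minimal sketch of this flipping rule, assuming the labels are a NumPy integer array:

```python
import numpy as np

# Fixed odd->even flip rule from the setup: 1->0, 5->2, 3->4, 9->6, 7->8.
FLIP = {1: 0, 5: 2, 3: 4, 9: 6, 7: 8}

def make_irrelevant(labels):
    """Return a copy of `labels` with every odd digit replaced by its
    designated even digit, simulating an irrelevant client's data."""
    flipped = labels.copy()
    for odd, even in FLIP.items():
        flipped[labels == odd] = even
    return flipped

print(make_irrelevant(np.array([0, 1, 2, 3, 9])))   # -> [0 0 2 4 6]
```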

2. S-FedAvg: Selecting Relevant Clients

3. Class-specific Best Client Selection

To detect the best client for a class, the validation set $D_V$ is set to a dataset of the target class.

For example, to detect the client with the best data for digit $2$, set $D_V$ to a dataset containing only $2$s:
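A minimal sketch of building such a class-specific $D_V$, assuming NumPy arrays `x` and `y` (names hypothetical):

```python
import numpy as np

def class_specific_validation(x, y, target_label, size=1000, seed=0):
    """Restrict D_V to a single target class so the Shapley values reward
    exactly the clients whose updates help on that class."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(y == target_label)
    idx = rng.choice(idx, size=min(size, idx.size), replace=False)
    return x[idx], y[idx]

# Hypothetical usage with MNIST-shaped arrays:
# x_val, y_val = class_specific_validation(x_all, y_all, target_label=2)
```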

4. Data Label Standardization

In this experiment, $Client_3$'s label $2$ is corrupted to $4$ ($2\to 4$).
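One plausible rectification step, sketched under the assumption that a trusted model (e.g. one trained with the best client for the target class) is available to re-predict labels; this is an illustration, not necessarily the paper's exact procedure:

```python
import numpy as np

def rectify_labels(x, y, predict_proba, threshold=0.9):
    """Relabel samples on which the trusted model confidently disagrees
    with the stored label; everything else is left untouched.

    predict_proba: hypothetical callable returning (n, num_classes) probs.
    """
    probs = predict_proba(x)
    pred, conf = probs.argmax(axis=1), probs.max(axis=1)
    suspect = (pred != y) & (conf >= threshold)   # likely corrupted samples
    y_fixed = y.copy()
    y_fixed[suspect] = pred[suspect]
    return y_fixed, suspect
```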
