1.问题背景

1
2
3
4
5
6
7
8
9
10
11
12
【1.测试版本】:[GACXXXX_A88_AVNT_SF_211224.R6.33_D]
【2.前置条件】: 1.车机上电
【3.操作步骤】:1.开机进入桌面 2.点击右上角wifi图标
【4.实际结果】:1.wifi和蓝牙无法使用
【5.期待结果】:1.蓝牙和wifi正常使用
【6.问题发生时间】:2022-01-05 9时30分41
【7.最简复归方法】:不可恢复
【8.发生频率】:3/3
【9.硬件版本】:A1
【10.测试环境】:台架
【11.实车配置】:
【12.备注】:

systemui FO回复

1
目前有重现处理,问题原因是因为系统原生 WiFiManager 获取数据卡住导致 anr,已经交给 fw 开发在分析中

2.调查过程

一听到ANR,立马找/data/anr目录下的日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
----- pid 26716 at 2022-01-13 13:31:04 -----
Cmd line: com.gxa.service.systemui
Build fingerprint: 'qti/gac/gac:9/PQ1A.190105.004/1:userdebug/test-keys'
ABI: 'arm64'
Build type: optimized
......
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73f0bb28 self=0x6f69814c00
| sysTid=26716 nice=-19 cgrp=default sched=0/0 handle=0x6fef389548
| state=S schedstat=( 8037523822 1796068645 29536 ) utm=563 stm=240 core=1 HZ=100
| stack=0x7ff9371000-0x7ff9373000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/26716/stack)
native: #00 pc 000000000006e204 /system/lib64/libc.so (__ioctl+4)
native: #01 pc 0000000000029110 /system/lib64/libc.so (ioctl+136)
native: #02 pc 000000000005a5e4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+256)
native: #03 pc 000000000005b374 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+60)
native: #04 pc 000000000005b1c0 /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+176)
native: #05 pc 0000000000051318 /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+72)
native: #06 pc 000000000012f13c /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact(_JNIEnv*, _jobject*, int, _jobject*, _jobject*, int)+152)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(Binder.java:1127)
at android.net.wifi.IWifiManager$Stub$Proxy.getConfiguredNetworks(IWifiManager.java:1028)
at android.net.wifi.WifiManager.getConfiguredNetworks(WifiManager.java:1039)
at com.gxa.service.systemui.mvvm.service.NetworkService.notifyWifiList(NetworkService.kt:267)
at com.gxa.service.systemui.mvvm.service.NetworkService.handleWifiStateChangedEvent(NetworkService.kt:227)
at com.gxa.service.systemui.mvvm.service.NetworkService.handleWifiStateEvent(NetworkService.kt:184)
at com.gxa.service.systemui.mvvm.service.NetworkService.access$handleWifiStateEvent(NetworkService.kt:35)
at com.gxa.service.systemui.mvvm.service.NetworkService$mWifiStateReceiver$1.onReceive(NetworkService.kt:176)
at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0(LoadedApk.java:1391)
at android.app.-$$Lambda$LoadedApk$ReceiverDispatcher$Args$_BumDX2UKsnxLVrE6UJsJZkotuA.run(lambda:-1)
at android.os.Handler.handleCallback(Handler.java:873)

从anr日志中可以看到systemui 主线程阻塞在IWifiManager的对端方法getConfiguredNetworks

查看服务端代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
android/frameworks/opt/net/wifi/service/java/com/android/server/wifi/WifiServiceImpl.java

public ParceledListSlice<WifiConfiguration> getConfiguredNetworks() {
enforceAccessPermission();
if (mVerboseLoggingEnabled) {
mLog.info("getConfiguredNetworks uid=%").c(Binder.getCallingUid()).flush();
}
if (mWifiStateMachineChannel != null) {
List<WifiConfiguration> configs = mWifiStateMachine.syncGetConfiguredNetworks(
Binder.getCallingUid(), mWifiStateMachineChannel);
if (configs != null) {
return new ParceledListSlice<WifiConfiguration>(configs);
}
} else {
Slog.e(TAG, "mWifiStateMachineChannel is not initialized");
}
return null;
}

从这个方法里看,没有什么阻塞点,最核心的代码就是从mWifiStateMachine状态机中获取网络配置状态,有就返回ParceledListSlice没有就返回空。

从此处完全看不出为什么会阻塞。

既然服务端没有阻塞点,那是不是binder管道阻塞了?

从车机中导出binder的日志:/d/binder目录下

1
2
3
4
5
6
7
8
9
# failed_transaction_log
288279: async from 1446:1504 to 1709:0 context binder node 84178 handle 34 size 96:0 ret 29189/-3 l=3274
715770: call from 2243:2432 to 0:0 context binder node 0 handle 23 size 92:0 ret 29189/-22 l=3128
735299: reply from 1240:1256 to 2589:2589 context binder node 0 handle -1 size 4:0 ret 29189/-3 l=3274
735314: call from 2589:2589 to 1240:0 context binder node 12046 handle 25 size 124:8 ret 29189/-3 l=3274
835473: call from 514:514 to 479:0 context hwbinder node 743 handle 3 size 140:16 ret 29189/-3 l=3274
6914896: reply from 1628:20008 to 18166:18166 context binder node 0 handle -1 size 8:0 ret 29189/-3 l=3274
15167312: reply from 488:1012 to 1240:3189 context hwbinder node 0 handle -1 size 8:0 ret 29189/-3 l=3274
22202337: call from 1240:3189 to 1018:0 context binder node 2830 handle 1 size 168:0 ret 29189/-3 l=3274

从这个文件中可以看到binder传输失败的痕迹。

结合log,查看systemui是不是binder传输失败。

1
2
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2

可以看到1709是systemui 进程,而在failed_transaction_log中第一条就可以看到从1446进程发送到1709进程有一次发送失败。因此继续看systemui的日志。

1
2
3
4
5
6
7
8
9
10
11
......一共回调了22
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2
I/CarSystemUI( 1709): systemuiNetworkService t-box onTboxConnectionStateChanged:2
I/system_server( 1018): libdebuggerd_client: started dumping process 1709
I//system/bin/tombstoned( 650): registered intercept for pid 1709 and type kDebuggerdJavaBacktrace
I/ervice.systemu( 1709): Thread[3,tid=1763,WaitingInMainSignalCatcherLoop,Thread*=0x6f69816400, peer=0x13d80088,"Signal Catcher"]: reacting to signal 3

可以看到日志中当tbox发出网络状态变化就会通知system ui。有那么多次回调。似乎可以联想到多次请求WifiManager获取网络配置信息的接口。

1
2
3
4
5
6
7
8
9
10
11
12
// service/src/main/java/com/gxa/service/systemui/mvvm/service/NetworkService.kt
private fun notifyWifiList() {
val results = wifiManager.scanResults
// 主线程获取配置文件
val configurations: List<WifiConfiguration> = wifiManager.getConfiguredNetworks()

val ipAddress: Int = wifiManager.getConnectionInfo().getIpAddress()

// 遍历配置文件
results.forEach {
......
}

继续看日志

1
2
3
InputDispatcher( 1018): Application is not responding: Window{e8e7ceb u0 gaei_fullscreen}.  It has been 5003.1ms since event, 5002.8ms since wait started.  Reason: Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 2.  Wait queue head age: 5656.0ms.

I/WindowManager( 1018): Input event dispatching timed out sending to gaei_fullscreen. Reason: Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 2. Wait queue head age: 5656.0ms.

这里也印证了主线程在getConfiguredNetworks()阻塞了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
01-13 12:52:45.392 E/ActivityManager( 1018): ANR in com.gxa.service.systemui
01-13 12:52:45.392 E/ActivityManager( 1018): PID: 1709
01-13 12:52:45.392 E/ActivityManager( 1018): Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 2. Wait queue head age: 5656.0ms.)
01-13 12:52:45.392 E/ActivityManager( 1018): Load: 0.0 / 0.0 / 0.0
01-13 12:52:45.392 E/ActivityManager( 1018): CPU usage from 82769ms to 0ms ago (2022-01-13 12:51:22.335 to 2022-01-13 12:52:45.105):
01-13 12:52:45.392 E/ActivityManager( 1018): 19% 2506/com.iflytek.cutefly.speechclient.hmi: 18% user + 1% kernel / faults: 6725 minor
01-13 12:52:45.392 E/ActivityManager( 1018): 9.5% 514/surfaceflinger: 5% user + 4.4% kernel / faults: 3943 minor
01-13 12:52:45.392 E/ActivityManager( 1018): 5.4% 2877/com.gxatek.cockpit.launcher: 4% user + 1.3% kernel / faults: 32226 minor
01-13 12:52:45.392 E/ActivityManager( 1018): 4.8% 508/audioserver: 3.4% user + 1.4% kernel / faults: 107 minor
01-13 12:52:45.392 E/ActivityManager( 1018): 3.7% 467/android.hardware.audio@2.0-service.rbxc: 2.6% user + 1% kernel / faults: 111 minor
01-13 12:52:45.392 E/ActivityManager( 1018): 3.6% 516/vendor.bosch.hardware.receiverbasesoftware@1.0-service: 2.2% user + 1.3% kernel / faults: 13044 minor
01-13 12:52:45.392 E/ActivityManager( 1018): 3.3% 479/android.hardware.graphics.composer@2.2-service: 1.7% user + 1.6% kernel / faults: 19 minor
01-13 12:52:45.392 E/ActivityManager( 1018): 2.7% 1018/system_server: 1.3% user + 1.4% kernel / faults: 1816 minor
01-13 12:52:45.392 E/ActivityManager( 1018): 2.2% 482/android.hardware.sensors@1.0-service: 0.5% user + 1.7% kernel
// 当时的CPU并不繁忙
01-13 12:52:45.392 E/ActivityManager( 1018): 1.4% 1709/com.gxa.service.systemui: 1% user + 0.3% kernel / faults: 11016 minor

解决方法:问题基本定位,在获取wifi扫描结果的配置信息是耗时操作,需要systemui在子线程中获取数据,获取到数据之后再主线程中更新。

3.回顾binder传输失败

1
2
3
4
Line 34303: 01-13 12:52:45.392 I/ActivityManager( 1018): Killing 1709:com.gxa.service.systemui/1000 (adj -800): bg anr
Line 34304: 01-13 12:52:45.393 W/libprocessgroup( 1018): kill(-1709, 9) failed: No such process
Line 34316: 01-13 12:52:45.440 W/libprocessgroup( 1018): kill(-1709, 9) failed: No such process
Line 34317: 01-13 12:52:45.440 I/libprocessgroup( 1018): Successfully killed process cgroup uid 1000 pid 1709 in 47ms

从日志中可以看到1709进程以及被系统杀掉了,所以binder在回传结果数据的时候失败了(pid和线程:1709:0),得到印证。

4.界面不更新wifi配置信息

systemui使用子线程获取WIFI配置信息,但是偶发获取不到配置信息。从此可以看出上面用子线程的方案治根不治本,继续调查。

回到systemui ANR调用栈

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// android/frameworks/opt/net/wifi/service/java/com/android/server/wifi/WifiServiceImpl.java
@Override
public ParceledListSlice<WifiConfiguration> getConfiguredNetworks() {
enforceAccessPermission();
if (mVerboseLoggingEnabled) {
mLog.info("getConfiguredNetworks uid=%").c(Binder.getCallingUid()).flush();
}
if (mWifiStateMachineChannel != null) {
List<WifiConfiguration> configs = mWifiStateMachine.syncGetConfiguredNetworks(
Binder.getCallingUid(), mWifiStateMachineChannel);
if (configs != null) {
return new ParceledListSlice<WifiConfiguration>(configs);
}
} else {
Slog.e(TAG, "mWifiStateMachineChannel is not initialized");
}
return null;
}

获取网络配置信息方法中通过WIFI状态机同步获取。

1
2
3
4
5
6
7
8
9
10
11
12
// android/frameworks/opt/net/wifi/service/java/com/android/server/wifi/WifiStateMachine.java
public List<WifiConfiguration> syncGetConfiguredNetworks(int uuid, AsyncChannel channel) {
// 同步发送一个请求给wifi hal
Message resultMsg = channel.sendMessageSynchronously(CMD_GET_CONFIGURED_NETWORKS, cd);
if (resultMsg == null) { // an error has occurred
return null;
} else {
List<WifiConfiguration> result = (List<WifiConfiguration>) resultMsg.obj;
resultMsg.recycle();
return result;
}
}
1
2
3
4
5
// android/frameworks/base/core/java/com/android/internal/util/AsyncChannel.java
public Message sendMessageSynchronously(Message msg) {
Message resultMsg = SyncMessenger.sendMessageSynchronously(mDstMessenger, msg);
return resultMsg;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// android/frameworks/base/core/java/com/android/internal/util/AsyncChannel.java 内部类SyncMessenger
private static Message sendMessageSynchronously(Messenger dstMessenger, Message msg) {
SyncMessenger sm = SyncMessenger.obtain();
Message resultMsg = null;
try {
if (dstMessenger != null && msg != null) {
msg.replyTo = sm.mMessenger;
synchronized (sm.mHandler.mLockObject) {
if (sm.mHandler.mResultMsg != null) {
Slog.wtf(TAG, "mResultMsg should be null here");
sm.mHandler.mResultMsg = null;
}
dstMessenger.send(msg);
// 发送获取配置信息给wifi native服务,然后等待!!
sm.mHandler.mLockObject.wait();
resultMsg = sm.mHandler.mResultMsg;
sm.mHandler.mResultMsg = null;
}
}
} catch (InterruptedException e) {
Slog.e(TAG, "error in sendMessageSynchronously", e);
} catch (RemoteException e) {
Slog.e(TAG, "error in sendMessageSynchronously", e);
}
sm.recycle();
return resultMsg;
}

发送获取配置信息给wifi native服务,然后等待回复!!而WIFI服务一直没有回复,导致system ui也一直收到信息,最终导致ANR或者无信息跟新。