About voice interaction
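The Voice Interaction Service API provides an abstraction layer over the different voice control apps that a system might host. Such apps can be developed by following the guidelines in App development. This integration guide describes how to integrate them into a specific Android Automotive OS (AAOS) system image.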
Terminology
These terms are used throughout this guide:
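- Assist data: When a voice interaction session starts, the system can capture the views and a screenshot of the current screen and pass this information to the session. Apps can expose additional information by implementing Activity#onProvideAssistData() and Activity#onProvideAssistContent().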
- Push-to-talk (PTT): A physical voice control button, usually located on the steering wheel.
- RecognitionService (RS): The voice recognition service that apps use through the SpeechRecognizer API. A VIA must include both a VoiceInteractionService and a RecognitionService.
- Tap-to-talk (TTT): A software voice control button, usually part of the system UI. In Android this is also called the assist gesture.
- VoiceInteractionService: A lightweight system service implemented by the VIA developer. The selected service is bound by a system service at boot and is always running.
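- VoiceInteractionSession (VIS): The class that encapsulates the user interaction business logic. It presents the status of the voice interaction to the user, handles VoiceInteractor requests, and receives assist and screenshot data.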
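- VoiceInteractionSessionService (VSS): A service, part of a VIA, that handles a voice interaction session. The Android system service binds to it for the duration of a voice interaction with the user. All business logic of the session is implemented in the VoiceInteractionSession class. The service is only guaranteed to stay alive for a single user voice session.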
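- Voice interaction app (VIA): An Android app designed to serve as a voice control, also referred to as an assistant. Such an app is identified by the VoiceInteractionService declared in its manifest. Only one VIA can be the system default at a time. Only the default app is kept alive (bound by a system service) and receives push-to-talk (PTT) and tap-to-talk (TTT) events.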
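The default-app rule in the last term can be pictured with a small, framework-free Java model. This is only a sketch under stated assumptions: DefaultViaModel, Via, and Trigger are hypothetical names made up for illustration and are not Android APIs, and on a real device this bookkeeping is done by the platform rather than by app code. It shows that, however many VIAs are installed, a PTT or TTT event reaches only the one currently selected as default.

```java
import java.util.ArrayList;
import java.util.List;

// Framework-free sketch; every name here is hypothetical, not an Android API.
final class DefaultViaModel {
    enum Trigger { PUSH_TO_TALK, TAP_TO_TALK }

    // Stand-in for an installed voice interaction app.
    static final class Via {
        final String packageName;
        final List<Trigger> received = new ArrayList<>();

        Via(String packageName) {
            this.packageName = packageName;
        }
    }

    private final List<Via> installed = new ArrayList<>();
    private Via defaultVia; // at most one default at any time

    void install(Via via) {
        installed.add(via);
    }

    // Selecting a new default replaces the previous one.
    void setDefault(Via via) {
        if (!installed.contains(via)) {
            throw new IllegalArgumentException("not installed: " + via.packageName);
        }
        defaultVia = via;
    }

    // Delivers the trigger to the default VIA only; returns false if none is selected.
    boolean dispatch(Trigger trigger) {
        if (defaultVia == null) {
            return false;
        }
        defaultVia.received.add(trigger);
        return true;
    }

    public static void main(String[] args) {
        DefaultViaModel system = new DefaultViaModel();
        Via first = new Via("com.example.assistant.one");
        Via second = new Via("com.example.assistant.two");
        system.install(first);
        system.install(second);
        system.setDefault(second);
        system.dispatch(Trigger.PUSH_TO_TALK);
        // prints "0 1": only the default app saw the event
        System.out.println(first.received.size() + " " + second.received.size());
    }
}
```

In this model, changing the default simply redirects later events; an app that is installed but not selected never sees a trigger.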
Responsibilities
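The responsibilities of each party are as follows.
Car manufacturers (OEMs):
- Build an infotainment system that is compatible with AAOS.
- Implement audio input and output, optionally with DSP hotword detection support.
- Grant system-privileged permissions to the voice interaction services.
- Respect the VoiceInteractionService requirements regarding access to the app's settings screens.
AOSP:
- Define and evolve VoiceInteractionService and related APIs.
- Provide API documentation, sample code, and other support material to VIA developers.
- Provide UX guidance with requirements and recommendations.
App developers:
- Implement the VoiceInteractionService API, RecognitionService API, and NotificationListenerService API (for a detailed description, see App development).
- Provide a customizable UI that OEMs can adjust to match each car design system.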
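UX requirements
OEMs have the ultimate responsibility for providing a good user experience to customers, and must ensure that all preinstalled voice interaction services fulfill the requirements described in Preloaded Assistants: UX Guidance.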
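Core assistant experience
An automotive voice interaction app (VIA) performs the following actions:
- [MUST] Respond to system-handled voice interaction triggers (PTT, TTT).
- [MUST] Display a visual representation of its progress (for example, listening, processing, and fulfilling).
- [MUST] Use voice or sounds to indicate understanding and completion of user requests.
- [MUST] Serve as a speech recognizer for other apps (see the SpeechRecognizer API).
- [SHOULD] Respond to a hotword trigger.
- [MAY] Display a settings activity where users can configure the VIA (for example, permissions, hotword configuration, and sign-in).
- [MAY] Handle assist data (Intent#ACTION_ASSIST).
- [MAY] Support voice interaction from the Keyguard (lock screen).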
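The progress requirement in the list above names three example phases. One way a VIA might track which phase to render is a tiny state machine. The sketch below is framework-free and hypothetical: SessionProgress, its transitions, and the label strings are assumptions for illustration, since the guide only lists listening, processing, and fulfilling as examples and does not prescribe an implementation.

```java
// Framework-free sketch; SessionProgress is a hypothetical name, not an Android API.
final class SessionProgress {
    // The three example phases named in the requirement list.
    enum Phase { LISTENING, PROCESSING, FULFILLING }

    private Phase phase = Phase.LISTENING; // assumption: a session starts by listening

    Phase current() {
        return phase;
    }

    // Moves to the next phase; FULFILLING is terminal in this sketch.
    Phase advance() {
        switch (phase) {
            case LISTENING:
                phase = Phase.PROCESSING;
                break;
            case PROCESSING:
                phase = Phase.FULFILLING;
                break;
            default:
                break;
        }
        return phase;
    }

    // Text a UI layer could show for the current phase (wording is illustrative).
    String label() {
        switch (phase) {
            case LISTENING:
                return "Listening";
            case PROCESSING:
                return "Processing";
            default:
                return "Fulfilling";
        }
    }
}
```

Keeping the phase in one place like this lets the visual indicator and any audio cue read the same state, which is one way to keep the two required forms of feedback consistent.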
Components
At a high level, a voice interaction app interacts with these actors:

Figure 1. Voice interaction actors
Details:
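- VoiceInteractionManagerService: The system service that manages the default VIA and exposes its functionality to the rest of the system.
- RecognitionService: Exposes speech recognition capabilities to other apps in the system.
- SoundTrigger: Implements hotword management and is available to VIAs through AlwaysOnHotwordDetector.
- MediaRecorder: Provides access to audio input for both hotword detection (when done on the CPU) and speech recognition.
- PhoneWindowManager/CarInputService: Among other things, these services handle key events and route PTT to the VIA by means of VoiceInteractionManagerService.
- User: The user interacts with a VIA through triggers (PTT, TTT, hotword) or the voice plate UI.
- CarService, Notifications, Media, Telephony, ContactsProvider, and so on: Services and apps that the VoiceInteractionSession uses to fulfill the user's commands.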
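Automotive-specific concepts
AAOS diverges from Android in the following aspects:
- Besides the usual assistant functionality, AAOS VIAs can control vehicle functions (for example, HVAC, seats, and interior lights). These functions can be integrated using the CarPropertyManager API (see Read a vehicle property), provided that the OEM configures access correctly as described in Privileged permission allowlisting.
- Customization and consistency matter more in Automotive than in any other form factor. To read more about implementing these guidelines, see Customization.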
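The first point above makes vehicle control conditional on access that the OEM configures. The following framework-free Java model sketches that dependency. It is not the CarPropertyManager API: VehicleAccessModel, its methods, the permission string, and the starting temperature are all hypothetical, chosen only to show that a request from a VIA succeeds once the OEM has allowlisted the relevant permission for that package and fails otherwise.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Framework-free sketch; names and the permission string are hypothetical.
final class VehicleAccessModel {
    static final String CLIMATE_PERMISSION = "example.permission.CONTROL_CLIMATE";

    // Package name -> permissions the OEM has allowlisted for it.
    private final Map<String, Set<String>> allowlist = new HashMap<>();
    private float cabinTemperatureCelsius = 21.0f; // arbitrary starting value

    // Models the OEM-side configuration step.
    void allow(String packageName, String permission) {
        allowlist.computeIfAbsent(packageName, k -> new HashSet<>()).add(permission);
    }

    // Models a VIA asking to change the cabin temperature.
    void setCabinTemperature(String callerPackage, float celsius) {
        Set<String> granted = allowlist.get(callerPackage);
        if (granted == null || !granted.contains(CLIMATE_PERMISSION)) {
            throw new SecurityException(callerPackage + " lacks " + CLIMATE_PERMISSION);
        }
        cabinTemperatureCelsius = celsius;
    }

    float cabinTemperature() {
        return cabinTemperatureCelsius;
    }
}
```

The point of the model is the ordering: the same call from the same app is rejected before the allowlist entry exists and accepted after it, so a voice command such as a temperature change depends on system image configuration as much as on app code.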
Last updated 2025-07-27 UTC.