DARPA to fund culturally aware natural language processing models

To help translate in the field, as well as target enemies' feelings and ideas

Sebastian Moss

June 29, 2021


The US military wants to develop advanced artificial intelligence systems that can not only translate other languages, but also understand social customs and cultural backgrounds.

The Defense Advanced Research Projects Agency (DARPA) is seeking "natural language processing technologies that recognize, adapt to, and recommend how to operate within the emotional, social, and cultural norms that differ across societies, languages, and group affinities."

The agency admits that the Computational Cultural Understanding (CCU) program will require advanced artificial intelligence, far more complex than the technology used today.

With this in mind, DARPA said it was "specifically excluding" any evolutionary improvements to the existing state of practice, and would only fund "revolutionary advances in science, devices, or systems."

Revolutionary science to quell revolutions

“To support users engaged in cross-cultural dialogue, AI-enabled systems need to go beyond providing language translation – they need to leverage deep social and cultural understanding to assist communication,” Dr. William Corvey, a program manager in DARPA’s Information Innovation Office, said when the program was announced earlier this year.

“Moving AI from a tool to a partner in this capacity will require significant advances in our machines’ ability to discover and interpret sociocultural factors, recognize emotions, detect shifts in communication styles, and provide dialogue assistance when miscommunications seem imminent – all in real-time.”

The latest batch of procurement documents provided more details about the CCU program. Interested businesses have until July 2 to apply.

The final outcome of the CCU is expected to be a machine learning model that requires minimal-to-no training data in local culture, and no labeled data. Instead, it is expected to infer the meaning of unlabeled discourse behaviors in context.

It will also be expected to understand and interpret human emotions, highlight when communication failure has occurred, and provide real-time dialogue assistance during cross-cultural interactions.

The system should rely on both audio and visual data to come to its conclusions.

Such a system could be used to replace or augment human interpreters, with DARPA planning to test prototypes against humans in negotiation scenarios drawn from military training resources and/or after-action reporting.

This would replace the existing 'Machine Foreign Language Translation Systems' AI model developed for the US military, which can handle basic two-way speech-to-speech machine translation on laptops or smartphones.

Also funded via DARPA, the older system can perform some translation tasks, but cannot understand context or culture, leading to potential miscommunication that could prove deadly.

Ultimately, DARPA suggests the CCU could be used in negotiations instead of humans. The US military uses hundreds of local interpreters in conflict zones, but has something of a track record of leaving them behind. With its withdrawal from Afghanistan, hundreds of interpreters are set to be abandoned.

Since 2014, more than 300 interpreters or their family members have been killed because of their affiliation with the United States, non-profit group No One Left Behind found.

The AI-based systems funded by DARPA could also be used for wider covert and shadow warfare efforts.

"Cultural understanding is critical to successful Information Operations, which increasingly involve so-called 'cognitive-emotional conflict,' where feelings are targeted, as well as ideas," procurement documents state.

To build these models, DARPA will split the research into three technical areas:

  • T1 – Sociocultural Analysis;

  • T2 – Cross-Cultural Dialogue Assistance;

  • T3 – Data Creation for Development and Evaluation.

The first two are expected to be awarded to multiple competing companies, while the third, an effort to collect tens of thousands of documents per language (20 percent annotated and the other 80 percent unlabeled), will go to a single enterprise.

At key milestones, the systems will be evaluated by the National Institute of Standards and Technology, as well as an undisclosed federal research center.

The models are expected to work across different languages and cultures, given the global spread of US military actions. In Afghanistan, for example, the two main languages are Pashto and Dari, but there are more than 40 minor languages, with around 200 dialects. In Iraq, the two main languages are Kurdish and Arabic.

Many other languages and dialects are found in Syria, Yemen, Somalia, Libya, and Niger – all countries the US admitted it was at war with, back in 2018. As of 2019, US troops were officially in combat in at least 14 countries, undertook military exercises in 26 more, and conducted 'counterterrorism training' in yet another 65.

The DARPA procurement document does not detail every language and culture the model will be expected to handle, noting that "program language + culture pairs will be announced incrementally."

It does, however, reveal that the first language and culture pair "will be Chinese (Mandarin)."
