August 30, 2022
TiCoder creates code that is 90% in line with user intent.
Researchers have developed an AI tool that can generate code based on natural language inputs.
Dubbed TiCoder, which stands for Test-driven Interactive Coder, the system is capable of generating code that is 90.4% consistent with user intent, according to a paper by researchers from Microsoft, the University of Pennsylvania and the University of California, San Diego.
The group of researchers said that pre-trained large language models such as OpenAI's Codex have shown “immense potential” in automating code production from informal natural language intent. However, they argue that the code generated by these models “does not have any correctness guarantees around satisfying user intent.”
"It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics," according to the paper.
The paper’s authors adopted a test-driven user-intent formalization (TDUIF) approach to address issues like buggy code.
TiCoder validates user intent through generated tests and then generates equivalent code based on those tests.
The paper describes the workflow in four steps:
1. The human user prompts the agent to complete a function body, given the prefix in a file, a natural language description and the function header or signature, which contains the method name, parameters and return values.
2. The agent repeatedly queries the user (until a stopping criterion is reached), asking whether a set of behaviors (or a test) is consistent with the user intent.
3. The user responds ‘Yes,’ ‘No,’ or ‘Don’t Know’ to each of the agent's queries.
4. Once the interaction terminates, the agent outputs (a) a set of tests that the user has approved, and (b) a ranked list of code suggestions that are consistent with the user responses.
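To make the loop concrete, here is a minimal Python sketch of the interaction described above. It is an illustration under stated assumptions, not the researchers' implementation: the names `interact`, `passes` and `ask_user` are hypothetical, the candidate programs and tests are hand-written rather than generated by a model, the stopping criterion is simply a query budget, and candidates are assumed to arrive already ranked.

```python
from typing import Callable, List, Tuple

# Hypothetical representations: a candidate is a plain function,
# a test is an (arguments, expected_output) pair.
Candidate = Callable[..., object]
Test = Tuple[tuple, object]


def passes(candidate: Candidate, test: Test) -> bool:
    """Return True if the candidate produces the expected output."""
    args, expected = test
    try:
        return candidate(*args) == expected
    except Exception:
        return False


def interact(
    candidates: List[Candidate],
    tests: List[Test],
    ask_user: Callable[[Test], str],
    max_queries: int = 5,
) -> Tuple[List[Test], List[Candidate]]:
    """Query the user about tests and prune candidates accordingly.

    Assumption: the stopping criterion is a fixed query budget.
    """
    approved: List[Test] = []
    remaining = list(candidates)  # input order is treated as the ranking
    for test in tests[:max_queries]:
        answer = ask_user(test)  # "yes", "no" or "dont_know"
        if answer == "yes":
            approved.append(test)
            remaining = [c for c in remaining if passes(c, test)]
        elif answer == "no":
            # A rejected test describes unwanted behavior, so drop
            # any candidate that exhibits it.
            remaining = [c for c in remaining if not passes(c, test)]
        # "dont_know" gives no information, so nothing is pruned.
    return approved, remaining


# Toy intent: "return the larger of two numbers".
def first_arg(a, b):
    return a


def larger(a, b):
    return a if a > b else b


def total(a, b):
    return a + b


def simulated_user(test: Test) -> str:
    args, expected = test
    return "yes" if expected == max(args) else "no"


proposed_tests = [((1, 2), 2), ((3, 1), 3), ((1, 2), 3)]
approved, ranked = interact(
    [first_arg, larger, total], proposed_tests, simulated_user
)
print(len(approved), [c.__name__ for c in ranked])
```

In this toy run the simulated user approves two tests and rejects one, and only the `larger` candidate stays consistent with every answer.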
Using the code generation benchmark dataset Mostly Basic Python Problems (MBPP), the researchers tested TiCoder and found that after just one user query, the system's code generation accuracy rose from 48.39% to 70.49%, a gain of about 22 percentage points. That number then rose to 85.48% with up to five user queries.
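The single-query gain can be read two ways, and the short calculation below separates them using only the two accuracy figures reported above. The variable names are illustrative.

```python
baseline = 48.39         # accuracy before any user query (%)
after_one_query = 70.49  # accuracy after one user query (%)

# Absolute change, measured in percentage points.
point_gain = after_one_query - baseline

# Relative change, measured as a percentage of the baseline.
relative_gain = point_gain / baseline * 100

print(f"{point_gain:.2f} percentage points")
print(f"{relative_gain:.1f}% relative to the baseline")
```

The absolute difference is roughly 22 percentage points, while the relative improvement over the baseline is considerably larger.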
“The preliminary results from our experiments are encouraging,” the researchers wrote. “Additionally, we establish that there is significant room to improve current algorithms given the best performance an ideal algorithm can have.”