Cognition Emerges From Stealth To Launch AI Software Engineer ‘Devin’

Longtime Slashdot reader ahbond shares a report from VentureBeat: Today, Cognition, a recently formed AI startup backed by Peter Thiel’s Founders Fund and tech industry leaders including former Twitter executive Elad Gil and DoorDash co-founder Tony Xu, announced a fully autonomous AI software engineer called "Devin." While there are multiple coding assistants out there, including the famous GitHub Copilot, Devin is said to stand out from the crowd with its ability to handle entire development projects end-to-end, from writing the code and fixing the bugs associated with it to final execution. This is the first offering of its kind, and it is even capable of handling projects on Upwork, the startup has demonstrated. […]
In a blog post today on Cognition’s website, Scott Wu, the founder and CEO of Cognition and an award-winning sports coder, explained that Devin can access common developer tools, including its own shell, code editor and browser, within a sandboxed compute environment to plan and execute complex engineering tasks requiring thousands of decisions. The human user simply types a natural language prompt into Devin’s chatbot-style interface, and the AI software engineer takes it from there, developing a detailed, step-by-step plan to tackle the problem. It then begins the project using its developer tools, just as a human would, writing its own code, fixing issues, testing and reporting on its progress in real time, allowing the user to keep an eye on everything as it works. […]
According to demos shared by Wu, Devin is capable of handling a range of tasks in its current form. These range from common engineering projects, like deploying and improving apps/websites end-to-end and finding and fixing bugs in codebases, to more complex things, like setting up fine-tuning for a large language model using the link to a research repository on GitHub or learning how to use unfamiliar technologies. In one case, it learned from a blog post how to run the code to produce images with concealed messages. Meanwhile, in another, it handled an Upwork project to run a computer vision model by writing and debugging the code for it. In the SWE-bench test, which challenges AI assistants with GitHub issues from real-world open-source projects, the AI software engineer was able to correctly resolve 13.86% of the cases end-to-end, without any assistance from humans. In comparison, Claude 2 could resolve just 4.80%, while SWE-Llama-13b and GPT-4 could handle 3.97% and 1.74% of the issues, respectively. All of those models were even given assistance: they were told which file had to be fixed. Currently, Devin is available only to a select few customers. Bloomberg journalist Ashlee Vance wrote a piece about his experience using it.
"The Doom of Man is at hand," captions Slashdot reader ahbond. "It will start with the low-hanging Jira tickets, and in a year or two, able to handle 99% of them. In the short term, software engineers may become like bot farmers, herding 10-1000 bots writing code, etc. Welcome to the future."