
For me, the absolute best thing about Devin is its complete autonomy and the way it handles the entire end-to-end automation process without needing me to babysit it. I can simply drop in a link to the test suite from Azure DevOps, and it takes it from there. It logs into the application, finds the UI elements, and writes the Java code in the local Eclipse setup we have on its machine. The fact that it can run the tests and keep healing the script until it passes is a massive time-saver. I can have five different sessions running in parallel, which means I’m getting a whole week’s worth of manual automation work done in a single day.
Ease of use is genuinely high because it’s mostly just natural-language prompting. I don’t have to write code snippets the way I do with other AI tools; I just explain the logic and it does the rest. Implementation was a bit more of a project, though, because setting up the dedicated machine with Eclipse and the right paths for our Azure Git repo took some time. Since then, everything has been smooth. The integration with Azure DevOps is surprisingly good as well, since it has a native way to handle those connections through the secrets manager and a PAT.
I use Devin almost every single day now for any new test case development. The feature set is impressive, especially how it creates its own computing environment and uses its own browser to analyze the UI. It feels more like an actual teammate than just a tool. Customer support has been fairly responsive when I’ve hit those weird ACU consumption bugs, although most of the time I can figure things out from the logs Devin provides.
Like I mentioned, it’s not perfect. Sometimes it gets overexcited and changes core framework methods, which is something I have to watch out for in every PR. And that deviation after around 40 to 50 ACUs is definitely annoying, because it starts to ignore the initial logic. Still, as a tester who wants to scale up automation quickly, these feel like small prices to pay for the amount of work it gets done. It has completely changed how I manage my sprint tasks. Review collected by and hosted on G2.com.
It keeps messing with things it shouldn’t touch. There have been several times when it decided to refactor our core pre-built methods in the automation framework, even though it was only supposed to write a simple test script. That’s frustrating because I then have to spend extra time during PR review double-checking that it didn’t break some global logic that all our other tests depend on. It’s like it gets overexcited and tries to be too helpful, but it ends up creating more work for me to verify.
The other major issue is how it starts to deviate after a long session. I’ve noticed that once the ACU consumption hits around 40 or 50, Devin really starts to lose the plot. It begins ignoring the initial instructions I gave it, and the logic starts drifting in weird directions. It feels like the model gets tired and forgets the original goal of the session. I usually have to kill the session and start a completely fresh one just to get it back to being productive, which is a bit of a waste of time.
I also find the initial setup for the dedicated machine and secrets a bit tedious. Since it doesn’t have direct access to Azure DevOps, I have to manage all the creds and PATs as secrets inside Devin, which is just another thing to keep track of. And while it’s impressive that it can run Eclipse locally and debug its own code, execution speed can sometimes be slow compared to a human just running the script. Overall, it’s a great tool, but the overreaching code changes and the reliability issues in long sessions are definitely the biggest downsides for me.
The reviewer uploaded a screenshot or submitted the review in-app, verifying them as a current user.
Validated through a business email account
Invitation from G2. This reviewer was not provided any incentive by G2 for completing this review.