I’ve tested the coding capabilities of many generative AI tools for ZDNET, and this time it’s Perplexity’s turn.
Perplexity feels like a cross between a search engine and an AI chatbot. When I asked Perplexity how it differs from other generative AI tools, the chatbot said it uses real-time information access, indexing the web daily. Users can narrow searches by asking Perplexity to focus on sources or platforms.
The free version of Perplexity is fairly limited. It uses OpenAI’s GPT-3.5 model for analysis, only allows five questions a day, and while it supports document uploads, those uploads are limited to three per day.
The Pro version of Perplexity is $20/month. That version allows for unlimited “quick” searches, 600 Pro searches per day, and a choice of AI models. You can choose from GPT-4o, Claude 3, Sonar Large (Llama 3), and others. The Pro version also provides $5/month in API credits.
I decided to forgo the Pro version and run the free version for this initial test of Perplexity’s programming prowess. I’ve run these coding tests against AI chatbots with varied results. If you want to follow along, point your browser to “How I test an AI chatbot’s coding ability – and you can too,” which contains all the standard tests I apply, explanations of how they work, and details on what to look for in the results.
Now let’s dig into the results of each test and see how they compare to previous tests using Claude 3.5 Sonnet, Microsoft Copilot, Meta AI, Meta Code Llama, Google Gemini Advanced, and ChatGPT.
1. Writing a WordPress plugin
This challenge asks for several things. First, it asks the chatbot to create a user interface for entering lines to be randomized (but not de-duped). Then it requires a button that not only randomizes the list but also arranges any duplicate items so they are not next to each other in the resulting list.
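To make the second requirement concrete, here is a minimal Python sketch of one way to shuffle a list while keeping duplicates apart. It is not the code any of the chatbots produced, and it is not the WordPress plugin itself (which would be PHP). The function name `shuffle_separating_duplicates` and the greedy strategy are assumptions chosen purely for illustration.

```python
import random
from collections import Counter


def shuffle_separating_duplicates(lines, rng=None):
    """Return the lines in random order, keeping equal lines apart when possible.

    Hypothetical helper for illustration only; not taken from any tested chatbot.
    """
    if rng is None:
        rng = random.Random()
    remaining = Counter(lines)  # how many copies of each line are still unplaced
    result = []
    previous = None
    while remaining:
        # Prefer any line that differs from the one just placed.
        candidates = [line for line in remaining if line != previous]
        if not candidates:
            # Only copies of the previous line are left, so adjacency is unavoidable.
            candidates = list(remaining)
        # Place the most frequent candidate first so its copies don't pile up
        # at the end, and break ties at random to keep the output shuffled.
        top = max(remaining[line] for line in candidates)
        choice = rng.choice([line for line in candidates if remaining[line] == top])
        result.append(choice)
        remaining[choice] -= 1
        if remaining[choice] == 0:
            del remaining[choice]
        previous = choice
    return result
```

Calling `shuffle_separating_duplicates(["a", "a", "b", "c"])` returns all four entries in a random order with the two "a" lines separated. If one entry makes up more than half the list, no such ordering exists, and the sketch simply lets the leftover copies sit together at the end.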
So far, most AI chatbots, except for Meta Code Llama, have created a fairly reasonable UI. Some were more attractive than others, but they all did the job.
However, only ChatGPT (powered by GPT-3.5, GPT-4, and GPT-4o) produced the correct randomized output. Most of the other AI chatbots just presented a button which, when clicked, did nothing.
Perplexity worked. It produced a UI accurate to the spec, and the Randomize button worked and separated duplicate lines.
Here are the aggregate results of this and previous tests:
- Perplexity: Interface: good, functionality: good
- Claude 3.5 Sonnet: Interface: good, functionality: fail
- ChatGPT using GPT-4o: Interface: good, functionality: good
- Microsoft Copilot: Interface: adequate, functionality: fail
- Meta AI: Interface: adequate, functionality: fail
- Meta Code Llama: Complete failure
- Google Gemini Advanced: Interface: good, functionality: fail
- ChatGPT using GPT-4: Interface: good, functionality: good
- ChatGPT using GPT-3.5: Interface: good, functionality: good
2. Rewriting a string function
This test asks the chatbot to fix a validation function that checks for dollars and cents.
My original code was in error, allowing only integer dollars, but no cents. I found out when a user submitted a bug report. I initially fed the incorrect code to ChatGPT, which did a good job of rewriting the function to allow dollar amounts and two digits to the right of the decimal point.
Perplexity also passed this test.
The code it generated could have been tighter, but it worked. In a case where the string provided by the user contained only zeros, Perplexity’s implementation removed everything. To compensate, Perplexity checked for zero first.
This approach is workable, but the regular expression Perplexity generated could have been written to account for this variation. It’s a simple implementation choice and many qualified programmers would have taken either path, so Perplexity’s approach is acceptable.
Perplexity’s code correctly tested the submitted data to ensure it matched the dollars and cents format. The code then converted the string to a number. It also checked if the number parsed was valid and non-negative.
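The three steps described above (format check, conversion, sanity check) can be sketched in Python as follows. This is an illustration only, not Perplexity’s output and not my original function. The name `is_valid_dollar_amount` and the exact pattern (digits, optionally followed by a decimal point and one or two digits) are assumptions.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumed format: one or more digits, optionally followed by a decimal point
# and one or two digits (for example "12", "12.5", or "12.50").
_DOLLARS_AND_CENTS = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def is_valid_dollar_amount(text):
    """Return True if text looks like a non-negative dollars-and-cents amount.

    Hypothetical helper for illustration only.
    """
    candidate = text.strip()
    # Step 1: check the string against the expected format.
    if not _DOLLARS_AND_CENTS.fullmatch(candidate):
        return False
    # Step 2: convert the string to a number.
    try:
        amount = Decimal(candidate)
    except InvalidOperation:
        return False
    # Step 3: confirm the parsed number is finite and non-negative.
    return amount.is_finite() and amount >= 0
```

In this sketch, an all-zeros string such as "000" matches the pattern directly, so no separate zero check is needed. That is one way a regular expression can absorb the special case, although checking for zero first is just as defensible.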
Overall, Perplexity produced solid code. Here are the aggregate results of this and previous tests:
- Perplexity: Succeeded
- Claude 3.5 Sonnet: Failed
- ChatGPT using GPT-4o: Succeeded
- Microsoft Copilot: Failed
- Meta AI: Failed
- Meta Code Llama: Succeeded
- Google Gemini Advanced: Failed
- ChatGPT using GPT-4: Succeeded
- ChatGPT using GPT-3.5: Succeeded
3. Finding an annoying bug
A bug in my code confused me, so I turned to ChatGPT for help. As it turned out, the source of the problem was not intuitively obvious, which is why I missed it.
The bug was a parameter-passing error, and spotting it requires knowledge of how the WordPress framework functions. I missed the bug because PHP seemed to imply the problem was in one part of the code when, in fact, the issue was how the code transitioned through a WordPress-specific operation.
Perplexity found the problem and correctly diagnosed the fix.
Here are the aggregate results of this and previous tests:
- Perplexity: Succeeded
- Claude 3.5 Sonnet: Succeeded
- ChatGPT using GPT-4o: Succeeded
- Microsoft Copilot: Failed
- Meta AI: Succeeded
- Meta Code Llama: Failed
- Google Gemini Advanced: Failed
- ChatGPT using GPT-4: Succeeded
- ChatGPT using GPT-3.5: Succeeded
4. Writing a script
This final test probes the breadth of the AI chatbot’s knowledge base. It asks for code that requires knowledge of the Chrome document object model, AppleScript, and a third-party Mac scripting tool called Keyboard Maestro.
Perplexity did not appear to know about Keyboard Maestro, so it did not write the necessary call to the scripting language to retrieve the value of a variable.
Perplexity also made the same mistake Claude 3.5 Sonnet made, generating a line of AppleScript code that would produce a syntax error when run. This mistake indicated a lack of knowledge about when AppleScript ignores upper and lower case and when it considers the case of a string in comparing two values.
Here are the aggregate results of this and previous tests:
- Perplexity: Failed
- Claude 3.5 Sonnet: Failed
- ChatGPT using GPT-4o: Succeeded but with reservations
- Microsoft Copilot: Failed
- Meta AI: Failed
- Meta Code Llama: Failed
- Google Gemini Advanced: Succeeded
- ChatGPT using GPT-4: Succeeded
- ChatGPT using GPT-3.5: Failed
Overall results
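Here are the overall results of the four tests, tallied from the lists above:
- Perplexity: 3 of 4
- Claude 3.5 Sonnet: 1 of 4
- ChatGPT using GPT-4o: 4 of 4 (one with reservations)
- Microsoft Copilot: 0 of 4
- Meta AI: 1 of 4
- Meta Code Llama: 1 of 4
- Google Gemini Advanced: 1 of 4
- ChatGPT using GPT-4: 4 of 4
- ChatGPT using GPT-3.5: 3 of 4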
Overall, Perplexity did well. I thought the AI chatbot might fail the fourth test, because GPT-3.5 did and the free version of Perplexity uses the GPT-3.5 model, and that is what happened.
Perplexity mirrored GPT-3.5’s results across all four tests, which makes sense given the shared model. What surprised me is the contrast with Microsoft’s Copilot, which is also supposed to use OpenAI’s AI engine yet failed all four tests.
Let me know if you want to see how Perplexity Pro performs. If I get enough requests, I’ll sign up for Yet Another Monthly AI Fee and run some tests.
Have you tried Perplexity’s free version or its Pro version? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.