Is your AI language tutor lying to you?

Oh Yeah Sarah

--

How reliable is AI grammar correction for language learners?

>> You can watch/listen to a video version of this article on YouTube

There are many new apps that use AI to provide tutoring and practice in a foreign language. I have an ever-growing list of examples here.

There are two main things that these AI language tutors offer.

1) They chat to you in a realistic human-like way in the language you want to practise.

2) They correct your mistakes and explain them.

As a chat partner, they do a great job, but what about the advice they give you about your mistakes? How reliable is that?

I’ve recently tested numerous AI language tutor apps and incorporated AI corrections into the app that I run.

As a result, I now have a good idea of how accurate AI corrections are and the areas where AI is most likely to fall short.

I’ve been investigating this topic for years, for example with Grammarly back in 2017, but the new generation of AI has brought new opportunities for automated language correction.

First, I’ll tell you what you need to know as a language learner using these apps. In the second half, I’ll go into some detail that may be of more interest to people developing AI language tutor apps.

Advice for language learners

The accuracy of the corrections may vary slightly between apps but, assuming all apps are using ChatGPT or a similar model, the output will be broadly similar.

I did some in-depth testing of 3 AI language tutor apps, using 10 texts written by real English learners who have used the app that I run (see the full findings near the end of this article). It’s important to test with texts written by real language learners because it means the texts will contain the types of mistakes that are typical of language learners. We’re not just talking typos and spelling mistakes, like a native speaker of a language would make, we’re talking misunderstandings with grammar and vocabulary.

There are two main ways that an AI tutor will give misleading advice:

  • It will miss a mistake, leaving you thinking something is fine when it’s not.
  • It will correct something that’s fine, leaving you thinking your grammar or vocabulary is worse than it is.

I’ll show some examples using various AI tutor apps, with real texts written by real English learners.

Missed mistakes

The examples below are grammar mistakes but I’ve seen AI miss vocabulary mistakes too.

Here’s an example from Lengi.

‘Had participated’ should have been corrected because it’s not the correct tense here. The correct tense for this context is Past Simple (‘participated’).

And one from Langua.

Here ‘to be too rich’ should have been corrected to ‘being too rich’. I even asked ChatGPT its opinion on this and it agreed it should be ‘being’. For the grammar nerds: we use a gerund when the action is the subject of the sentence.

Correcting things that are not incorrect

This is where the creative ‘generative’ aspect of AI comes in and becomes a problem. AI is always trying to improve the style of a text or make it more precise or descriptive. This is great for some situations (e.g. writing a cover letter for a job) but it creates a lot of noise when your goal is to find out whether you used sufficiently correct language.

If you’re at intermediate level in a language, your priority is learning correct grammar and vocabulary, not achieving beautifully elegant language. You need to focus on what’s wrong, rather than what’s correct but could be marginally better.

You also need to know the difference between a change that makes something correct and one that makes it better.

Here’s an example from Go Correct.

‘Huge’ is absolutely fine here, and while adding ‘only’ could improve the style, the sentence is certainly not incorrect without it.

This article also contains some detailed and interesting research into AI’s ability to determine correct and incorrect grammar, and that writer came to the same conclusion: part of AI’s downfall is its inclination to be ‘too creative’.

Why is this a problem if AI is right most of the time?

As a language learner, I don’t like this element of doubt because it undermines my trust in every correction the AI gives me.

When using an AI tutor to practise Spanish, I can look at some corrections and immediately see that I obviously got it wrong. But it makes me want to double-check the rest with a human, which rather defeats the whole purpose.

Also, a human teacher would make a judgement call about what changes are worth focusing on, depending on the level of the learner. I have not yet seen an AI tutor that does that.

My advice to language learners using AI tutors

Focus on the corrections where you can immediately see and understand what you got wrong. You can rely on that because the correction is backed up by the grammar rules that you already know but temporarily forgot. For everything else, be cautious!

Now let’s look at some data. How often does AI get it wrong?

I took 10 texts written by English learners. I corrected them myself, which we’ll take to be the most accurate and reliable way of correcting them, although, of course, even among humans there would be some disagreement about what needs to be corrected.

I then ran them through the current prompt I’m using in my app and two other good AI language tutor apps. Then I compared the results.

This graph shows each text in each app. You’ll see there’s a lot of variation in how many mistakes they correct and whether they find all the mistakes that a human corrected.

Tested in October 2024

You might look at text 5 and think: “great, all of them matched the human corrections there”. But no, it’s not that simple. This is just the number of things each app corrected and, as I mentioned, sometimes AI corrects things that are fine. So, let’s dive deeper into what each app did with this text.

Tested in October 2024

And here’s the same type of graph for another text. As you’ll see, no app gets it right all the time.

Now, which type of error does AI make most often? Missing a mistake or correcting something that’s fine? That depends on how the prompt is crafted, but based on the relatively small amount of data I looked at, I think it’s generally most likely to correct something that’s fine.

If a prompt is less likely to miss mistakes, then it’s more likely to correct things that aren’t wrong. Here’s the split between missed and unnecessary corrections in all the apps I tested.
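To make the scoring concrete, the comparison can be reduced to simple set arithmetic: treat the human’s corrections as the reference set and each app’s corrections as a candidate set. Here’s a minimal sketch in Python of that idea — the flagged phrases below are invented for illustration, not taken from my test data.

```python
# Hypothetical spans a human corrector flagged in one learner text
human = {"had participated", "to be too rich"}

# Hypothetical spans an AI tutor flagged in the same text
ai = {"to be too rich", "huge"}

matched = ai & human        # corrections that agree with the human
missed = human - ai         # mistakes the AI failed to flag
unnecessary = ai - human    # AI 'corrections' of things that were fine

print("matched:", sorted(matched))
print("missed:", sorted(missed))
print("unnecessary:", sorted(unnecessary))
```

In this invented example the AI would score one match, one missed mistake (‘had participated’) and one unnecessary correction (‘huge’). Counting `missed` and `unnecessary` across all texts gives the split shown in the graph.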

How the corrections are displayed matters too

Displaying the corrections in a way that’s easy to read and digest makes a big difference and some apps do this better than others.

Langua and the app I run do it well. Talkpal does it badly, forcing your eyes to jump back and forth between the two texts to compare them.

‘Often’ is the app that I run

What does all this mean?

From 2017 to 2024, I ran Go Correct as an app where English learners could have a human correct the mistakes in their daily writing practice. By 2024, it had become inevitable that, in the new version of the app, I would add the option to have AI do the job instead of a human.

I was initially resistant to using AI in my own app because I don’t like the unreliability and the uncertainty it creates. However, I have recently started to come round to the idea and realised that AI feedback can be useful if approached in the right way.

I am interested to see how long it will be before AI is accurate enough that all language learners can confidently rely on it for this task.
