Janusz OK, here comes the moment when I need to explain a few things. It's going to be lengthy, and I apologise for that.
We all keep hearing the term “artificial intelligence” everywhere, but few people realise that this is largely a marketing trick. There is no such thing as Artificial Intelligence in the original sense of the term. Not yet, anyway. Instead, there is a set of quite effective machine learning methods, which are developing rapidly and getting better and better at solving certain tasks.
What is a machine learning algorithm? It is a kind of program (whether built on neural networks or any other architecture) that learns from examples, tries to generalize from the examples it was given, and then reproduces similar behaviour on other examples from the same domain that it has never seen before.
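To make that a bit more concrete, here is a toy sketch. The data, labels and the use of scikit-learn are all made up for illustration and have nothing to do with faces or with Tonfotos's actual code: the model is shown a handful of labelled examples and then asked about an example it has never seen.

```python
# Toy illustration of "learning from examples and generalizing":
# the model sees a few labelled points, then classifies a point it never saw.
from sklearn.neighbors import KNeighborsClassifier

examples = [[150, 40], [160, 45], [185, 90], [190, 95]]   # made-up measurements
labels   = ["child", "child", "adult", "adult"]

model = KNeighborsClassifier(n_neighbors=1).fit(examples, labels)
print(model.predict([[158, 42]]))   # an unseen example -> ['child']
```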
Tonfotos currently uses several different machine learning algorithms for the different stages of face processing (there is a rough sketch of the whole pipeline right after this list):
- Face detection in a photo
- Determining face geometry for subsequent normalization
- Calculating face attributes so that faces can be compared for similarity
- Clustering similar faces into groups
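To show what those four stages look like in practice, here is a hypothetical sketch built on the open-source face_recognition library and scikit-learn. Tonfotos uses its own models internally, so treat this purely as an illustration of the same kind of pipeline; the file names are placeholders.

```python
# Illustrative face pipeline (NOT Tonfotos code): detect -> align/encode -> cluster.
import face_recognition
import numpy as np
from sklearn.cluster import DBSCAN

def embeddings_for_photo(path):
    image = face_recognition.load_image_file(path)
    boxes = face_recognition.face_locations(image)       # 1. face detection
    # 2+3. landmark detection and normalization happen inside face_encodings,
    #      which returns one 128-dimensional descriptor per face
    return face_recognition.face_encodings(image, boxes)

all_embeddings = []
for path in ["photo1.jpg", "photo2.jpg"]:                # placeholder file names
    all_embeddings.extend(embeddings_for_photo(path))

# 4. group similar descriptors; each cluster ideally corresponds to one person
labels = DBSCAN(eps=0.5, metric="euclidean").fit_predict(np.array(all_embeddings))
print(labels)
```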
However, what is important for our discussion is that all of these algorithms are trained on some training sample (which is never representative enough, and often contains errors as well) to achieve the best possible result. And that result is a certain metric, for example the ratio of correctly to incorrectly guessed results (in our case, faces).
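In case it helps, that metric is usually just plain accuracy, the share of correct guesses. A trivial, Tonfotos-agnostic sketch:

```python
# Accuracy as the share of correctly guessed results.
def accuracy(predictions, ground_truth):
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)

print(accuracy(["Anna", "Bob", "Anna"], ["Anna", "Bob", "Carol"]))  # 0.666...
```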
So, we come to the main point. These neural networks never reach 100% quality as a result of training. There is, and always will be, some percentage of errors. And if a machine learning engineer claims to have reached 100%, he will most likely be fired for incompetence, because he has just made the common mistake of overfitting. In practice this means the model simply memorized the entire training sample instead of generalizing, and will work poorly in real conditions.
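Here is what overfitting looks like in numbers, again on toy data that has nothing to do with faces: a model with enough capacity to memorize a noisy training sample scores perfectly on it, and noticeably worse on examples it has never seen.

```python
# Toy demonstration of overfitting: an unconstrained decision tree memorizes
# the noisy training sample but generalizes noticeably worse to unseen data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print("training accuracy:   ", tree.score(X_train, y_train))  # essentially 1.0
print("unseen-data accuracy:", tree.score(X_test, y_test))    # noticeably lower
```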
So, conclusion number 1: there will always be examples on which the network makes mistakes, no matter how good it is. Even if it reaches 99.98% quality (which is generally fantastic for face recognition at the moment), that means that out of 100,000 recognized faces, 20 will be wrong.
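Just to spell out the arithmetic behind that number:

```python
# Even at 99.98% accuracy, a fixed share of faces is misrecognized.
faces = 100_000
error_rate = 1 - 0.9998           # 0.02%
print(round(faces * error_rate))  # -> 20 wrong faces
```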
But that’s not all. There are very good neural networks from the best research centres that give really good results, above 99% (again, for simplicity let’s leave aside what exactly these percentages mean). For example, this page collects published results on a standard face verification benchmark, including independent evaluations: https://paperswithcode.com/sota/face-verification-on-labeled-faces-in-the. Even these best networks in the world make errors of several tenths of a percent. And such networks will not even run on a regular home PC, unless it happens to have a couple of the most expensive NVIDIA video cards plugged in for some reason.
This is where the second factor comes in: in real conditions, simplified network architectures are always used, representing a compromise between accuracy, speed and demand for computing resources. That means they produce an even higher percentage of errors, but can still run at reasonable speed on relatively outdated computers.
And here we come to statistics. Tonfotos usually processes very large archives. Mine is about a terabyte: just under a hundred thousand photos and several hundred thousand faces. At that scale there will always be enough mistakes to point at, even for the best networks in the world, let alone ones that can run on an average computer. So if we only focus on errors, we will always find plenty of them. But why not focus on the bright side? @leonardoa mentioned 16,000 successfully recognised faces, and that is for just one person of many, right? I believe the program has already finished with everything it could recognize with high confidence, and now all that is left are the challenging cases, which is why you see more and more of them. If you look at the proportion of successfully processed faces instead, I would say the program has already done a great job!
A funny fact: people usually tend to judge errors like the ones you posted above from a human perspective: “It’s obvious, as a person I would never have made such a mistake, so why does the program?” I do not have research at hand to prove it, but based on my experience with other AI systems throughout my career, I would assume that, percentage-wise, even the simplified networks Tonfotos uses would most likely beat an average human in accuracy in a fair experiment. The experiment just has to be fair: like the program, the person would have to look at thousands of plain photos of complete strangers and try to match them against a library of sample faces. Something tells me a regular person would do that much more slowly and make significantly more mistakes. Yes, perhaps the mistakes would not be as “funny” as the ones you have shown here; a person would make mistakes of a different kind. But they would inevitably happen, and I dare to assume that people would be less efficient at this task than the program, especially if the people in the photos are unfamiliar to them.
Well, this is probably not the kind of answer you expected. However, I hope it helps you understand that the program does make mistakes from time to time, and that is totally OK as long as, in general, it does its job well. It is also totally OK if those mistakes look stupid, because the program is not human and analyses photos differently. There are definitely cases where the program does a much better job than people at face recognition (for example on dark and low-contrast photos).
One more point I would like to repeat: if you have already marked up a lot of faces and the program starts showing you more and more strange suggestions, that probably means there are no good suggestions left to show, and it is trying to scrape a bit more out of what is left. It is probably time to stop there, as the program has already reached its limit.
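I do not know the Tonfotos internals, but conceptually the suggestion process can be pictured with this hypothetical sketch: candidate faces are ranked by similarity to the faces you have already confirmed, so once the high-similarity candidates are used up, only the low-similarity, “strange” ones remain.

```python
# Hypothetical illustration (not actual Tonfotos code): rank unknown faces by
# their best similarity to already-confirmed faces. The tail of this ranking
# is exactly the "strange suggestions" you see once the good matches are done.
import numpy as np

def rank_suggestions(confirmed_embeddings, unknown_embeddings):
    confirmed = np.array(confirmed_embeddings)          # (n_confirmed, dim)
    ranked = []
    for idx, emb in enumerate(unknown_embeddings):
        # cosine similarity to the closest confirmed face
        sims = confirmed @ emb / (np.linalg.norm(confirmed, axis=1) * np.linalg.norm(emb))
        ranked.append((sims.max(), idx))
    return sorted(ranked, reverse=True)                 # best candidates first
```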