You Know Right Away That This Is Porn. Will The Computer Understand? - Alternative View

Tumblr announced early last month that it would ban porn. When the new content policy went into effect about two weeks later - on December 17 - it became apparent that there would be problems. The artificial intelligence system deployed to purge pornography from the site mistakenly flagged innocent posts across its 455.4 million blogs and 168.2 billion posts: vases, witches, fish, and all that jazz.

Pornography for artificial intelligence

While it's unclear whether Tumblr used an off-the-shelf automatic filter or built its own - the company didn't respond to inquiries on the topic - it's clear that the social network is stuck between its own policies and its technology. The site's inconsistent stance on, say, "female-presenting nipples" versus artistic nudity has led to contextual decisions that demonstrate that even Tumblr doesn't know what to ban on its platform. How can any company determine what it considers obscene?

First, blocking risqué content is difficult because it's difficult to define what it is in the first place. The legal definition of obscenity is a bear trap more than a hundred years old: the United States first passed laws regulating obscenity back in 1896. In 1964, in Jacobellis v. Ohio - a case over whether Ohio could ban the screening of a Louis Malle film - the Supreme Court produced what is probably the most famous description of hardcore pornography to date. "I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description, and perhaps I could never succeed in intelligibly doing so," wrote Justice Potter Stewart. "But I know it when I see it, and the motion picture involved in this case is not that."

Machine learning algorithms have the same problem. It's exactly the problem Brian DeLorge, CEO of Picnix, a company that sells specialized artificial intelligence technology, is trying to solve. One of its products, Iris, is a client-side application for detecting pornography, meant to "help people," as DeLorge says, "who don't want porn in their lives." He notes that the particular problem with porn is that it can be anything - a bunch of different things - and that images which aren't pornographic can share elements with it. A beach-party photo might get blocked not because it shows more skin than an office photo, but because it's borderline. "That's why it's very difficult to train an image-recognition algorithm to do everything at once," says DeLorge. "When the definition becomes difficult for humans, the computer has difficulty too." If people can't agree on what is and isn't porn, can a computer ever hope to know the difference?

In order to teach an AI to detect porn, the first thing you need to do is feed it porn. Lots of pornography. Where do you get it? "Well, the first thing people do is download a bunch of videos from Pornhub or XVideos," says Dan Shapiro, co-founder of Lemay.ai, a startup that builds AI filters for its clients. "That's one of those legal gray areas - if you train on other people's content, does the result belong to you?"

After programmers download tons of porn, they cut the non-pornographic footage out of the videos to make sure the model doesn't end up blocking pizza delivery guys. Platforms pay people, mostly outside the US, to tag this content; the work is low-paid and tedious, like filling in captchas. They just sit and annotate: this is porn, this isn't. Some filtering is still required, but the porn that comes out of this process arrives already labeled - and training works better on large labeled datasets than on a handful of photographs.
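
The preparation step described above - sample frames out of the videos, then hand them to labelers to sort - is straightforward to picture in code. Here is a minimal sketch using OpenCV; the sampling rate, paths, and folder convention are illustrative assumptions, not anything the companies in this article have published.

```python
# Minimal sketch of the frame-sampling step, assuming OpenCV is installed.
# The five-second interval and output layout are illustrative choices.
import os

import cv2


def extract_frames(video_path, out_dir, every_n_seconds=5):
    """Save one frame every few seconds so labelers see distinct shots."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing
    step = max(1, int(fps * every_n_seconds))
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Labelers would then sort the saved frames into "porn" / "not_porn"
# folders - the tagging work the article describes.
```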

"Often you don't have to filter just the porn itself, but the material that accompanies it," says Shapiro. "Like fake profiles with a girl's photo and phone number." He's referring to sex workers looking for clients, but it could be anything that isn't entirely legal. "It's not porn, but it's the kind of thing you don't want on your platform, right?" A good automated moderator learns from millions - if not tens of millions - of examples, which can save tons of man-hours.

"You can compare it to the difference between a child and an adult," says Matt Zeiler, CEO and founder of Clarifai, a computer vision startup that does this kind of image filtering for corporate clients. "I can tell you for sure - we had a baby a couple of months ago. They don't know anything about the world; everything is new to them." You have to show a child (or an algorithm) an enormous number of things before it understands anything. "Millions and millions of examples. But as adults - when we have built up so much context about the world and understand how it works - we can learn something new from just a couple of examples." (Yes, teaching an AI to filter adult content really is like showing a child a lot of porn.) Companies like Clarifai are growing rapidly today. They have a good base model of the world; they can tell dogs from cats, the clothed from the naked. Zeiler's company uses its models to train new algorithms for its clients - since the original model has already processed huge amounts of data, personalized versions only need new, client-specific datasets to work.
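
Zeiler's child-versus-adult analogy maps onto what practitioners call transfer learning: start from a network pretrained on millions of general images, freeze that general knowledge, and fine-tune only a small new layer on the client's data. The following is a hedged sketch with PyTorch and torchvision - the dataset path and the two-class setup are assumptions for illustration, not Clarifai's actual pipeline.

```python
# Sketch of transfer learning for a two-class (safe / explicit) filter.
# "client_data/train" is a hypothetical ImageFolder-style dataset.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

# ImageNet-pretrained backbone: the "context about the world" an adult has.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                    # freeze the general knowledge
model.fc = nn.Linear(model.fc.in_features, 2)      # new head learns the new task

dataset = datasets.ImageFolder("client_data/train", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                      # one epoch is enough for a sketch
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```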

However, it is difficult for the algorithm to get everything right. It does well with content that is obviously pornographic; but a classifier might incorrectly flag an underwear ad as off-limits because the picture shows more skin than, say, an office photo. (Bikinis and underwear, according to Zeiler, are very difficult.) This means the people doing the labeling have to focus on these edge cases, prioritizing the examples the model finds hard to classify.
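
One standard way to prioritize those edge cases is to score unlabeled images with the current model and send the most ambiguous ones - scores nearest the decision boundary - to human labelers first. A minimal sketch, assuming a hypothetical score_image function that returns a probability between 0 and 1:

```python
# score_image is a hypothetical stand-in for whatever classifier is being
# trained; it returns the model's probability that an image is pornographic.
def most_ambiguous(image_paths, score_image, budget=100):
    """Return the images the model is least sure about, for labeling first."""
    scored = [(abs(score_image(path) - 0.5), path) for path in image_paths]
    scored.sort(key=lambda pair: pair[0])  # closest to the 0.5 boundary first
    return [path for _, path in scored[:budget]]
```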

What's the hardest part?

"Anime porn," Zeiler says. "The first version of our nudity detector did not use cartoon pornography for education." Many times the AI got it wrong because it didn't recognize hentai. “After working on this for the client, we injected a lot of his data into the model and significantly improved the accuracy of the cartoon filter while maintaining the accuracy of real photographs,” says Zeiler.

The technology that has been taught to sniff out porn can be turned on other things as well; the techniques behind these systems are remarkably flexible. It's about more than anime tits. Jigsaw, from Alphabet, for example, is widely used as an automatic comment moderator at newspapers. That software works in a similar way to the image classifiers, except that it sorts by toxicity rather than nudity. (Toxicity in text comments is as difficult to pin down as pornography in pictures.) Facebook uses this kind of automatic filtering to detect suicidal messages and terrorism-related content, and it has tried to use the technology to detect fake news on its massive platform.
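
Under the hood, a toxicity filter is the same supervised-classification recipe applied to text features instead of pixels. A toy sketch with scikit-learn - the comments and labels here are invented for illustration, and production systems like Jigsaw's are far larger and more sophisticated:

```python
# Toy text classifier: TF-IDF features plus logistic regression.
# The training comments and labels below are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "great article, thanks",
    "you are an idiot",
    "interesting point, well argued",
    "go away, nobody wants you here",
]
labels = [0, 1, 0, 1]  # 0 = fine, 1 = toxic

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(comments, labels)

# The score plays the same role as a nudity score in the image case.
print(clf.predict_proba(["thanks for sharing"])[0][1])
```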

All of this still depends on human supervision; we are better at handling ambiguity and context. Zeiler says he doesn't think his product has taken anyone's job - it solves the problem of the internet's scale. Humans will still train the AI, sorting and labeling content so that it can learn to tell the difference.

This is the future of moderation: customized, turnkey solutions provided to companies, built by feeding more and more data to more and more advanced classifiers. Just as Stripe and Square offer out-of-the-box payment solutions for businesses that don't want to process payments themselves, startups like Clarifai, Picnix, and Lemay.ai will handle online moderation.

Dan Shapiro of Lemay.ai is hopeful. "As with any technology, it's still in the process of being invented. So I don't think we'll give up just because it sometimes fails." But will AI ever be able to operate autonomously, without human oversight? Unclear. "There's no little man in a snuffbox filtering every shot," he says. "You need to get data from everywhere in order to train an algorithm on it."

Zeiler, on the other hand, believes that one day artificial intelligence will moderate everything on its own. In the end, the amount of human intervention needed will shrink to zero, or nearly so, and human effort will shift to the things AI still cannot do - high-level reasoning, self-awareness, everything humans have.

Recognizing pornography is part of that. For humans, identification is a relatively trivial task, but it is much harder to train an algorithm to recognize the nuances. Determining the threshold at which a filter marks an image as pornographic or non-pornographic is also a difficult and partly mathematical task.
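
That threshold question has a standard mathematical treatment: sweep the cutoff across a labeled validation set and read off the precision/recall trade-off at each point. A small sketch with scikit-learn, using made-up scores:

```python
# Sweep classification thresholds over a toy validation set.
# The labels and scores below are invented for illustration.
from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                  # 1 = pornographic
y_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")

# A platform that would rather over-block picks a low threshold (high
# recall); one that fears false positives - as Tumblr's rollout showed -
# needs higher precision.
```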

Artificial intelligence is an imperfect mirror of how we see the world, just as pornography is a reflection of what happens between people when they are alone. There is some truth in it, but the picture is incomplete.

Ilya Khel