Though a bit odd, it's not unheard of for a guy to pretend to be a girl, or vice versa, on the internet. But researchers have developed a new algorithm that analyzes the content of tweets and predicts a user's gender.
According to Fast Company, researchers at the Mitre Corporation assembled a sample of users whose gender they were certain of, and then ran their algorithm over those users' tweets. The sample was 55% female and 45% male. How well did the algorithm fare?
The Mitre findings become intriguing when the team limits its analysis to tweets alone. By scanning for patterns in all the tweets of a given user, Mitre's program was able to guess the correct gender 75.8% of the time, roughly 20 percentage points better than the baseline (presumably the 55% that always guessing "female" would score on this sample). And even by analyzing just a single tweet from a user, it was right 65.9% of the time, more than 10 percentage points better than the baseline.
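The gap between each accuracy figure and the baseline can be expressed either in percentage points or as a relative improvement, and the two differ quite a bit. The short sketch below works through both. It assumes the baseline is the majority-class rate, that is, the 55% one would score by always guessing "female" on this sample; the article does not spell that out, and the function name `gains` is just an illustrative choice.

```python
# Assumption: the baseline is the majority-class rate (always guess "female"),
# i.e. the 55% female share of the sample reported in the article.
BASELINE = 55.0
ALL_TWEETS = 75.8     # accuracy using all of a user's tweets
SINGLE_TWEET = 65.9   # accuracy using a single tweet


def gains(accuracy, baseline=BASELINE):
    """Return (percentage-point gain, relative gain in %) over the baseline."""
    points = accuracy - baseline
    relative = points / baseline * 100
    return round(points, 1), round(relative, 1)


for name, acc in [("all tweets", ALL_TWEETS), ("single tweet", SINGLE_TWEET)]:
    points, relative = gains(acc)
    print(f"{name}: +{points} points, +{relative}% relative to baseline")
```

Under that assumption, the gains are about 20.8 and 10.9 percentage points, which correspond to relative improvements of about 37.8% and 19.8%.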
So what exactly does the algorithm do to attain such accuracy? The researchers drew on sociolinguistics, which studies differences in speech between groups of people, and specifically on how those groups combine certain words and expressions.
Mitre found that given certain characters or combinations of characters, the computer could wisely bet on the gender of the tweeter. The mere fact of a tweet containing an exclamation mark or a smiley face meant that odds were a woman was tweeting, for instance.
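To make the idea of betting on characters and character combinations concrete, here is a minimal sketch of a classifier that counts character n-grams per gender and picks the more likely label. It is not Mitre's actual method, which the article does not describe in detail: the naive Bayes scoring, the add-one smoothing, the uniform prior, the function names and the four toy "tweets" are all assumptions made up for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n_max=2):
    """Yield every character n-gram of length 1..n_max in the text."""
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def train(labeled_tweets):
    """Count character n-grams per label from (text, label) pairs."""
    counts = {}
    for text, label in labeled_tweets:
        counts.setdefault(label, Counter()).update(char_ngrams(text))
    return counts


def predict(counts, text):
    """Return the label with the highest smoothed log-likelihood.

    Assumes a uniform prior over labels and add-one smoothing.
    """
    vocab = set().union(*counts.values())
    best_label, best_score = None, -math.inf
    for label, grams in counts.items():
        total = sum(grams.values()) + len(vocab) + 1
        score = sum(math.log((grams[g] + 1) / total) for g in char_ngrams(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this sketch; not from the Mitre study.
TOY_TWEETS = [
    ("so excited!!! :)", "F"),
    ("love this :) !", "F"),
    ("game tonight", "M"),
    ("new build is out", "M"),
]

model = train(TOY_TWEETS)
print(predict(model, "yay :)!"))
print(predict(model, "build tonight"))
```

With data this small the output only reflects the toy examples, but it shows the mechanism: characters such as "!" and ":)" that appear mostly under one label pull the prediction toward that label.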
What's more, when the algorithm also looked at the name, user name and bio of a Twitter user, prediction accuracy rose to 92.8%. That said, the researchers admitted that the data was skewed, since it considered only social media culture when analyzing speech patterns. [Fast Company via Atlantic Wire]