The New York Attorney General’s office co-authored new research on algorithms meant to examine millions of Instagram posts.
New York state’s top cops want to use machine-learning algorithms to detect drug dealers on social networks like Instagram, where drug dealing “has become a severe problem in recent years,” according to researchers from the University of Rochester and the New York Attorney General’s office.
The use of social media to sell drugs began years ago and continues to this day. Newer networks like Tinder have become especially popular with drug dealers because they connect sellers and customers who are physically close to each other. All of these networks rely on manual user reports to remove illegal content, a battle they have largely been losing.
The New York Attorney General’s office co-authored new research on algorithms meant to examine millions of Instagram posts, spotlight drug dealers, and only then pass the suspects on to human officers for further investigation.
The process begins with a dictionary of drug-dealing terms provided by experts. That could include hashtags like #Weed4Sale, which has been widely used to sell marijuana on Instagram in recent years.
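A minimal sketch of this first filtering step, assuming a plain keyword match against lowercased captions. Only the two terms the article names are real examples; the helper names `terms_in` and `candidate_posts` are hypothetical, and the researchers' actual matching rules aren't described in the article.

```python
import re

# Hypothetical expert dictionary. "weed4sale" and "xans" come from the article;
# a real dictionary would hold many more expert-supplied terms.
DRUG_TERMS = {"weed4sale", "xans"}


def terms_in(caption: str, terms: set[str] = DRUG_TERMS) -> set[str]:
    """Return the dictionary terms found in a caption.

    Splitting on word characters drops the '#', so "#Weed4Sale" and
    "weed4sale" match the same dictionary entry.
    """
    words = set(re.findall(r"\w+", caption.lower()))
    return words & terms


def candidate_posts(captions: list[str]) -> list[str]:
    """Keep only captions that mention at least one dictionary term."""
    return [c for c in captions if terms_in(c)]


print(candidate_posts(["Fresh #Weed4Sale today", "Made pasta", "xans in stock"]))
```

A keyword pass like this is cheap but noisy, which is why the later classifier stages exist.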
Next, image and text “classifiers” are trained to filter potential drug-related posts found through hashtags or other keywords. Classifiers are tools that sort data into classes (drug dealer or not drug dealer) based on photos, behavior, text, and other information. In one example from the research, a search for “xans,” slang for Xanax pills sold on Instagram, draws on another term in the expert dictionary at the foundation of this project.
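To make the idea of a classifier concrete, here is a toy text classifier using multinomial naive Bayes. This is a swapped-in textbook technique, not the researchers' model (the article doesn't detail their architecture), and the training captions and labels below are invented for illustration.

```python
import math
from collections import Counter


def train_nb(docs: list[str], labels: list[str]):
    """Count words per class for a multinomial naive Bayes model."""
    word_counts: dict[str, Counter] = {c: Counter() for c in set(labels)}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = set().union(*word_counts.values())  # every word seen in any class
    return Counter(labels), word_counts, vocab


def predict_nb(model, doc: str) -> str:
    """Pick the class with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for c, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_counts[c] / total)  # class prior
        for w in doc.lower().split():
            score += math.log((counts[w] + 1) / denom)  # smoothed word likelihood
        scores[c] = score
    return max(scores, key=scores.get)


# Invented toy training data, for illustration only.
model = train_nb(
    ["xans for sale dm me", "selling xans cheap",
     "anxiety meds made me sleepy", "talking about drug policy"],
    ["dealer", "dealer", "not_dealer", "not_dealer"],
)
print(predict_nb(model, "dm me for xans"))  # prints "dealer"
```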
Looking at just the 30 most recent posts with the word “xans” in them, the classifier identified two drug dealer accounts.
A classifier has to handle a wide variety of drug-related posts as well as the considerable noise of social media. It also draws on search engines like Google Image Search to learn from more examples than manual labels alone provide. (More detailed information on the classifiers’ innards can be found in the full research paper shown below.)
The decisions of the image and text classifiers are then combined. Next, the analysis turns to timeline data, where behavior patterns are scrutinized. If you post about drugs near midnight, for instance, or if most of your posts are drug-related, the research suggests you are far more likely to be dealing drugs than simply discussing them.
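Here's a rough sketch of both steps, assuming each classifier outputs a probability that a post is drug-related and the two are merged by a simple weighted average (often called late fusion). The equal weights, the 10 p.m. to 2 a.m. "near midnight" window, and all names are hypothetical choices, not details from the paper.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    timestamp: datetime
    drug_related: bool  # assumed output of the combined image/text classifiers


def combine(p_text: float, p_image: float, w_text: float = 0.5) -> float:
    """Late fusion: weighted average of the two classifiers' probabilities."""
    return w_text * p_text + (1 - w_text) * p_image


def timeline_features(posts: list[Post]) -> dict[str, float]:
    """Summarize an account's timeline with the two signals the article mentions."""
    drug_posts = [p for p in posts if p.drug_related]
    # Hypothetical "near midnight" window: 22:00 through 01:59.
    late_night = [p for p in drug_posts
                  if p.timestamp.hour >= 22 or p.timestamp.hour < 2]
    return {
        "drug_post_fraction": len(drug_posts) / len(posts) if posts else 0.0,
        "late_night_fraction": len(late_night) / len(drug_posts) if drug_posts else 0.0,
    }
```

Per-post scores answer "is this post about drugs?", while timeline features ask the account-level question of whether drugs dominate the posting pattern.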
The increased use of algorithms to investigate crimes adds yet another wrinkle to issues of free speech online. Will social-media users be more scared to discuss controversial topics like drugs because it could land them on a police watch-list?
There’s precedent for worrying about the potential for stifling speech online. Former NSA contractor and whistleblower Edward Snowden’s revelations of vast government online surveillance had a significant chilling effect on the way people learned about extremism on the Internet.
At a moment when the decades-old war on drugs is facing a political reckoning, as legalization and decriminalization of drugs become increasingly popular in the United States, what will this new form of surveillance mean for discussing and learning about drugs on the Internet?
While these questions have yet to be answered, the move toward artificial intelligence in police investigations may, thanks to greater accuracy, mean that fewer innocent people draw law enforcement attention.
Drug dealer accounts on Instagram are expected to show other key markers, too. According to the research, they tend to have far more followers than accounts they follow. Dealer accounts will also show unspecified “evidence of transactions,” the researchers wrote, which can be found in an account’s self-written bio or in the comment sections of posts and pictures.
Combining all that, the classifier spits out the suspected drug dealer accounts to be manually inspected.
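A sketch of how these account-level markers might feed a review queue, assuming simple threshold rules. The transaction phrases stand in for the paper's unspecified "evidence of transactions," and the thresholds, phrases, and field names are all hypothetical. The output is a list for human inspection, not a verdict.

```python
from dataclasses import dataclass

# Hypothetical phrases standing in for the paper's unspecified
# "evidence of transactions"; the real signals are not described.
TRANSACTION_HINTS = ("dm to order", "venmo", "cash app", "shipping")


@dataclass
class Account:
    handle: str
    followers: int
    following: int
    bio: str
    drug_post_fraction: float  # e.g., from a timeline analysis step


def has_transaction_evidence(text: str) -> bool:
    """True if any hypothetical transaction phrase appears in the text."""
    lowered = text.lower()
    return any(hint in lowered for hint in TRANSACTION_HINTS)


def is_suspect(acct: Account, min_ratio: float = 5.0,
               min_drug_fraction: float = 0.5) -> bool:
    """Flag an account for manual review; thresholds are illustrative only."""
    ratio = acct.followers / max(acct.following, 1)  # avoid dividing by zero
    return (acct.drug_post_fraction >= min_drug_fraction
            and ratio >= min_ratio
            and has_transaction_evidence(acct.bio))


def review_queue(accounts: list[Account]) -> list[str]:
    """Handles of flagged accounts, to be inspected by people."""
    return [a.handle for a in accounts if is_suspect(a)]
```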
The researchers say their algorithm significantly outperformed human experts. With everything added up, the analysis earned an F-score (a measure of accuracy that balances precision and recall) of 0.75 out of 1.0. Human experts scored, at best, 0.51.
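For readers unfamiliar with the metric: "F-score" most often means the F1 score, the harmonic mean of precision (how many flagged accounts really are dealers) and recall (how many dealers get flagged). The article doesn't say which variant the researchers used, so this sketch assumes F1, and the counts in the example are made up, not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 3 dealers correctly flagged, 1 innocent account flagged,
# 1 dealer missed. Precision = recall = 0.75, so F1 = 0.75.
print(f1_score(tp=3, fp=1, fn=1))  # prints 0.75
```

Because F1 penalizes both false accusations and missed dealers, a high score can't be reached by flagging everyone or by flagging almost no one.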
You can read the full research paper below: