
Fighting misinformation with machine learning

 Feb 20, 2019

"This article was written as part of an application to a Master's program at the Technical University of Munich, Germany."

Fighting misinformation is a tedious task and part of the broader problem of the abuse of social media platforms. Though not foolproof, the most common way to cope with misinformation is to fact-check claims. Empirical evidence shows that high-quality information does not have a significant advantage over low-quality information in online social networks [1]. A number of systems, tools, and datasets have been proposed to support research on misinformation. Mitra and Gilbert, for example, proposed CREDBANK, a dataset of tweets with associated credibility annotations [2]. Some systems let users visualize the spread of rumors online; the most notable are TwitterTrails and RumorLens [3,4]. These systems, however, lack monitoring capabilities. The Emergent website detected unverified claims on the Web and tracked whether they were subsequently verified and how widely they were shared. Its approach relied on manual curation, however, and thus did not scale.

Developing effective countermeasures requires a correct understanding of the problem, as well as an assessment of its ever-increasing magnitude. To date, the debate on these issues has been informed by limited evidence. Studies of news consumption on Facebook reveal that users tend to confine their attention to a limited set of pages, by choice or by design [5]. By contrast, the internal investigations conducted by the platforms themselves appear to be based on comprehensive, disaggregated datasets. However, they lack transparency, owing to a twofold risk: jeopardizing user privacy, and disclosing internal information that could be exploited for malicious purposes.
Motivated by these limitations of previous work, it appears pertinent to track both the spread of a claim and the checking of its truthfulness. This poses numerous difficulties, above all scalability. Such tracking reveals, for any given piece of misinformation in a corpus, the full picture of how it spreads and competes with subsequent fact-checking, if any. Shao et al. [6] concluded that the rise of digital misinformation is calling into question the integrity of our information ecosystem. In their analysis, the order in which users are disconnected from the network is determined by ranking them according to different centrality metrics. They also discussed the broader context in detail and concluded, in summary, that:
• Most claims reported in the mainstream media are eventually verified, but many remain unverified, and some even turn out to be false.
• Some instances of misinformation may also see their spread boosted as a result of additional exposure on mainstream news outlets.
• The dynamics of the broader media and information ecosystem therefore need to be studied to fully understand the phenomenon of digital misinformation.
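Shao et al. [6] rank accounts by centrality to decide which to disconnect first. The sketch below illustrates that idea; it is not their code. Several assumptions apply. The graph is an undirected toy network of hypothetical accounts. Degree centrality stands in for the several metrics they compare. The ranking is computed once on the original graph. Network damage is measured as the size of the largest connected component remaining after each removal.

```python
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over the nodes that are still present
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best

def dismantle_by_degree(adj, k):
    """Rank nodes by degree centrality (computed once) and disconnect the top k in order."""
    ranking = sorted(adj, key=lambda n: len(adj[n]), reverse=True)  # stable for ties
    removed = []
    sizes = [largest_component_size(adj, removed)]
    for node in ranking[:k]:
        removed.append(node)
        sizes.append(largest_component_size(adj, removed))
    return removed, sizes

# Hypothetical undirected network of accounts (adjacency sets must be symmetric)
accounts = {
    "a": {"b", "c", "d", "e"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
    "e": {"a", "f"},
    "f": {"e", "g"},
    "g": {"f"},
}

removed, sizes = dismantle_by_degree(accounts, 2)
print(removed, sizes)  # ['a', 'e'] [7, 3, 2]
```

Removing the hub "a" alone shrinks the largest component from 7 accounts to 3. This shows why disconnecting central accounts first can fragment a misinformation network quickly. A real analysis would compare several centrality metrics and possibly recompute the ranking after each removal.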
The challenges of fighting misinformation are evident from the conclusions above. The next question is whether machine learning can help us deal with the enormous amount of fake information, which varies in pattern, type, origin, structure and characteristics. Machine learning is arguably the most direct route to artificially intelligent tools that can cope with this problem.
There are mainly three types of machine-learning-based solutions to this problem; some of them are discussed below.
1. Neural Network Model
There are different ways to train neural networks or other machine learning models on available datasets of real and fake information. Recurrent neural networks (RNNs) are popular in natural language processing, and convolutional neural networks (CNNs) are used for text classification tasks [10]. One approach uses generative adversarial networks for feature-based detection and deconvolutional neural networks for pattern-based detection. In this approach, every part of the information is assigned a rating or a probability, and feature extraction is then performed to evaluate the results [7]. Another example uses a deep recurrent neural network model to perform feature extraction and a Deep Diffusive Network Model to detect fake information [8].
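The following sketch makes the CNN text-classification idea more concrete. It shows only the forward pass of a tiny convolutional text classifier in plain Python. The vocabulary, the embedding size, and the random untrained weights are all hypothetical. Training by backpropagation on a labeled dataset of real and fake items is omitted, so the printed score is arbitrary. The structure is the standard one: word embeddings, a 1-D convolution over windows of consecutive words, ReLU, max-pooling over time, and a logistic output.

```python
import math
import random

random.seed(0)  # fixed seed so the (untrained) weights are reproducible

EMBED_DIM = 4     # size of each hypothetical word vector
WINDOW = 2        # filter width: number of consecutive words a filter covers
NUM_FILTERS = 3   # one pooled feature per filter

def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]

# Hypothetical toy vocabulary with random, untrained embeddings
vocab = ["shocking", "cure", "doctors", "hate", "study", "finds", "vaccine", "safe"]
embeddings = {w: rand_vec(EMBED_DIM) for w in vocab}

# Each filter spans WINDOW words x EMBED_DIM dimensions (flattened), plus a bias
filters = [rand_vec(WINDOW * EMBED_DIM) for _ in range(NUM_FILTERS)]
filter_bias = rand_vec(NUM_FILTERS)
out_weights = rand_vec(NUM_FILTERS)
out_bias = 0.0

def conv_features(tokens):
    """1-D convolution over word windows, ReLU, then max-pooling over time."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    # Concatenate the vectors of each window of WINDOW consecutive words
    windows = [sum(vectors[i:i + WINDOW], []) for i in range(len(vectors) - WINDOW + 1)]
    features = []
    for f, b in zip(filters, filter_bias):
        activations = [max(0.0, sum(w * x for w, x in zip(f, win)) + b) for win in windows]
        features.append(max(activations) if activations else 0.0)
    return features

def predict_fake_probability(text):
    """Logistic output on the pooled features; untrained, so scores are arbitrary."""
    feats = conv_features(text.lower().split())
    z = sum(w * x for w, x in zip(out_weights, feats)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))

print(predict_fake_probability("Shocking cure doctors hate"))
```

Max-pooling is a common design choice here. Each filter reports its strongest match anywhere in the text, so the feature vector has a fixed length regardless of input length.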
2. Supervised machine learning models
Supervised machine learning algorithms such as Support Vector Machines (SVM), k-nearest neighbours (KNN), decision trees and Naive Bayes are used to detect and deal with fake information. They require a labeled dataset to train the model. The basic steps are:
• data preprocessing;
• word-to-vector conversion;
• attribute selection and feature selection;
• training the model with a supervised algorithm;
• finally, testing the model on test data and real-world input [9].
Natural language processing (NLP) and semantic analysis also use these algorithms to detect fake information.
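The steps above can be sketched with one of the listed algorithms, multinomial Naive Bayes, in plain Python. This is a minimal sketch under several assumptions. Preprocessing is just lowercasing and keeping alphabetic tokens. "Word-to-vector conversion" is reduced to bag-of-words counts. Attribute and feature selection are skipped. The six training headlines and their labels are invented for illustration and are not from any real dataset.

```python
import math
import re
from collections import Counter

def preprocess(text):
    """Lowercase and keep alphabetic tokens (a crude stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        tokens = preprocess(text)
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # log P(class) + sum over known words of log P(word | class)
            score = math.log(self.doc_counts[c] / total_docs)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((self.word_counts[c][t] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Hypothetical toy training set
train_texts = [
    "Miracle cure doctors don't want you to know",
    "Shocking secret cure revealed",
    "You won't believe this miracle trick",
    "Health ministry publishes annual vaccination report",
    "Study finds moderate exercise improves sleep",
    "Central bank keeps interest rates unchanged",
]
train_labels = ["fake", "fake", "fake", "real", "real", "real"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Secret miracle cure revealed"))        # fake
print(model.predict("Study finds exercise improves sleep"))  # real
```

A real system would train on thousands of labeled items. It would also evaluate on a held-out test set before applying the model to real-world input.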
There are, then, several effective ways to fight, detect, and reduce misinformation and fake news with machine learning tools and algorithms. However, questions remain about their accuracy and efficiency. With innumerable bytes of data generated every day, building a highly efficient machine learning tool to combat misinformation is a grand challenge.
One such claim was made by a team at the University of Michigan, which developed a tool that reportedly outperforms humans at detecting misinformation [11]. The team fed both fake and real news to the algorithm, which learned to differentiate between the two. After training, the tool was given news from a dataset containing both real and fake news collected directly from the web. It gave wrong results in 24% of cases, compared with humans, who failed to identify the fake news 30% of the time.
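The comparison above is stated as error rates, and accuracy is simply one minus the error rate. Being wrong 24% of the time therefore corresponds to 76% accuracy, and the human figure of 30% corresponds to 70%. Error rate alone does not say which kind of mistake a detector makes, so the sketch below also computes precision and recall for the "fake" class. The predictions and labels used here are hypothetical.

```python
def error_rate(predictions, truths):
    """Fraction of items where the predicted label differs from the true label."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(p != t for p, t in zip(predictions, truths))
    return wrong / len(truths)

def accuracy(predictions, truths):
    return 1.0 - error_rate(predictions, truths)

def precision_recall(predictions, truths, positive="fake"):
    """Precision and recall for one class (here: items labeled as fake)."""
    pairs = list(zip(predictions, truths))
    tp = sum(p == positive and t == positive for p, t in pairs)  # fake caught
    fp = sum(p == positive and t != positive for p, t in pairs)  # real flagged as fake
    fn = sum(p != positive and t == positive for p, t in pairs)  # fake missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical detector output on four items
preds = ["fake", "real", "real", "fake"]
truth = ["fake", "real", "fake", "fake"]
print(error_rate(preds, truth), accuracy(preds, truth))  # 0.25 0.75
print(precision_recall(preds, truth))                    # (1.0, 0.666...)
```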
Another popular misinformation detection tool is deepnews.ai, which focuses on fake news and journalism. It uses two machine learning models. The first collects quantifiable signals from the HTML pages of news articles. The second uses text embeddings to score the text and generate the result [12].
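The source does not say which signals deepnews.ai actually extracts. The sketch below therefore illustrates only the general idea of its first step: turning an HTML page into countable features. It uses Python's standard `html.parser`. The chosen signals are hypothetical examples: link count, image count, visible word count, and number of exclamation marks. Text inside `<script>` and `<style>` is ignored because it is not article content.

```python
from html.parser import HTMLParser

class SignalExtractor(HTMLParser):
    """Collects a few simple, countable (hypothetical) signals from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.images = 0
        self.words = 0
        self.exclamations = 0
        self._skip = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1
        elif tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())
            self.exclamations += data.count("!")

def extract_signals(html):
    parser = SignalExtractor()
    parser.feed(html)
    parser.close()
    return {
        "links": parser.links,
        "images": parser.images,
        "words": parser.words,
        "exclamations": parser.exclamations,
    }

page = """<html><head><title>BREAKING!!!</title><script>var x = 1;</script></head>
<body><h1>You won't believe this!</h1>
<p>Read <a href="/a">more</a> and <a href="/b">share</a>!</p>
<img src="pic.jpg"></body></html>"""

print(extract_signals(page))
# {'links': 2, 'images': 1, 'words': 10, 'exclamations': 5}
```

In a two-model setup such as the one described, a vector like this would be one input. A separate model would then score the text itself from its embeddings.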
Other fake information detection tools also leverage machine learning and artificial intelligence to generate better results. One example is FakerFact, which is trained on millions of articles, journals, news stories and other texts, and evaluates the truthfulness of information on a numerical scale [12].
Research is ongoing to develop better algorithms and solutions. Even so, machine learning clearly faces many challenges, because the structure of misinformation keeps changing.


1. Ratkiewicz J, Conover M, Meiss M, Gonçalves B, Patil S, Flammini A, et al. Truthy: Mapping the Spread of Astroturf in Microblog Streams. In: Proceedings of the 20th International Conference Companion on World Wide Web. WWW'11. New York, NY, USA: ACM; 2011. p. 249–252. Available from: http://doi.acm.org/10.1145/1963192.1963301.

2. Mitra T, Gilbert E. CREDBANK: A Large-Scale Social Media Corpus With Associated Credibility Annotations. In: Proc. International AAAI Conference on Web and Social Media. Palo Alto, CA: AAAI; 2015. p. 258–267. Available from: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10582.

3. Metaxas PT, Finn S, Mustafaraj E. Using TwitterTrails.com to Investigate Rumor Propagation. In: Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing. CSCW'15 Companion. New York, NY, USA: ACM; 2015. p. 69–72. Available from: http://doi.acm.org/10.1145/2685553.2702691.

4. Carton S, Park S, Zeffer N, Adar E, Mei Q, Resnick P. Audience Analysis for Competing Memes in Social Media. In: Proc. International AAAI Conference on Web and Social Media. Palo Alto, CA: AAAI; 2015. p. 41–50. Available from: https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10592.

5. Del Vicario M, Bessi A, Zollo F, Petroni F, Scala A, Caldarelli G, et al. The spreading of misinformation online. Proc National Academy of Sciences. 2016; 113(3):554–559. Available from: https://doi.org/10.1073/pnas.1517441113

6. Shao C, Hui P-M, Wang L, Jiang X, Flammini A, Menczer F, et al. Anatomy of an online misinformation network. PLoS ONE. 2018; 13(4): e0196087. Available from: https://doi.org/10.1371/journal.pone.0196087

7. Yousefi P. Fake News and the Detection Methods from Psychology to Machine Learning. Available from: https://hackernoon.com/fake-news-and-the-detection-methods-from-psychology-to-machine-learning-part-1-facbadac3e85

8. Zhang J, Cui L, Fu Y, Gouza F. Fake News Detection with Deep Diffusive Network Model. Available from: https://arxiv.org/pdf/1805.08751.pdf

9. Elmurngi E, Gherbi A. An empirical study on detecting fake reviews using machine learning techniques. Available from: https://www.researchgate.net/publication/320971538_An_empirical_study_on_detecting_fake_reviews_using_machine_learning_techniques

10. Oshikawa R, Qian J, Wang WY. A Survey on Natural Language Processing for Fake News Detection. Available from: https://arxiv.org/pdf/1811.00770.pdf

11. Fake news detector algorithm works better than a human. Available from: https://news.umich.edu/fake-news-detector-algorithm-works-better-than-a-human/

12. Greenup S. Catalogue of all projects working to solve Misinformation and Disinformation. Available from: https://misinfocon.com/catalogue-of-all-projects-working-to-solve-misinformation-and-disinformation-f85324c6076c