By: J. Daniel Aromí - IIEP/Baires UBA/Conicet

From the role of expectations in economic cycles to computational linguistics and how investors’ moods change with the ups and downs of the financial market, people’s beliefs play a key role in economics. The need to measure and monitor what citizens think.

Economic dynamics are largely explained by subjective states. For this reason, we can improve our understanding of economic events by better measuring these subjective aspects. For example, in studies of economic fluctuations, understanding how expectations of economic growth evolve is a matter of particular interest. Similarly, explanations for fluctuations in financial markets improve in line with our ability to document how those who participate in these markets assess and evaluate them. In many circumstances, the success of economic policies is the result of adequate understanding and management of the perceptions of economic agents.

Given this interest in subjective issues, opinion polls and expectations surveys are a widely used resource: through them, participants report their beliefs regarding economic issues of interest. These resources are the basis for the well-known consumer confidence indicators[1] and the compendia of macroeconomic forecasts compiled by professional analysts.[2] In this way, subjective information allows us to anticipate economic dynamics and to improve our understanding of the mechanisms that explain those dynamics that are of interest to us.

There is no doubt as to the value of this traditional way of measuring subjective states. However, this evidence can be supplemented by other sources of information: computational linguistics may be able to make a particular contribution to the study of subjective states.


The Automatic Processing of Natural Language

Computational linguistics has developed a set of tools that allow information to be extracted automatically from texts expressed in natural language. It has also made progress on the automatic generation of messages that can be interpreted by humans.

This field of knowledge has grown rapidly in recent years thanks to the availability of large quantities of texts in digital format, the growth in computational capacity, and the development of new techniques that allow information to be automatically extracted and generated. The best-known applications of computational linguistics are text classification, translation, and communication with humans in commercial applications.[3] These tools are also used to measure consumer perceptions in the field of marketing or with regard to the public image of political figures. Perhaps surprisingly, its fields of application also include medicine.[4]

In economics, these tools can improve our understanding of subjective aspects by extracting information on people’s level of attention to certain topics or their perceptions of events or economic agents. For example, using these techniques, we can infer people’s levels of interest in fiscal reform processes, exchange rate policy programs, or trade integration initiatives. In addition to scrutinizing attention levels, text processing allows us to extract information on the positive or negative opinions people have expressed regarding economic entities or processes of interest. For example, confidence indicators can be built for a country’s economy or for certain policy initiatives. Input sources for such exercises can include texts from the media, transcripts of speeches or TV programs, messages on social networks, discussions among policy makers, researcher reports, or documents generated by consultants or corporations.

There are many reasons to believe that these tools increase our capacity for measuring the subjective aspects of economics. In the first place, the availability of large quantities of texts allows us to generate subjective measurements over long periods of time. One noteworthy example is Garcia (2013), who uses texts from The New York Times to generate measures of investor sentiment over a period of 100 years (1905–2005). In addition, automatic text processing allows us to infer aspects of subjective states that cannot be captured through surveys. On the one hand, there is a significant body of psychology and neuroscience literature[5] indicating that a substantial share of the mental processes that explain our attitudes and behavior takes place outside of our conscious control. It is thus likely that there is a large set of information on subjective states that does not emerge in responses to questionnaires but that can be inferred by summarizing large quantities of text. On the other hand, in some circumstances, subjective reports may include an error factor associated with strategic motivations.[6]


Methodological Issues

The automatic extraction of information from natural language requires that certain significant methodological challenges be solved. Natural language is hard to interpret because of the very different ways in which we humans express our ideas. Unlike formal languages, natural language has no explicit rules, and its unwritten rules change according to context. The methods used must be adapted to the context in which the content was generated, the resources available, and the aim of the exercise.

In some cases, the techniques used to process text automatically involve simple analyses in which the occurrence of a preestablished word or set of words, such as “recession,” is measured.[7] The way such exercises summarize information is fixed in advance; that is, there is no learning regarding how information is extracted. In contrast, other exercises use frontier artificial intelligence methodologies through which they learn to interpret messages. For example, there are methods that learn to interpret sentences through recursive neural networks.[8] In these cases, the algorithm learns to interpret information in ways that go beyond any a priori knowledge provided at the start of the exercise.
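The simplest case described above, measuring the occurrence of a preestablished word, can be sketched in a few lines of Python. This is only an illustration: the function name `keyword_counts`, the quarterly period labels, and the four-sentence corpus are assumptions made up for the example, not data or code from any of the studies cited.

```python
import re
from collections import Counter

def keyword_counts(articles, keywords):
    """Count, per period, how many articles mention any of the keywords.

    `articles` is an iterable of (period, text) pairs. The period labels
    and the keyword list are illustrative assumptions.
    """
    keywords = {k.lower() for k in keywords}
    counts = Counter()
    for period, text in articles:
        # Lowercase the text and split it into simple word tokens.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if tokens & keywords:
            counts[period] += 1
    return dict(counts)

# Hypothetical toy corpus, invented purely for illustration.
articles = [
    ("2008Q3", "Economists warn that a recession is now likely."),
    ("2008Q3", "Retail sales were flat over the summer."),
    ("2008Q4", "The recession deepens as credit dries up."),
    ("2008Q4", "Fears of a long recession weigh on markets."),
]

index = keyword_counts(articles, ["recession"])
print(index)
```

Note that nothing is learned here: the rule that maps text to a number is set by the researcher before any text is read.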

As noted above, in studies related to economics, the best-known exercises have to do with identifying the topic of a text and inferring whether it contains positive or negative assessments of certain economies, policies, or agents. I will now describe some of the techniques that are most often used to carry out such tasks.

One way of identifying themes in a set of texts is to generate subsets of texts that are grouped by similarity. This usually entails statistical techniques that group texts according to the frequency with which different words appear. The most common example of this type of technique is latent Dirichlet allocation.[9] When thematic categories have been preestablished, a set of preclassified texts tends to be used first so that an algorithm can learn, by induction, to classify new texts. These learning techniques include the naive Bayes classifier, which learns to classify by aggregating information on the frequency of individual words. An alternative tool known as a support vector machine involves representing each document as a vector and identifying hyperplanes that separate texts into different categories.[10] In some cases, the analysis extends not only to information on words but also to n-grams, that is, sequences of n words.
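To make the supervised case more concrete, the following is a minimal sketch of a multinomial naive Bayes classifier written with the Python standard library only. The topic labels, the training sentences, and the function names are hypothetical, and add-one smoothing is one common choice among several; a real exercise would rely on a far larger set of preclassified texts.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train_naive_bayes(labeled_texts):
    """Collect the word frequencies needed by the classifier."""
    doc_counts = Counter()               # number of training texts per label
    word_counts = defaultdict(Counter)   # word frequencies per label
    vocabulary = set()
    for label, text in labeled_texts:
        tokens = tokenize(text)
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return doc_counts, word_counts, vocabulary

def classify(model, text):
    """Return the label with the highest posterior log-probability."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical preclassified texts, invented for illustration.
training = [
    ("monetary", "The central bank raised the interest rate to fight inflation"),
    ("monetary", "Inflation expectations guide interest rate decisions"),
    ("trade", "The tariff agreement boosts exports and imports"),
    ("trade", "New trade deal lowers tariff barriers for exports"),
]
model = train_naive_bayes(training)
print(classify(model, "The bank discussed inflation and the interest rate"))
print(classify(model, "Exports rose after the tariff cut"))
```

The classifier treats each word as independent evidence for a category, which is exactly the "aggregation of information on the frequency of individual words" mentioned above. Replacing the single-word tokens with n-grams would only require changing `tokenize`.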

With regard to inferring assessments or evaluations from texts, one simple method that has been used successfully consists of counting the occurrences of words with positive or negative content. The positive or negative words may come from lexicons that have already been developed by other researchers[11] or may involve new lists generated by the researcher.[12] As was the case with thematic classification, supervised learning methods like the naive Bayes classifier or support vector machines are also used for this task.[13] In some cases, the techniques used process texts globally, trying to infer the meaning of phrases and sentences. For example, this is the case of the study by Socher et al. (2013), which was mentioned above.
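A lexicon-based measure of this kind might look like the sketch below. The two word lists are tiny placeholders chosen for the example and are not taken from the General Inquirer or from Loughran and McDonald (2011); the function name `tone` and the sample sentence are likewise assumptions.

```python
import re

# Tiny illustrative word lists; real studies use much larger lexicons.
POSITIVE = {"growth", "recovery", "strong", "improve", "confidence"}
NEGATIVE = {"crisis", "default", "weak", "decline", "recession"}

def tone(text):
    """Return (positive share, negative share, net tone) for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, 0.0, 0.0
    pos = sum(token in POSITIVE for token in tokens) / len(tokens)
    neg = sum(token in NEGATIVE for token in tokens) / len(tokens)
    return pos, neg, pos - neg

# Hypothetical sentence, invented for illustration.
text = "Weak exports and fears of default deepen the crisis despite strong tourism"
pos, neg, net = tone(text)
print(f"positive={pos:.3f} negative={neg:.3f} net={net:.3f}")
```

Dividing by the total number of words keeps the measure comparable across texts of different lengths, which matters when sources vary from short messages to long reports.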


Applications in Economics

In economics, various studies have made use of automatic text processing and yielded positive results. The most noteworthy of these is the widely recognized study by Gentzkow and Shapiro (2010), which examines biases in print media by focusing on the similarities between the speeches of members of Congress and texts from newspapers. Baker et al. (2015) measure the occurrence of references to “economic policy uncertainty” in the press and find that this measure allows variations in the economic growth rate and unemployment levels to be anticipated. Hansen et al. (2014) use computational linguistics techniques to analyze deliberations over monetary policy. Tetlock (2007) and Garcia (2013), mentioned above, find that measures of optimism in The Wall Street Journal and The New York Times allow changes in the expected return on the stock market over the following days to be anticipated.

A central issue in the study of economics has to do with the explanation for aggregate fluctuations in activity levels. In this regard, a key determinant that is nonetheless difficult to measure is the level of confidence expressed by economic agents. Levels of optimism regarding a country’s economic performance can be approximated using these tools. One simple approach that has brought about interesting results is computing the share of negative words in texts related to the country in question. Figure 1 presents this type of index for Greece between 1984 and 2013. Articles published in The Wall Street Journal and The Economist were used to create the index. It can be observed that in 2006, around 5% of words had negative content. This value is the lowest in the series, that is, it represents the highest level of confidence. Three years later there was a sharp increase in negative words that coincided with Greece’s economic crisis. By 2010, the frequency of negative words had increased by approximately 80%.


Figure 1. Frequency of Negative Words in Texts Related to Greece

Source: Compiled by the author.
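An index like the one in Figure 1 can be approximated by pooling all texts from a given year and computing the share of negative words. The sketch below assumes a hypothetical word list and a made-up four-sentence corpus, so its numbers bear no relation to the actual Greek series; it only shows how the yearly shares and their relative change would be computed.

```python
import re
from collections import defaultdict

# Illustrative placeholder list, not a published lexicon.
NEGATIVE = {"crisis", "default", "bailout", "austerity", "recession"}

def negative_share_by_year(articles, negative_words):
    """Share of negative words among all words, pooled by year."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for year, text in articles:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        negatives[year] += sum(token in negative_words for token in tokens)
    return {
        year: negatives[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }

def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old

# Hypothetical corpus, invented for illustration only.
articles = [
    (2006, "Tourism and shipping keep growing"),
    (2006, "Fears of a recession fade"),
    (2010, "The crisis deepens as talks stall"),
    (2010, "Parliament debates austerity plan"),
]

shares = negative_share_by_year(articles, NEGATIVE)
change = percent_change(shares[2006], shares[2010])
print(shares, change)
```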


Beyond the contemporaneous correlation with economic performance, it is interesting to note that these measurements allow us to improve our understanding of the mechanisms that determine the paths of different economies. According to recent studies, these indices anticipate errors in predictions of economic growth and return differentials on financial assets.[14] These lagged associations indicate that periods of optimism are followed, on average, by negative surprises with regard to economic growth and by poor performance of financial assets. Not only is this type of information of academic value, it can also guide economic policy decisions.
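One simple way to look for a lagged association of this kind is to correlate the index in one period with an outcome in the following period. The sketch below is not the method used in the studies cited; the two series are synthetic numbers invented for the example, and the names `pearson` and `lagged_correlation` are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(index, outcome, lag=1):
    """Correlate index[t] with outcome[t + lag]; `lag` must be at least 1."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return pearson(index[:-lag], outcome[lag:])

# Synthetic series, invented for illustration only.
# negativity[t]: share of negative words in year t.
# surprise[t]: realized growth minus forecast growth in year t.
negativity = [0.05, 0.06, 0.09, 0.08, 0.07]
surprise = [0.0, -1.0, -0.2, 0.8, 0.4]

r = lagged_correlation(negativity, surprise, lag=1)
print(round(r, 3))
```

In this toy series, low negativity (optimism) tends to precede negative surprises, so the lagged correlation between negativity and the subsequent surprise is positive. A serious exercise would add controls and test statistical significance.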

In conclusion, the techniques employed in a given field of knowledge are a function of resource availability and people’s beliefs regarding the rules that govern the system. Given both the resources now available and the importance now placed on subjective factors, it is reasonable to argue that conditions are ripe for the intensive use of computational linguistics techniques in economic studies, where their potential is great.



References

Aromí, D. 2015a. “Conventional Views and Asset Prices: What to Expect after Times of Extreme Opinions?” Mimeo, IIEP/Baires UBA/CONICET.

Aromí, D. 2015b. “Evaluating the Efficiency of Growth Forecasts.” Mimeo, IIEP/Baires UBA/CONICET.

Baker, S.R.; Bloom, N.; and Davis, S.J. 2015. Measuring Economic Policy Uncertainty, No. w21633. National Bureau of Economic Research.

Blei, D.M.; Ng, A.; and Jordan, M.I. 2003. “Latent Dirichlet Allocation.” The Journal of Machine Learning Research, 3: 993–1022.

Chapman, W.W.; Nadkarni, P.M.; Hirschman, L.; D’Avolio, L.W.; Savova, G.K.; and Uzuner, O. 2011. “Overcoming Barriers to NLP for Clinical Text: the Role of Shared Tasks and the Need for Additional Creative Solutions.” Journal of the American Medical Informatics Association, 18 (5): 540–543.

Damasio, A.R. 2006. Descartes’ Error. New York: Random House.

Garcia, D. 2013. “Sentiment during Recessions.” The Journal of Finance, 68 (3): 1267–1300.

Gentzkow, M. and Shapiro, J.M. 2010. “What Drives Media Slant? Evidence from US Daily Newspapers.” Econometrica, 78 (1): 35–71.

Hansen, S.; McMahon, M.; and Prat, A. 2014. Transparency and Deliberation within the FOMC: A Computational Linguistics Approach.

Hirschberg, J. and Manning, C.D. 2015. “Advances in Natural Language Processing.” Science, 349 (6245): 261–266.

Kahneman, D. 2011. Thinking, Fast and Slow. Macmillan.

Loughran, T. and McDonald, B. 2011. “When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10‐Ks.” The Journal of Finance, 66 (1): 35–65.

Manning, C.D.; Raghavan, P.; and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge: Cambridge University Press.

Pang, B.; Lee, L.; and Vaithyanathan, S. 2002. “Thumbs Up?: Sentiment Classification Using Machine Learning Techniques.” Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, 10 (July): 79–86. Association for Computational Linguistics.

Socher, R.; Perelygin, A.; Wu, J.Y.; Chuang, J.; Manning, C.D.; Ng, A.Y.; and Potts, C. 2013. “Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank.” Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (October): 1631–1642.

Tetlock, P.C. 2007. “Giving Content to Investor Sentiment: The Role of Media in the Stock Market.” The Journal of Finance, 62 (3): 1139–1168.

Tillmann, P. 2011. “Strategic Forecasting on the FOMC.” European Journal of Political Economy, 27(3): 547–553.



[1]See, for example, the University of Michigan’s traditional surveys (http://www.sca.isr.umich.edu/).

[2]See, for example, the forecasts released by the Federal Reserve Bank of Philadelphia (https://www.philadelphiafed.org/research-and-data/real-time-center/survey-of-professional-forecasters/) and Consensus Economics (http://www.consensuseconomics.com/).

[3]For a recent description of the field of computational linguistics, see Hirschberg and Manning (2015).

[4]See, for example, Chapman et al. (2011).

[5]See, for example, the classic study by Damasio (2006) or the issues discussed in Kahneman (2011).

[6] See, for example, Tillmann (2011).

[7] See http://www.economist.com/blogs/dailychart/2011/09/r-word-index.

[8] See Socher et al. (2013).

[9] Blei et al. (2003).

[10]Manning et al. (2008) provide an accessible explanation of these techniques.

[11]For example, the classic General Inquirer list (http://www.wjh.harvard.edu/~inquirer/homecat.htm) has brought good results.

[12]See, for example, Loughran and McDonald (2011).

[13]A notable example in this sense is Pang et al. (2002), who implement these techniques to infer information from film reviews.

[14]See Aromí (2015a and 2015b).