{"id":2789,"date":"2016-08-21T01:30:18","date_gmt":"2016-08-21T05:30:18","guid":{"rendered":"http:\/\/www.servsig.org\/wordpress\/?p=2789"},"modified":"2025-05-10T02:42:25","modified_gmt":"2025-05-10T06:42:25","slug":"understanding-the-language-behind-big-data","status":"publish","type":"post","link":"https:\/\/www.servsig.org\/wordpress\/2016\/08\/understanding-the-language-behind-big-data\/","title":{"rendered":"Understanding the Language behind Big Data"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-2807\" src=\"http:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/Picture3.png\" alt=\"Picture3\" width=\"350\" height=\"198\" srcset=\"https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/Picture3.png 664w, https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/Picture3-300x169.png 300w\" sizes=\"(max-width: 350px) 100vw, 350px\" \/><\/p>\n<p><em>guest article by\u00a0<span class=\"s1\">Francisco Villarroel Ordenes<\/span><\/em><\/p>\n<p>Unstructured text data from emails, SMS, blogs, online reviews and social media is growing exponentially, offering organizations unprecedented resources to monitor brand communications and customer experience feedback (Forbes 2016). This has resulted in the development of an emerging class of research methods using text mining, the process of structuring large volumes of text data to discover explicit and implicit meanings. Text mining methods are currently applied in a wide range of business contexts, such as automated sentiment detection from social media, speech recognition in call centers and customers\u2019 keyword search patterns (Forrester 2016). Recent reports estimate that the market for text mining (i.e., text analytics) will grow from USD 2.65 billion in 2015 to USD 5.93 billion by 2020 (Markets and Markets 2016). 
Despite the increasing interest and investment in text mining methods, their return on investment is still unclear (Altaplana 2014).<\/p>\n<div id=\"attachment_2790\" style=\"width: 640px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/Forrester-2016.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2790\" class=\"wp-image-2790 size-large\" src=\"http:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/Forrester-2016-e1471890619725-1024x496.jpg\" alt=\"Forrester 2016\" width=\"630\" height=\"305\" \/><\/a><p id=\"caption-attachment-2790\" class=\"wp-caption-text\">Figure 1: Text Analytics Adoption Trends\u00a0(Forrester 2016)<\/p><\/div>\n<p>As unstructured text data continues to grow, it becomes more cumbersome to analyze manually, which increases the utilization of text mining across all business disciplines. In fact, service researchers position text mining as one of the key modeling techniques to make sense of big customer data (Rust and Huang 2014). However, there is little evidence of its use in service studies. With this in mind, I would like to highlight three important factors that could increase the utilization of text mining methods in service and other business disciplines. First, while advances in computer science are providing more sophisticated methods to analyze textual data (e.g., deep learning), there is a need for a more in-depth theoretical discussion concerning the use of language as a research input. Second, as the number of text mining software packages and applications increases, it becomes imperative to know their different strengths and weaknesses. 
Finally, the current gap in business curricula regarding text mining (i.e., analytics) creates a pressing need for the development of ad hoc teaching material.<\/p>\n<p><strong>Linguistic Lenses in Text Mining Research<\/strong><\/p>\n<p>Text mining is at the intersection of linguistic communication theories and data mining techniques (Manning, Raghavan and Sch\u00fctze 2008). In other words, any attempt to automatically text-mine big datasets requires a good understanding of language in a given context (e.g., law or business) and of the data mining techniques needed to extract meaningful and reliable metrics. The utilization of text mining in business research has been driven by the implementation of state-of-the-art techniques, yet little emphasis has been given to the linguistic theories regarding online communications between customers and brands. Closing this gap would help extend text mining methods to a wider range of business phenomena. For example, the development of a project about \u201cIrony in Online Service Interactions\u201d would first require a good understanding of why irony is expressed by customers or employees. Second, it would be important to distinguish between different types of irony and how frequently they are used in a given context. Is there a difference between sarcastic, humorous, and satirical statements? Finally, it would be necessary to identify the language patterns that characterize different types of ironic statements. This can be done by using linguistic lenses such as grammar (i.e., syntax), the meaning of the text (i.e., semantics), the style of words or sentences (i.e., rhetoric), and the context of language (i.e., pragmatics).<\/p>\n<p><strong>Available Software and Applications<\/strong><\/p>\n<p>During my research I have tried a number of software packages and applications that could help service researchers take a first step into text mining. 
Here I recommend some of the software that has been most valuable for my research:<\/p>\n<ul>\n<li>Import.io (<a href=\"https:\/\/www.import.io\/\">https:\/\/www.import.io\/<\/a>): A web-based application for collecting data from web pages. If you are interested in scraping customer reviews or online community interactions, you can set up an automatic crawler to monitor text and other types of data daily or weekly.<\/li>\n<li>LIWC (<a href=\"http:\/\/liwc.wpengine.com\/\">http:\/\/liwc.wpengine.com\/<\/a>): One of the most widely used software packages for text mining in marketing research. It is a psycholinguistics tool (Tausczik and Pennebaker 2010), developed to extract the proportion of different word categories (e.g., cognitive words or first person pronouns) from any type of document. When applied to a single document, the software provides an intensity score for each dictionary category (e.g., cognitive words divided by the total number of words in the document) (see its application in: Ludwig et al. 2014).<\/li>\n<li>SentiStrength (<a href=\"http:\/\/sentistrength.wlv.ac.uk\/\">http:\/\/sentistrength.wlv.ac.uk\/<\/a>): A tool for automated sentiment analysis that is free for academic research. It automatically computes sentiment measures of short texts on a scale ranging from -1 to -5 for negative and 1 to 5 for positive (Thelwall et al. 2010). The tool has a very specific application (only sentiment analysis), but it is very user-friendly and particularly useful for Twitter data (see its application in: Tang, Fang and Wang 2014).<\/li>\n<li>SPSS Modeler: User-friendly software for a variety of text mining tasks. It does not require programming skills and incorporates useful features such as embedded word dictionaries, the option to develop in-house dictionaries easily, and the ability to build regular expression rules to extract word patterns (see its application in: Villarroel Ordenes et al. 
2014).<\/li>\n<li>Knime (<a href=\"https:\/\/www.knime.org\/\">https:\/\/www.knime.org\/<\/a>): It is an open source alternative that requires basic or intermediate programming skills. It offers a number of tools, such as collecting data from Twitter, parsing text into sentences, uploading custom dictionaries, tagging words by part of speech (adjectives, nouns, etc.), text clustering and \u201cR\u201d integration. It also has an active community of users, which can be very helpful for researchers starting with the software.<\/li>\n<\/ul>\n<div id=\"attachment_2791\" style=\"width: 640px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/KnimeExample.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2791\" class=\"wp-image-2791 size-large\" src=\"http:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/KnimeExample-1024x492.jpg\" alt=\"KnimeExample\" width=\"630\" height=\"303\" srcset=\"https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/KnimeExample-1024x492.jpg 1024w, https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/KnimeExample-300x144.jpg 300w, https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/KnimeExample-768x369.jpg 768w, https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/KnimeExample.jpg 1487w\" sizes=\"(max-width: 630px) 100vw, 630px\" \/><\/a><p id=\"caption-attachment-2791\" class=\"wp-caption-text\">Figure 2: Knime workflow for text classification tasks (Knime 2016)<\/p><\/div>\n<p><strong>Teaching Material for Text Mining<\/strong><\/p>\n<p>Business organizations are demanding that students be better prepared in research methods such as sentiment analysis or content analytics (Gartner 2015). 
In fact, as data will come less from surveys and more from real online interactions, training students in the processes of gathering, analyzing and validating unstructured data has become more important (Edgington 2011). Closing this gap will demand increasing collaboration among researchers in the development of teaching material on the use of text mining. In this regard, the interdisciplinary orientation of service researchers offers a good opportunity for such collaboration.<\/p>\n<p>In the age of big data, I believe that the use of automated text mining will continue to gain relevance for business and our discipline. To further expand the use of text mining in service research and teaching, I look forward to more opportunities to discuss, share and develop this emerging field.<\/p>\n<p><em><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2808 size-thumbnail\" src=\"http:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/F_Villarroel_Ordenes-150x150.png\" alt=\"F_Villarroel_Ordenes\" width=\"150\" height=\"150\" srcset=\"https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/F_Villarroel_Ordenes-150x150.png 150w, https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/F_Villarroel_Ordenes-144x144.png 144w, https:\/\/www.servsig.org\/wordpress\/wp-content\/uploads\/2016\/06\/F_Villarroel_Ordenes.png 180w\" sizes=\"(max-width: 150px) 100vw, 150px\" \/>Francisco Villarroel\u00a0Ordenes<br \/>\n<\/em><em>Assistant Professor of Marketing<br \/>\n<\/em><em>Isenberg School of Management<\/em><br \/>\n<em>University of Massachusetts Amherst<\/em><\/p>\n<p><a href=\"mailto:fvillarroelo@isenberg.umass.edu\">fvillarroelo@isenberg.umass.edu<\/a><\/p>\n<p>&nbsp;<\/p>\n<p><strong>REFERENCES<\/strong><\/p>\n<p>Altaplana, 2014. 
(Accessed May 17, 2016), [available at <a href=\"http:\/\/www.digitalreasoning.com\/resources\/Text-Analytics-2014-Digital-Reasoning.pdf\">http:\/\/www.digitalreasoning.com\/resources\/Text-Analytics-2014-Digital-Reasoning.pdf<\/a>].<\/p>\n<p>Edgington, Theresa M. (2011), \u201cIntroducing text analytics as a graduate business school course,\u201d Journal of Information Technology Education, 10, 207-234.<\/p>\n<p>Forbes, 2016. (Accessed May 12, 2016), [available at <a href=\"http:\/\/www.forbes.com\/sites\/opentext\/2016\/05\/05\/meet-the-algorithm-that-knows-how-you-feel\/#2e93b9fc2037\">http:\/\/www.forbes.com\/sites\/opentext\/2016\/05\/05\/meet-the-algorithm-that-knows-how-you-feel\/#2e93b9fc2037<\/a>].<\/p>\n<p>Gartner, 2015. (Accessed May 12, 2016), [available at <a href=\"https:\/\/www.gartner.com\/doc\/3106118\/hype-cycle-business-intelligence-analytics\">https:\/\/www.gartner.com\/doc\/3106118\/hype-cycle-business-intelligence-analytics<\/a>].<\/p>\n<p>Ludwig, Stephan, Ko De Ruyter, Dominik Mahr, Martin Wetzels, Elisabeth Br\u00fcggen, and Tom De Ruyck (2014), \u201cTake Their Word for It: The Symbolic Role of Linguistic Style Matches in User Communities,\u201d MIS Quarterly, 38(4), 1201-1217.<\/p>\n<p>Manning, Christopher, Prabhakar Raghavan and Hinrich Sch\u00fctze (2008), Introduction to Information Retrieval, Cambridge University Press, Cambridge.<\/p>\n<p>Markets and Markets, 2016. (Accessed May 12, 2016), [available at <a href=\"http:\/\/www.marketsandmarkets.com\/PressReleases\/text-analytics.asp\">http:\/\/www.marketsandmarkets.com\/PressReleases\/text-analytics.asp<\/a>].<\/p>\n<p>Rust, Roland T. and Ming-Hui Huang (2014), \u201cThe service revolution and the transformation of marketing science,\u201d Marketing Science, 33(2), 206-221.<\/p>\n<p>Tang, Tanya, Eric Fang, and Feng Wang (2014), \u201cIs neutral really neutral? 
The effects of neutral user-generated content on product sales,\u201d Journal of Marketing, 78(4), 41-58.<\/p>\n<p>Tausczik, Yla R. and James W. Pennebaker (2010), \u201cThe Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods,\u201d Journal of Language and Social Psychology, 29 (1), 24-54.<\/p>\n<p>Thelwall, Mike, Kevan Buckley, Georgios Paltoglou, Di Cai, and Arvid Kappas (2010), \u201cSentiment strength detection in short informal text,\u201d Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.<\/p>\n<p>Villarroel Ordenes, Francisco, Babis Theodoulidis, Jamie Burton, Thorsten Gruber and Mohamed Zaki (2014), \u201cAnalyzing Customer Experience Feedback Using Text Mining: A Linguistics-Based Approach,\u201d Journal of Service Research, 17(3), 278-295.<\/p>\n<p class=\"p1\">\n","protected":false},"excerpt":{"rendered":"<p>guest article by\u00a0Francisco Villarroel Ordenes Unstructured text data from emails, SMS, blogs, online reviews and social media is exponentially growing offering organizations unprecedented resources to monitor brand communications and customer experience feedback (Forbes 2016). 
This has resulted in the development of an emerging class of research methods using text mining, the process of structuring large [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2861,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10,8],"tags":[],"_links":{"self":[{"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/posts\/2789"}],"collection":[{"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/comments?post=2789"}],"version-history":[{"count":11,"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/posts\/2789\/revisions"}],"predecessor-version":[{"id":3277,"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/posts\/2789\/revisions\/3277"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/media\/2861"}],"wp:attachment":[{"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/media?parent=2789"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/categories?post=2789"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.servsig.org\/wordpress\/wp-json\/wp\/v2\/tags?post=2789"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}