I recently started exploring RapidMiner for sentiment analysis and text classification of social media data, so I am going to post some tutorials on RapidMiner based on what I have learned about the tool so far. In this post, I cover a very basic task: how to read data, write data and transform cases in RapidMiner. RapidMiner is a free tool and can be downloaded from www.rapid-i.com. Make sure you have the Text Analytics plugin of RapidMiner installed. Below is the model I have built in RapidMiner to read and write text. It includes 5 operators: Read …
… with the Read Excel operator. The “Read Excel” operator loads data from MS Excel spreadsheets and can read files from Excel 95, 97, 2000, XP, 2003 and 2007.
Select the Excel file on your system that you want to …
The Excel file I loaded with the Read Excel operator is:
Connect it with the “Nominal to Text” operator. This operator replaces all …
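Outside RapidMiner, the same read → transform cases → write flow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reproduction of the RapidMiner model: the spreadsheet is assumed to have been exported to CSV (Python's standard library cannot read .xls files), the column names `name` and `text` are hypothetical, and `transform_cases` and `read_transform_write` are illustrative helpers, not RapidMiner functions. The Nominal to Text step has no counterpart here because CSV values are already read as plain strings.

```python
import csv
import io


def transform_cases(rows, column, mode="lower"):
    """Return copies of the rows with one text column converted to lower or upper case."""
    if mode not in ("lower", "upper"):
        raise ValueError("mode must be 'lower' or 'upper'")
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        value = new_row[column]
        new_row[column] = value.lower() if mode == "lower" else value.upper()
        out.append(new_row)
    return out


def read_transform_write(src, dst, column, mode="lower"):
    """Read CSV rows from src, transform the case of one column, write CSV to dst."""
    reader = csv.DictReader(src)
    rows = transform_cases(reader, column, mode)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# In-memory stand-ins for the input file and the output file (hypothetical data).
src = io.StringIO("name,text\nAnna,Great Product!\nRaj,NOT happy\n")
dst = io.StringIO()
read_transform_write(src, dst, column="text")
print(dst.getvalue())
```

For real files, pass objects opened with `open(path, newline="", encoding="utf-8")` instead of the `StringIO` stand-ins.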
SPSS Text Analysis for Surveys from IBM is a survey text analytics tool that provides deep-dive analysis of qualitative text (responses to open-ended survey questions). It converts unstructured data into structured data and finds hidden patterns, sentiments and so on. It helps us segment the responses along different dimensions and then correlate them with the sentiments. The tool combines linguistic technologies with manual work, but its simple drag-and-drop features make it easy to use.
Click File -> New Project.
Select the file type. I am selecting the .xls file type.
Drag the primary key to the “Unique ID” box. In my file, names are unique; you could use a ticket ID or survey ID instead. The column containing the text you want to analyze goes in the “Open ended text” box, and all the other variables go in the “reference” field. During analysis, you can correlate the categories you build with the variables in the reference box.
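To make the three roles and the "correlate" step concrete, here is a small Python sketch. It is only an illustration under assumptions, not how SPSS Text Analysis for Surveys works internally: the records, the `region` reference variable and the category assignments are all hypothetical, the categories are assumed to have already been built, and "correlate" is read as a simple cross-tabulation of counts.

```python
from collections import Counter

# Hypothetical survey records: "id" plays the Unique ID role, "text" is the
# open-ended text, and "region" is a reference variable.
records = [
    {"id": "Anna", "text": "Delivery was late", "region": "North"},
    {"id": "Raj", "text": "Great support team", "region": "South"},
    {"id": "Lee", "text": "Late again, poor delivery", "region": "North"},
]

# Hypothetical categories keyed by unique ID, assumed to have been built
# during analysis (in the real tool they come from its text analysis).
categories = {"Anna": "Delivery", "Raj": "Support", "Lee": "Delivery"}


def crosstab(records, categories, reference):
    """Count how often each category co-occurs with each value of a reference variable."""
    counts = Counter()
    for rec in records:
        counts[(categories[rec["id"]], rec[reference])] += 1
    return counts


table = crosstab(records, categories, "region")
for (category, region), n in sorted(table.items()):
    print(f"{category:10} {region:6} {n}")
```

The unique ID is what lets each category be joined back to the right row of reference variables, which is why the tool asks for it first.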
In my previous post, I wrote about how to read and write data in RapidMiner. In this post, I cover how to count word frequencies in text using RapidMiner. The model contains the following operators: Read Excel, Nominal to …
The RapidMiner model is:
Inside the Process Documents operator, add 3 operators as shown:
The Tokenize operator splits the text of a document into a sequence of tokens. The Transform Cases operator converts the words to the desired case (for example, all lowercase). The Filter Stopwords operator removes English stopwords such as and, or, not, is and an from a document.
Output:
If you are looking for the XML of this word frequency model, leave your email ID in the comment box.
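The Tokenize → Transform Cases → Filter Stopwords chain inside Process Documents can be approximated in plain Python to show what each operator contributes to the word counts. This is a sketch under assumptions, not RapidMiner's implementation: tokenizing by splitting on non-letter characters is an assumed rule, `STOPWORDS` is a tiny illustrative list (RapidMiner ships its own, much longer English list), and the example sentences are made up.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); real English lists are much longer.
STOPWORDS = {"a", "an", "and", "is", "not", "or", "the", "to", "of", "it"}


def tokenize(text):
    """Split text into tokens on runs of non-letter characters (an assumed rule)."""
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]


def word_frequencies(documents):
    """Count word occurrences across documents after case folding and stopword removal."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)                              # Tokenize
        tokens = [t.lower() for t in tokens]                # Transform Cases (lower)
        tokens = [t for t in tokens if t not in STOPWORDS]  # Filter Stopwords
        counts.update(tokens)
    return counts


docs = [
    "The phone is great and the battery is great.",
    "Battery life is not great.",
]
for word, n in word_frequencies(docs).most_common():
    print(word, n)
```

Transform Cases runs before Filter Stopwords so that capitalized forms such as "The" match the lowercase stopword list. Because "not" is filtered out, as in the stopword examples above, negations disappear from the counts; for sentiment analysis you may prefer to remove it from the list.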