In my previous post, I wrote about how to read and write data in RapidMiner. In this post, I cover how to count word frequencies in text using RapidMiner. The model contains the following operators:
- Read Excel
- Nominal to Text
- Process Documents
- Tokenize
- Transform Cases
- Filter Stopwords
The RapidMiner model is shown below:
Inside the Process Documents operator, add three operators as shown below:
The Tokenize operator splits the text of a document into a sequence of tokens.
The Transform Cases operator converts every word to a single case (for example, all lower case), so that "Text" and "text" are counted as the same word.
The Filter Stopwords operator removes English stopwords such as "and", "or", "not", "is" and "an" from a document.
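If it helps to see the idea outside RapidMiner, here is a rough Python sketch of the same three steps followed by a count. It is only an illustration, not what RapidMiner runs internally. The function name `word_frequencies`, the tiny `STOPWORDS` set and the choice to split on non-letter characters are all my own assumptions for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
# A real English stopword list is much longer.
STOPWORDS = {"and", "or", "not", "is", "an", "a", "the"}


def word_frequencies(text):
    """Count words in text after tokenizing, lowercasing and stopword removal."""
    # Step 1 (Tokenize): keep runs of letters, so everything else acts as a separator.
    tokens = re.findall(r"[A-Za-z]+", text)
    # Step 2 (Transform Cases): convert every token to lower case.
    tokens = [token.lower() for token in tokens]
    # Step 3 (Filter Stopwords): drop tokens found in the stopword list.
    tokens = [token for token in tokens if token not in STOPWORDS]
    # Count how often each remaining token occurs.
    return Counter(tokens)


freq = word_frequencies("The cat and the dog. The Dog is not a cat, or is it?")
for word, count in freq.most_common():
    print(word, count)
```

Lowercasing before filtering matters here: the stopword list is in lower case, so "The" would slip through if the two steps were swapped.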
If you would like the XML of this word frequency model, leave your email ID in the comment box.