5.2 Turning Tweets into Knowledge: An Introduction to Text Analytics

Quick Question

Given a corpus in R, how many commands do you need to run to clean up its irregularities (converting capital letters to lowercase and removing punctuation)?


How many commands do you need to run to stem the document?


Explanation

In R, you can clean up the irregularities with two lines:

corpus = tm_map(corpus, content_transformer(tolower))

corpus = tm_map(corpus, removePunctuation)
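Here is a minimal, self-contained sketch of the two cleaning steps. It assumes the tm package is installed, and the two example tweets are made up for illustration. It wraps tolower in content_transformer(), which recent versions of tm need so that the result remains a valid corpus rather than a list of plain character vectors.

```r
# Assumes: install.packages("tm")
library(tm)

# Hypothetical tweets standing in for real data
tweets = c("I LOVE my new iPhone!!!", "Apple, please fix the battery...")
corpus = VCorpus(VectorSource(tweets))

# Step 1: convert every document to lowercase
corpus = tm_map(corpus, content_transformer(tolower))

# Step 2: strip punctuation characters
corpus = tm_map(corpus, removePunctuation)

# Pull the cleaned text out of each document for inspection
cleaned = vapply(seq_len(length(corpus)),
                 function(i) content(corpus[[i]]),
                 character(1))
print(cleaned)
```

After these two steps, both tweets should be in lowercase, with the exclamation marks, comma, and trailing dots removed.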

And you can stem the document with one line:

corpus = tm_map(corpus, stemDocument)
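A similar sketch for stemming assumes that both tm and SnowballC are installed. tm's stemDocument() relies on the SnowballC stemmer, which is an optional dependency. The single document, containing several forms of one word, is hypothetical.

```r
# Assumes: install.packages(c("tm", "SnowballC"))
library(tm)
library(SnowballC)  # provides the stemmer that stemDocument() calls

# Hypothetical one-document corpus with four forms of the same word
corpus = VCorpus(VectorSource("argue argued argues arguing"))

# Reduce every word in every document to its stem
corpus = tm_map(corpus, stemDocument)

# Extract the stemmed text and split it back into individual words
stemmed = content(corpus[[1]])
stems = strsplit(stemmed, " ")[[1]]
print(stems)
```

All four word forms should collapse to a single shared stem. This is the point of stemming: later counts treat them as one term.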