A word cloud

For a class on data collection and production, I did a short text analysis of Winston Churchill’s “Their Finest Hour” speech. Here is a word cloud I created from that analysis:

[Image: word cloud]

If you are interested in the code that I used, you can have a look below:

# Data Method: Text mining
# File: textmining1.R
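# Theme: Download text data from the web and create a word cloud

# Install the easypackages package (only if it is missing), then load it
if (!requireNamespace("easypackages", quietly = TRUE)) install.packages("easypackages")
library(easypackages)

# Load multiple packages at once with the easypackages function packages()
packages("XML", "wordcloud", "RColorBrewer", "NLP", "tm", "quanteda", prompt = TRUE)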

# Location of the speech text (URLencode escapes any unsafe characters)
churchLocation <- URLencode("http://www.historyplace.com/speeches/churchill-hour.htm")

# Download and parse the HTML page with htmlTreeParse, then extract the text of
# every bold (<b>) element, which is where this page holds the speech paragraphs
doc.html <- htmlTreeParse(churchLocation, useInternalNodes = TRUE)
church <- unlist(xpathApply(doc.html, "//b", xmlValue))
church <- church[-26]  # Drop the last bold element, which is not part of the speech
head(church, 3)

# Turn the character vector into a VectorSource (one document per paragraph)
words.vec <- VectorSource(church)

# Check the class of words.vec
class(words.vec)
# Create Corpus object for preprocessing
words.corpus <- Corpus(words.vec)

# Turn all words to lower case
words.corpus <- tm_map(words.corpus, content_transformer(tolower))

# Remove punctuation and numbers
words.corpus <- tm_map(words.corpus, removePunctuation)
words.corpus <- tm_map(words.corpus, removeNumbers)

# Remove English stop words, leaving a uniform bag of words

words.corpus <- tm_map(words.corpus, removeWords, stopwords("english"))

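# Create a term-document matrix (rows = terms, columns = paragraphs)

tdm <- TermDocumentMatrix(words.corpus)

# Total each term's frequency across all paragraphs, most frequent first
m <- as.matrix(tdm)
wordCounts <- rowSums(m)
wordCounts <- sort(wordCounts, decreasing = TRUE)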

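# Create the word cloud: only words used at least 3 times, plotted in decreasing
# frequency (random.order = FALSE), with 35% of words rotated 90 degrees

wordcloud(names(wordCounts), wordCounts, min.freq = 3, random.order = FALSE,
          max.words = 500, scale = c(3, 0.5), rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))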

# Source: Winston Churchill's "Their Finest Hour" speech
# http://www.historyplace.com/speeches/churchill-hour.htm
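To see what the tm steps do without downloading the page or installing any packages, here is a minimal base-R sketch of the same cleaning-and-counting pipeline. It is an illustration under stated assumptions, not the script used for the word cloud above: docs holds the speech's closing sentence split in two to stand in for the scraped paragraphs, stop_words is a small hypothetical list standing in for tm's stopwords("english"), and clean_tokens is a hypothetical helper. The min_length = 3 filter mimics TermDocumentMatrix's default of discarding terms shorter than three characters.

```r
# Base-R sketch of the textmining1.R pipeline (no packages, no download).
# Assumptions: `docs` stands in for the scraped paragraphs, and `stop_words`
# is a small hypothetical stand-in for tm's stopwords("english").

docs <- c(
  "Let us therefore brace ourselves to our duties, and so bear ourselves that,",
  "if the British Empire and its Commonwealth last for a thousand years, men will still say, 'This was their finest hour.'"
)

stop_words <- c("let", "us", "to", "our", "and", "so", "that", "if", "the",
                "its", "for", "a", "will", "this", "was", "their")

# Hypothetical helper: clean one document and return its remaining tokens
clean_tokens <- function(text, stop_words, min_length = 3) {
  text <- tolower(text)                    # like content_transformer(tolower)
  text <- gsub("[[:punct:]]+", "", text)   # like removePunctuation
  text <- gsub("[[:digit:]]+", "", text)   # like removeNumbers
  tokens <- strsplit(text, "[[:space:]]+")[[1]]
  tokens <- tokens[tokens != "" & !(tokens %in% stop_words)]  # like removeWords
  # TermDocumentMatrix drops terms shorter than 3 characters by default
  tokens[nchar(tokens) >= min_length]
}

token_list <- lapply(docs, clean_tokens, stop_words = stop_words)
terms <- sort(unique(unlist(token_list)))

# Term-document matrix: one row per term, one column per document
tdm <- vapply(token_list,
              function(tok) as.integer(table(factor(tok, levels = terms))),
              integer(length(terms)))
rownames(tdm) <- terms
colnames(tdm) <- paste0("doc", seq_along(docs))

# Equivalent of rowSums(as.matrix(tdm)) followed by sort(..., decreasing = TRUE)
word_counts <- sort(rowSums(tdm), decreasing = TRUE)
print(head(word_counts, 5))
```

In this tiny sample “ourselves” is the only term that appears twice, so it comes first; with min.freq = 3, as in the wordcloud() call above, none of these words would be drawn.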