Thursday, December 30, 2010

Analyzing text documents for phrases using Excel

This was really a question I found at excelforum.com; can you take a list of phrases in excel and calculate in a simple way how many times each term is used in a word document? The answer is yes!

First, you need to save the doc as a txt-file. Then import this into Excel using the text import wizard - import as delimited file without any delimiters defined (remove the checkmark before "tab"). Import this into column A of Sheet2.

Then, place the list of words in column A in Sheet 1. If the first term to look for is placed in cell A1, you put this in B1 to count the number of times the term is used in the text file:

=COUNTIF(Sheet2!A:A,CONCATENATE("*",A1,"*"))
For fun, let's test how many times a few terms occurs in this political article on cnn.com

Update: 
6stringjazzer at excelforums.com reminded me that one should put each word into its own cell to avoid missing counts when the same term is used several times in the same line.

No comments:

Post a Comment