Accepted Solutions
Hi..
Leftover from my days as a Lit major, I wrote this program to do a frequency count of the words in the first chapter of Melville's "Moby Dick".
Interestingly enough, after you eliminate all the articles and prepositions and pronouns, the most frequently used word in the first chapter of Moby Dick is 'sea' (13 times) followed by 'water' (8 times). The words 'ship', 'soul', 'man' and 'whale' each occur 3 times. Anyway, the relevant part of that program is shown below -- I had to get rid of a stray '?' in the chapter, which is why the compress is in the code. Also, I turned everything to lower case, so 'The' and 'the' would get counted the same when I did a frequency on the WORD variable.
cynthia
** now break apart each line into separate lowercase words;
** but keep the word order (wordord) and the original capitalization (origword);
data cnt_chp1(keep=chapter pgno paracnt linenum wordord origword word);
  set moby_ch1;
  ** walk through the words in RECORD one at a time;
  i = 1;
  origword = scan(record,i);
  ** DO WHILE (not DO UNTIL) so a blank line writes no empty word;
  do while (origword ne ' ');
    ** lowercase and strip the stray '?' so 'The' and 'the' count as one word;
    word = compress(lowcase(origword),'?');
    wordord = i;
    output;
    i + 1;
    origword = scan(record,i);
  end;
run;
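The frequency step itself is not shown above. A minimal sketch of how it might look, assuming the CNT_CHP1 data set from the step above; the stop-word list and the output data set name WORD_FREQ are illustrative assumptions, not part of the original program:

```sas
** sketch: count each lowercase word, most frequent first;
** the stop-word list below is a short made-up sample, not a complete list;
proc freq data=cnt_chp1 order=freq;
  where word not in ('a','an','the','and','of','to','in','i','it');
  tables word / nocum nopercent out=word_freq;
run;
```

ORDER=FREQ sorts the table by descending count, so the most common remaining words appear at the top. A real run would need a much longer exclusion list to drop all the articles, prepositions and pronouns.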
Do you mean that you have a list of words (a, an, the, and) that you're looking for, or do you want to take a text string and find the most common words in it?
This may be a job for Text Miner:
http://support.sas.com/documentation/onlinedoc/txtminer/getstarted31.pdf
but in a Base SAS world, there's always writing out your "words" and then doing PROC FREQ on them.
cynthia
> do you mean that you have a list of words (a, an,
> the, and) that you're looking for -- or you want to
> take a text string and find out the most common
> words in a string???
> >
> but in a Base SAS world, there's always writing out
> your "words" and then doing PROC FREQ on them.
>
> cynthia
I would like to take the text string and find the most common words in the string. Is there a way to do that without using the Text Miner? If I have to, I could estimate the common words and use the countw function. Thanks for your help!
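For a single text string, the Base SAS route (write out the words, then PROC FREQ) might be sketched as follows. Here COUNTW supplies the number of words so the loop knows when to stop; the data set name WORDS, the variable names and the sample sentence are all made up for illustration:

```sas
** sketch: split one text string into words and count them;
** the data set name and the sample sentence are illustrative assumptions;
data words(keep=word);
  length text $200 word $40;
  text = 'The sea, the sea, and the ship on the sea';
  do i = 1 to countw(text);
    ** SCAN and COUNTW share the same default delimiters (blank, comma, etc.);
    word = lowcase(scan(text, i));
    output;
  end;
run;

proc freq data=words order=freq;
  tables word / nocum nopercent;
run;
```

With this sample sentence, 'the' comes out on top (4 times) followed by 'sea' (3 times). No list of expected words is needed up front; the frequency table shows whatever words the string actually contains.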
Linus
Not the first time or the second time. But by the third time I read it, yes, I did skip the whaling chapters.
cynthia
Regards,
Linus