
# Which word appears at most

Hi All,
Here is the original dataset:

data m01;

a="CGA  AMZN.COM   CA";output;

a="CHA  AMZN.COM   HK";output;

run;

The output dataset should look like this, with one word per row:

Word
CGA
AMZN.COM
CA
CHA
AMZN.COM
HK

Any idea?

Here is my attempt:

data m02;

set m01;

an=translate(a,"@"," ");

run;

data m02_1;

set m02;

an=tranwrd(an,"@@","@");

run;

%macro REEE;

%do i=1 %to 10;

data m02_1;

set m02_1;

an=tranwrd(an,"@@","@");

run;

%end;

%mend;

%REEE;

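data m03;

set m02_1;

bn=an;

No=_N_;

/* Peel off one "@"-delimited word per iteration until no delimiter is left. */

do until(j=0);

j=find(bn,"@");

if j>0 then do;

Word=substr(bn,1,j-1); /* text before the first "@" */

bn=substr(bn,j+1); /* keep the remainder for the next iteration */

end;

else Word=bn; /* no "@" left: the remainder is the last word */

output;

end;

run;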

In fact, the original dataset contains thousands of merchant descriptions, and I want to break them down into words.

Accepted Solutions
Solution
‎02-25-2014 01:26 PM

## Re: Which word appears at most

You can do this in a single step using the COUNTW and SCAN functions, specifying a blank as the word separator.

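data want;

length word $ 20; /* words longer than 20 characters are truncated */

set m01;

/* write one output row per blank-delimited word in a */

do i = 1 to countw(a,' ');

word = scan(a,i,' ');

output;

end;

run;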

Regards,
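To get from the one-word-per-row dataset to the word that appears most often, the words still need to be counted. The following is only a minimal sketch, assuming the `want` dataset produced by the step above; `word_counts` is a hypothetical output name chosen for illustration.

```sas
/* Count how often each word occurs in WANT. */
proc freq data=want noprint;
  tables word / out=word_counts (drop=percent);
run;

/* Put the most frequent words first. */
proc sort data=word_counts;
  by descending count;
run;
```

With the two sample rows from the question, AMZN.COM should end up first with a count of 2, and every other word has a count of 1.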

All Replies

## Re: Which word appears at most

"More _infile_ magic" (http://www2.sas.com/proceedings/sugi28/086-28.pdf) shows a way to apply the parsing of the INPUT statement to data that comes from a table or data set variable.

You can then populate a hash table of word counters in a single pass.
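A hash table of word counters could look roughly like the sketch below. It is an illustration under stated assumptions, not the method from the linked paper: it splits words with COUNTW and SCAN rather than the `_infile_` technique, it assumes words fit in 20 characters, and `word_counts` is a hypothetical output dataset name.

```sas
data _null_;
  length word $ 20 count 8;

  /* Create the hash object once, keyed by word and storing its counter. */
  if _n_ = 1 then do;
    declare hash h(ordered:'a');
    h.defineKey('word');
    h.defineData('word', 'count');
    h.defineDone();
    call missing(word, count);
  end;

  set m01 end=last;

  do i = 1 to countw(a, ' ');
    word = scan(a, i, ' ');
    /* find() returns 0 and loads COUNT when the word is already stored. */
    if h.find() ne 0 then count = 0;
    count = count + 1;
    h.replace();   /* add the word, or update its counter */
  end;

  /* After the last input row, write all counters to a dataset. */
  if last then h.output(dataset:'word_counts');
run;
```

This reads `m01` once and keeps one counter per distinct word in memory, so no intermediate one-word-per-row dataset is needed. Sorting `word_counts` by descending `count` afterwards would list the most frequent word first.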

🔒 This topic is solved and locked.