
Which word appears at most

Contributor
Posts: 25

Hi All,
Here is the original dataset:

data m01;
  a="CGA  AMZN.COM   CA"; output;
  a="CHA  AMZN.COM   HK"; output;
run;

The output dataset should look like this:
Word

CGA
AMZN.COM
CA
CHA
AMZN.COM
HK


Any idea?

Here is my attempt:

data m02;
  set m01;
  an=translate(a,"@"," ");
run;

data m02_1;
  set m02;
  an=tranwrd(an,"@@","@");
run;

%macro REEE;
  %do i=1 %to 10;
    data m02_1;
      set m02_1;
      an=tranwrd(an,"@@","@");
    run;
  %end;
%mend;

%REEE;

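data m03;
  set m02_1;
  length Word $ 20;
  bn=an;
  No=_N_;
  do until(j=0);
    j=find(bn,"@");              /* position of the next separator, 0 if none */
    if j>0 then do;
      Word=substrn(bn,1,j-1);    /* text before the separator */
      bn=substrn(bn,j+1);        /* keep the remainder for the next pass */
    end;
    else Word=bn;                /* no separator left: the rest is the last word */
    if Word ne "" then output;   /* skip empty pieces */
  end;
run;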

In fact, the original dataset contains thousands of merchant descriptions, and I want to break each one down into words.
What would you suggest?

Thanks in advance.


Accepted Solutions
Solution
‎02-25-2014 01:26 PM
Regular Contributor
Posts: 180

Re: Which word appears at most

You can do this in a single step using the COUNTW and SCAN functions, specifying a blank as the word separator.

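data want;
  length word $ 20;
  set m01;
  /* COUNTW returns the number of blank-separated words in a */
  do i = 1 to countw(a,' ');
    word = scan(a,i,' ');   /* SCAN extracts the i-th word */
    output;                 /* one output row per word */
  end;
run;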

Regards,
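
The thread title asks which word appears most often. Once the words are split into one row each, as in the step above, one possible follow-up is a frequency count. This is only a sketch, assuming the `want` dataset and its `word` variable from the step above; it is not part of the original reply.

```sas
/* One-way frequency table of the words.                      */
/* ORDER=FREQ lists the values by descending frequency count. */
proc freq data=want order=freq;
  tables word / nocum;   /* NOCUM suppresses the cumulative columns */
run;
```

With the two sample rows from the question, AMZN.COM would be expected at the top with a count of 2, since every other word occurs once.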



All Replies
Valued Guide
Posts: 2,175

Re: Which word appears at most

The paper "more _infile_ magic" at http://www2.sas.com/proceedings/sugi28/086-28.pdf shows a way to apply the parsing of the INPUT statement to data that comes from a table or data set variable.

You can then populate a hash table of word counters in a single pass over the data.
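
As a rough illustration of the hash-table idea, the sketch below counts words in one pass over `m01`. Note the assumptions: it splits each value with COUNTW and SCAN rather than the `_infile_` parsing technique from the paper, and the names `h`, `count`, and `word_counts` are made up for this example.

```sas
data _null_;
  length word $ 20 count 8;
  if _n_ = 1 then do;
    /* Hash table keyed by word, carrying a running count (hypothetical names) */
    declare hash h(ordered: 'a');
    h.defineKey('word');
    h.defineData('word', 'count');
    h.defineDone();
    call missing(word, count);
  end;
  set m01 end=last;
  do i = 1 to countw(a, ' ');
    word = scan(a, i, ' ');
    if h.find() ne 0 then count = 0;   /* word not seen yet: start at zero */
    count = count + 1;
    h.replace();                       /* add or update the entry for this word */
  end;
  /* After the last input row, write the counters out as a dataset */
  if last then h.output(dataset: 'word_counts');
run;
```

Sorting `word_counts` by descending `count` (for example with PROC SORT) would then put the most frequent word first.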

