Hi All,
Here is the original dataset:
data m01;
a="CGA AMZN.COM CA";output;
a="CHA AMZN.COM HK";output;
run;
The output dataset should look like this:
Word
CGA
AMZN.COM
CA
CHA
AMZN.COM
HK
Any ideas?
Here is my attempt:
data m02;
set m01;
an=translate(a,"@"," ");
run;
data m02_1;
set m02;
an=tranwrd(an,"@@","@");
run;
%macro REEE;
%do i=1 %to 10;
data m02_1;
set m02_1;
an=tranwrd(an,"@@","@");
run;
%end;
%mend;
%REEE;
data m03;
set m02_1;
bn=an;
No=_N_;
j=0;
do until(j=0);
j=find(bn,"@");
i=1;
Word=substr(bn,i,j-i);
i+j;
bn=substr(bn,i,j-i);
output;
end;
run;
In fact, the original dataset contains thousands of merchant descriptions, and I want to break each one down into words.
What would you suggest?
Thanks in advance.
You can do this in a single step with the COUNTW and SCAN functions, specifying the blank as the word separator.
data want;
  length word $ 20;
  set m01;
  /* one output row per blank-delimited word in a */
  do i = 1 to countw(a, ' ');
    word = scan(a, i, ' ');
    output;
  end;
run;
Regards,
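A side note on the COUNTW/SCAN step above: by default both functions treat a run of consecutive delimiters as a single delimiter, so the repeated TRANWRD passes in the original attempt should not be needed. Here is a minimal sketch to check that. The dataset names demo and demo_words and the sample string are made up for illustration.

```sas
/* Hypothetical test row with several consecutive blanks between words */
data demo;
  a = "CGA   AMZN.COM  CA";
run;

data demo_words;
  length word $ 20;
  set demo;
  /* Without the M modifier, consecutive blanks count as one delimiter */
  do i = 1 to countw(a, ' ');
    word = scan(a, i, ' ');
    output;
  end;
  keep word;
run;

proc print data=demo_words noobs;
run;
```

If any word can be longer than 20 characters, increase the length of word accordingly, otherwise it will be truncated.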
The paper "more _infile_ magic" at http://www2.sas.com/proceedings/sugi28/086-28.pdf
shows how to apply the parsing of the INPUT statement to data that comes from a table or data set variable.
With that, you can populate a hash table of word counters in a single pass.
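To illustrate the word-counter idea, here is a rough sketch of a hash object keyed by word. Note that it splits the text with COUNTW and SCAN rather than the _infile_ parsing technique from the paper, and that the variable count and the output dataset word_counts are names chosen for this example only.

```sas
data _null_;
  length word $ 20 count 8;
  if _n_ = 1 then do;
    /* Hash keyed by word, holding a running count; ordered for sorted output */
    declare hash h(ordered: 'a');
    h.defineKey('word');
    h.defineData('word', 'count');
    h.defineDone();
    call missing(word, count);
  end;
  set m01 end=last;
  do i = 1 to countw(a, ' ');
    word = scan(a, i, ' ');
    /* find() returns 0 and loads count when the word is already stored */
    if h.find() = 0 then count = count + 1;
    else count = 1;
    h.replace();
  end;
  /* After the last input row, write the counters to a dataset (hypothetical name) */
  if last then h.output(dataset: 'word_counts');
run;
```

This reads m01 once and keeps the counters in memory, so no separate sort or PROC FREQ step is needed to get the frequency of each word.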