Hello
I am doing the following actions on the raw data set:
1- Split variable transfer_description into multiple variables
2-vertically combine the values from step 1
3- frequency distribution of the resulted step 2
My question:
How can I improve step 2 in order that the code will do it automatically for any number of fields that are created in step 1?
/***Raw data***/
data have;
infile datalines delimiter='|';
informat transfer_description $200.;
input ID transfer_description amount;
datalines;
111|Transfer to my friend Joe Kaplan for his help to build my home|1000
111|Salary IBM|2000
111|Salary WIZZ|3980
111|Transfer to my father|500
333|Transfer to my gf|4000
333|Son help|3000
222|Salary IBM|1500
222|Charity|2500
222|Charity|3000
222|transfer to my friend Jula|8000
;
Run;
/****split string into multiple new varaibles****/
%let max_wrd=;
proc sql noprint;
select max(countw(transfer_description,', ')) into :max_wrd trimmed
from have
;
quit;
%put &max_wrd;
data wanted(drop=_:);
set have;
array AAA[&max_wrd] $15 ;
do _i = 1 to dim(AAA);
AAA[_i] = scan(transfer_description,_i,', ');
if missing(AAA[_i]) then leave;
end;
run;
/****appending the values **/
/***Question: Hoe can I improve this step to automatic code that union by the number of fields that are created in previous step??****/
proc sql;
create table Wanted2(Where=(A ne '')) as
select AAA1 as A
from wanted
UNION ALL
select AAA2 as A
from wanted
UNION ALL
select AAA3 as A
from wanted
UNION ALL
select AAA4 as A
from wanted
UNION ALL
select AAA5 as A
from wanted
UNION ALL
select AAA6 as A
from wanted
UNION ALL
select AAA7 as A
from wanted
UNION ALL
select AAA8 as A
from wanted
UNION ALL
select AAA9 as A
from wanted
UNION ALL
select AAA10 as A
from wanted
UNION ALL
select AAA11 as A
from wanted
UNION ALL
select AAA12 as A
from wanted
UNION ALL
select AAA13 as A
from wanted
;
quit;
/***Frequency distribution***/
proc sql;
select A,
count(*) as nr
from Wanted2
group by A
order by nr desc
;
quit;
how about this code?
data wanted;
set have;
length new_desc $200;
j=count(trim(transfer_description),' ');
do i=0 to j;
new_desc=scan(transfer_description,i+1);
output;
end;
keep new_desc;
run;
proc sql;
select new_desc,
count(*) as nr
from Wanted
group by new_desc
order by nr desc
;
quit;
A beautiful demonstration underlining that long beats wide in most cases.
/***Raw data***/ data have; infile datalines delimiter='|'; informat transfer_description $200.; input ID transfer_description amount; datalines; 111|Transfer to my friend Joe Kaplan for his help to build my home|1000 111|Salary IBM|2000 111|Salary WIZZ|3980 111|Transfer to my father|500 333|Transfer to my gf|4000 333|Son help|3000 222|Salary IBM|1500 222|Charity|2500 222|Charity|3000 222|transfer to my friend Jula|8000 ; Run; /****split string into multiple new varaibles****/ %let max_wrd=; proc sql noprint; select max(countw(transfer_description,', ')) into :max_wrd trimmed from have ; quit; %put &max_wrd; data wanted(drop=_:); set have; array AAA[&max_wrd] $15 ; do _i = 1 to dim(AAA); AAA[_i] = scan(transfer_description,_i,', '); if missing(AAA[_i]) then leave; end; n+1; /*<------*/ run; proc transpose data=wanted out=wanted2(rename=(col1=A) where=(A is not missing)) ; by n; var AAA:; run; /***Frequency distribution***/ proc sql; select A, count(*) as nr from Wanted2 group by A order by nr desc ; quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.