Hello
I am doing the following actions on the raw data set:
1- Split variable transfer_description into multiple variables
2-vertically combine the values from step 1
3- frequency distribution of the resulted step 2
My question:
How can I improve step 2 in order that the code will do it automatically for any number of fields that are created in step 1?
/***Raw data***/
data have;
infile datalines delimiter='|';
informat transfer_description $200.;
input ID transfer_description amount;
datalines;
111|Transfer to my friend Joe Kaplan for his help to build my home|1000
111|Salary IBM|2000
111|Salary WIZZ|3980
111|Transfer to my father|500
333|Transfer to my gf|4000
333|Son help|3000
222|Salary IBM|1500
222|Charity|2500
222|Charity|3000
222|transfer to my friend Jula|8000
;
Run;
/****split string into multiple new varaibles****/
%let max_wrd=;
proc sql noprint;
select max(countw(transfer_description,', ')) into :max_wrd trimmed
from have
;
quit;
%put &max_wrd;
data wanted(drop=_:);
set have;
array AAA[&max_wrd] $15 ;
do _i = 1 to dim(AAA);
AAA[_i] = scan(transfer_description,_i,', ');
if missing(AAA[_i]) then leave;
end;
run;
/****appending the values **/
/***Question: Hoe can I improve this step to automatic code that union by the number of fields that are created in previous step??****/
proc sql;
create table Wanted2(Where=(A ne '')) as
select AAA1 as A
from wanted
UNION ALL
select AAA2 as A
from wanted
UNION ALL
select AAA3 as A
from wanted
UNION ALL
select AAA4 as A
from wanted
UNION ALL
select AAA5 as A
from wanted
UNION ALL
select AAA6 as A
from wanted
UNION ALL
select AAA7 as A
from wanted
UNION ALL
select AAA8 as A
from wanted
UNION ALL
select AAA9 as A
from wanted
UNION ALL
select AAA10 as A
from wanted
UNION ALL
select AAA11 as A
from wanted
UNION ALL
select AAA12 as A
from wanted
UNION ALL
select AAA13 as A
from wanted
;
quit;
/***Frequency distribution***/
proc sql;
select A,
count(*) as nr
from Wanted2
group by A
order by nr desc
;
quit;
how about this code?
data wanted;
set have;
length new_desc $200;
j=count(trim(transfer_description),' ');
do i=0 to j;
new_desc=scan(transfer_description,i+1);
output;
end;
keep new_desc;
run;
proc sql;
select new_desc,
count(*) as nr
from Wanted
group by new_desc
order by nr desc
;
quit;
A beautiful demonstration underlining that long beats wide in most cases.
/***Raw data***/ data have; infile datalines delimiter='|'; informat transfer_description $200.; input ID transfer_description amount; datalines; 111|Transfer to my friend Joe Kaplan for his help to build my home|1000 111|Salary IBM|2000 111|Salary WIZZ|3980 111|Transfer to my father|500 333|Transfer to my gf|4000 333|Son help|3000 222|Salary IBM|1500 222|Charity|2500 222|Charity|3000 222|transfer to my friend Jula|8000 ; Run; /****split string into multiple new varaibles****/ %let max_wrd=; proc sql noprint; select max(countw(transfer_description,', ')) into :max_wrd trimmed from have ; quit; %put &max_wrd; data wanted(drop=_:); set have; array AAA[&max_wrd] $15 ; do _i = 1 to dim(AAA); AAA[_i] = scan(transfer_description,_i,', '); if missing(AAA[_i]) then leave; end; n+1; /*<------*/ run; proc transpose data=wanted out=wanted2(rename=(col1=A) where=(A is not missing)) ; by n; var AAA:; run; /***Frequency distribution***/ proc sql; select A, count(*) as nr from Wanted2 group by A order by nr desc ; quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.