Hello
I am doing the following actions on the raw data set:
1- Split variable transfer_description into multiple variables
2-vertically combine the values from step 1
3- frequency distribution of the resulted step 2
My question:
How can I improve step 2 in order that the code will do it automatically for any number of fields that are created in step 1?
/***Raw data***/
data have;
infile datalines delimiter='|';
informat transfer_description $200.;
input ID transfer_description amount;
datalines;
111|Transfer to my friend Joe Kaplan for his help to build my home|1000
111|Salary IBM|2000
111|Salary WIZZ|3980
111|Transfer to my father|500
333|Transfer to my gf|4000
333|Son help|3000
222|Salary IBM|1500
222|Charity|2500
222|Charity|3000
222|transfer to my friend Jula|8000
;
Run;
/****split string into multiple new varaibles****/
%let max_wrd=;
proc sql noprint;
select max(countw(transfer_description,', ')) into :max_wrd trimmed
from have
;
quit;
%put &max_wrd;
data wanted(drop=_:);
set have;
array AAA[&max_wrd] $15 ;
do _i = 1 to dim(AAA);
AAA[_i] = scan(transfer_description,_i,', ');
if missing(AAA[_i]) then leave;
end;
run;
/****appending the values **/
/***Question: Hoe can I improve this step to automatic code that union by the number of fields that are created in previous step??****/
proc sql;
create table Wanted2(Where=(A ne '')) as
select AAA1 as A
from wanted
UNION ALL
select AAA2 as A
from wanted
UNION ALL
select AAA3 as A
from wanted
UNION ALL
select AAA4 as A
from wanted
UNION ALL
select AAA5 as A
from wanted
UNION ALL
select AAA6 as A
from wanted
UNION ALL
select AAA7 as A
from wanted
UNION ALL
select AAA8 as A
from wanted
UNION ALL
select AAA9 as A
from wanted
UNION ALL
select AAA10 as A
from wanted
UNION ALL
select AAA11 as A
from wanted
UNION ALL
select AAA12 as A
from wanted
UNION ALL
select AAA13 as A
from wanted
;
quit;
/***Frequency distribution***/
proc sql;
select A,
count(*) as nr
from Wanted2
group by A
order by nr desc
;
quit;
how about this code?
data wanted;
set have;
length new_desc $200;
j=count(trim(transfer_description),' ');
do i=0 to j;
new_desc=scan(transfer_description,i+1);
output;
end;
keep new_desc;
run;
proc sql;
select new_desc,
count(*) as nr
from Wanted
group by new_desc
order by nr desc
;
quit;
A beautiful demonstration underlining that long beats wide in most cases.
/***Raw data***/
data have;
infile datalines delimiter='|';
informat transfer_description $200.;
input ID transfer_description amount;
datalines;
111|Transfer to my friend Joe Kaplan for his help to build my home|1000
111|Salary IBM|2000
111|Salary WIZZ|3980
111|Transfer to my father|500
333|Transfer to my gf|4000
333|Son help|3000
222|Salary IBM|1500
222|Charity|2500
222|Charity|3000
222|transfer to my friend Jula|8000
;
Run;
/****split string into multiple new varaibles****/
%let max_wrd=;
proc sql noprint;
select max(countw(transfer_description,', ')) into :max_wrd trimmed
from have
;
quit;
%put &max_wrd;
data wanted(drop=_:);
set have;
array AAA[&max_wrd] $15 ;
do _i = 1 to dim(AAA);
AAA[_i] = scan(transfer_description,_i,', ');
if missing(AAA[_i]) then leave;
end;
n+1; /*<------*/
run;
proc transpose data=wanted out=wanted2(rename=(col1=A) where=(A is not missing)) ;
by n;
var AAA:;
run;
/***Frequency distribution***/
proc sql;
select A,
count(*) as nr
from Wanted2
group by A
order by nr desc
;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.