Hi Community,
I am trying to solve a unique requirement and need your help if an efficient way exists to get this done.
I have >10 datasets in my library and each have an ID. I need to create a dataset to as a summary of if an ID is present in each dataset.
data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
;
run;
data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
;
run;
data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
;
run;
Based on the datasets created above.
I need my output to look like this:
ID | CLASS | SUBCLASS1 | SUBCLASS2 |
1 | Class | Subclass1 | |
2 | Class | . | Subclass2 |
3 | Class | Subclass1 | Subclass2 |
4 | Class | . | . |
5 | Class | Subclass1 | Subclass2 |
Thank you for your time and effort.
Best,
SC.
Study this code:
data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
;
data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
;
data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
;
proc sql noprint;
select catx(".",libname,memname) into :datasets separated by " "
from dictionary.tables
where libname = "WORK";
quit;
data all / view=all;
length inname $41 dsname dname $32;
set &datasets. indsname=inname;
dsname = scan(inname,2,".");
dname = dsname;
keep id dsname dname;
run;
proc sort
data=all
out=want
nodupkey
;
by id dsname;
run;
proc report data=want;
column id dsname,dname n;
define id /group;
define dsname / "" display;
define dname / "" across;
define n / noprint;
run;
Study this code:
data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
;
data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
;
data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
;
proc sql noprint;
select catx(".",libname,memname) into :datasets separated by " "
from dictionary.tables
where libname = "WORK";
quit;
data all / view=all;
length inname $41 dsname dname $32;
set &datasets. indsname=inname;
dsname = scan(inname,2,".");
dname = dsname;
keep id dsname dname;
run;
proc sort
data=all
out=want
nodupkey
;
by id dsname;
run;
proc report data=want;
column id dsname,dname n;
define id /group;
define dsname / "" display;
define dname / "" across;
define n / noprint;
run;
Using the INDSNAME option on a set command will deal with this in a single data step. If you have a short list of datasets then they can be literally entered into the code.
data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
run;
data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
run;
data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
run;
data want (drop=i);
do until (last.id);
set class subclass1 subclass2 indsname=dsn;
by id;
array dsn_dummys {*} $1 subclass1 subclass2 ;
array dsn_names {2} $20 _temporary_ ('WORK.SUBCLASS1','WORK.SUBCLASS2');
if dsn='WORK.CLASS' then continue;
i=whichc(dsn,of dsn_names{*});
dsn_dummys{i}='Y';
end;
do i=1 to dim(dsn_dummys);
if dsn_dummys{i}^='Y' then dsn_dummys{i}='N';
end;
run;
Or you can put the dataset list (and corresponding dummy vars) in macrovars:
%let csv_list='WORK.SUBCLASS1','WORK.SUBCLASS2';
%LET var_list=subclass1 subclass2 ;
%let n_dsn=%sysfunc(countw(&var_list));
data want (drop=i);
do until (last.id);
set class &var_list indsname=dsn;
by id;
array dsn_dummys {*} $1 &var_list ;
array dsn_names {&n_dsn} $20 _temporary_ (&csv_list);
if dsn='WORK.CLASS' then continue;
i=whichc(dsn,of dsn_names{*});
dsn_dummys{i}='Y';
end;
do i=1 to dim(dsn_dummys);
if dsn_dummys{i}^='Y' then dsn_dummys{i}='N';
end;
run;
And, as @Kurt_Bremser showed, you can use proc sql to create the macrovars. This, slightly modified code, could prefix the code above, replacing the %LET statements:
proc sql noprint;
select quote(catx(".",libname,memname)) into :csv_list separated by ","
from dictionary.tables
where libname = "WORK" and memname^='CLASS';
select memname into :var_list separated by ' '
from dictionary.tables
where libname = "WORK" and memname^='CLASS';
quit;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.