Hi Community,
I am trying to solve a unique requirement and need your help if an efficient way exists to get this done.
I have >10 datasets in my library and each have an ID. I need to create a dataset to as a summary of if an ID is present in each dataset.
data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
;
run;
data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
;
run;
data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
;
run;
Based on the datasets created above.
I need my output to look like this:
ID | CLASS | SUBCLASS1 | SUBCLASS2 |
1 | Class | Subclass1 | |
2 | Class | . | Subclass2 |
3 | Class | Subclass1 | Subclass2 |
4 | Class | . | . |
5 | Class | Subclass1 | Subclass2 |
Thank you for your time and effort.
Best,
SC.
Study this code:
data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
;
data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
;
data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
;
proc sql noprint;
select catx(".",libname,memname) into :datasets separated by " "
from dictionary.tables
where libname = "WORK";
quit;
data all / view=all;
length inname $41 dsname dname $32;
set &datasets. indsname=inname;
dsname = scan(inname,2,".");
dname = dsname;
keep id dsname dname;
run;
proc sort
data=all
out=want
nodupkey
;
by id dsname;
run;
proc report data=want;
column id dsname,dname n;
define id /group;
define dsname / "" display;
define dname / "" across;
define n / noprint;
run;
Study this code:
data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
;
data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
;
data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
;
proc sql noprint;
select catx(".",libname,memname) into :datasets separated by " "
from dictionary.tables
where libname = "WORK";
quit;
data all / view=all;
length inname $41 dsname dname $32;
set &datasets. indsname=inname;
dsname = scan(inname,2,".");
dname = dsname;
keep id dsname dname;
run;
proc sort
data=all
out=want
nodupkey
;
by id dsname;
run;
proc report data=want;
column id dsname,dname n;
define id /group;
define dsname / "" display;
define dname / "" across;
define n / noprint;
run;
Using the INDSNAME option on a set command will deal with this in a single data step. If you have a short list of datasets then they can be literally entered into the code.
data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
run;
data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
run;
data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
run;
data want (drop=i);
do until (last.id);
set class subclass1 subclass2 indsname=dsn;
by id;
array dsn_dummys {*} $1 subclass1 subclass2 ;
array dsn_names {2} $20 _temporary_ ('WORK.SUBCLASS1','WORK.SUBCLASS2');
if dsn='WORK.CLASS' then continue;
i=whichc(dsn,of dsn_names{*});
dsn_dummys{i}='Y';
end;
do i=1 to dim(dsn_dummys);
if dsn_dummys{i}^='Y' then dsn_dummys{i}='N';
end;
run;
Or you can put the dataset list (and corresponding dummy vars) in macrovars:
%let csv_list='WORK.SUBCLASS1','WORK.SUBCLASS2';
%LET var_list=subclass1 subclass2 ;
%let n_dsn=%sysfunc(countw(&var_list));
data want (drop=i);
do until (last.id);
set class &var_list indsname=dsn;
by id;
array dsn_dummys {*} $1 &var_list ;
array dsn_names {&n_dsn} $20 _temporary_ (&csv_list);
if dsn='WORK.CLASS' then continue;
i=whichc(dsn,of dsn_names{*});
dsn_dummys{i}='Y';
end;
do i=1 to dim(dsn_dummys);
if dsn_dummys{i}^='Y' then dsn_dummys{i}='N';
end;
run;
And, as @Kurt_Bremser showed, you can use proc sql to create the macrovars. This, slightly modified code, could prefix the code above, replacing the %LET statements:
proc sql noprint;
select quote(catx(".",libname,memname)) into :csv_list separated by ","
from dictionary.tables
where libname = "WORK" and memname^='CLASS';
select memname into :var_list separated by ' '
from dictionary.tables
where libname = "WORK" and memname^='CLASS';
quit;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: