Solved: Summary of ID presence across all the datasets in a library

shasank · Posted 03-25-2021 09:14 AM

Hi Community,

I am trying to solve a unique requirement and need your help if an efficient way exists to get this done.

I have >10 datasets in my library and each have an ID. I need to create a dataset to as a summary of if an ID is present in each dataset.

 data CLASS(label='Student Data');
   infile datalines dsd truncover;
   input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
 datalines;
 1, M, 14, 69, 112.5
 2, F, 13, 56.5, 84
 3, F, 13, 65.3, 98
 4, F, 14, 62.8, 102.5
 5, M, 14, 63.5, 102.5
 ;
 run;


  data SUBCLASS1(label='Student Data');
   infile datalines dsd truncover;
   input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
 datalines;
 1, M ,14, 69 ,112.5
 3, F ,13 ,65.3, 98
 5, M ,14 ,63.5, 102.5
 ;
 run;

   data SUBCLASS2(label='Student Data');
   infile datalines dsd truncover;
   input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
 datalines;
 2, F, 13, 56.5, 84
 3, F, 13, 65.3, 98
 5, M, 14, 63.5, 102.5
 ;
 run;

Based on the datasets created above.

I need my output to look like this:

ID	CLASS	SUBCLASS1	SUBCLASS2
1	Class	Subclass1
2	Class	.	Subclass2
3	Class	Subclass1	Subclass2
4	Class	.	.
5	Class	Subclass1	Subclass2

Thank you for your time and effort.

Best,

SC.

Kurt_Bremser · Posted 03-25-2021 09:42 AM

Study this code:

data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
;

data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
;

data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
;
 
proc sql noprint;
select catx(".",libname,memname) into :datasets separated by " "
from dictionary.tables
where libname = "WORK";
quit;

data all / view=all;
length inname $41 dsname dname $32;
set &datasets. indsname=inname;
dsname = scan(inname,2,".");
dname = dsname;
keep id dsname dname;
run;

proc sort
  data=all
  out=want
  nodupkey
;
by id dsname;
run;

proc report data=want;
column id dsname,dname n;
define id /group;
define dsname / "" display;
define dname / "" across;
define n / noprint;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 03-25-2021 09:42 AM

Study this code:

data CLASS(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M, 14, 69, 112.5
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
4, F, 14, 62.8, 102.5
5, M, 14, 63.5, 102.5
;

data SUBCLASS1(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
1, M ,14, 69 ,112.5
3, F ,13 ,65.3, 98
5, M ,14 ,63.5, 102.5
;

data SUBCLASS2(label='Student Data');
infile datalines dsd truncover;
input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
datalines;
2, F, 13, 56.5, 84
3, F, 13, 65.3, 98
5, M, 14, 63.5, 102.5
;
 
proc sql noprint;
select catx(".",libname,memname) into :datasets separated by " "
from dictionary.tables
where libname = "WORK";
quit;

data all / view=all;
length inname $41 dsname dname $32;
set &datasets. indsname=inname;
dsname = scan(inname,2,".");
dname = dsname;
keep id dsname dname;
run;

proc sort
  data=all
  out=want
  nodupkey
;
by id dsname;
run;

proc report data=want;
column id dsname,dname n;
define id /group;
define dsname / "" display;
define dname / "" across;
define n / noprint;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 03-25-2021 11:53 AM

Using the INDSNAME option on a set command will deal with this in a single data step. If you have a short list of datasets then they can be literally entered into the code.

data CLASS(label='Student Data');
   infile datalines dsd truncover;
   input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
 datalines;
 1, M, 14, 69, 112.5
 2, F, 13, 56.5, 84
 3, F, 13, 65.3, 98
 4, F, 14, 62.8, 102.5
 5, M, 14, 63.5, 102.5
 run;


  data SUBCLASS1(label='Student Data');
   infile datalines dsd truncover;
   input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
 datalines;
 1, M ,14, 69 ,112.5
 3, F ,13 ,65.3, 98
 5, M ,14 ,63.5, 102.5
run;

   data SUBCLASS2(label='Student Data');
   infile datalines dsd truncover;
   input ID:$2. Sex:$1. Age:2. Height:3.2 Weight:2.2;
 datalines;
 2, F, 13, 56.5, 84
 3, F, 13, 65.3, 98
 5, M, 14, 63.5, 102.5
run;


data want (drop=i);
  do until (last.id);
    set class subclass1 subclass2 indsname=dsn;
    by id;
    array dsn_dummys {*} $1  subclass1 subclass2 ;
    array dsn_names  {2} $20  _temporary_ ('WORK.SUBCLASS1','WORK.SUBCLASS2');
    if dsn='WORK.CLASS' then continue;
    i=whichc(dsn,of dsn_names{*});
    dsn_dummys{i}='Y';
  end;
  do i=1 to dim(dsn_dummys);
    if dsn_dummys{i}^='Y' then dsn_dummys{i}='N';
  end;
run;

Or you can put the dataset list (and corresponding dummy vars) in macrovars:

%let csv_list='WORK.SUBCLASS1','WORK.SUBCLASS2';
%LET var_list=subclass1 subclass2 ;
%let n_dsn=%sysfunc(countw(&var_list));

data want (drop=i);
  do until (last.id);
    set class &var_list indsname=dsn;
    by id;
    array dsn_dummys {*} $1  &var_list ;
    array dsn_names  {&n_dsn} $20  _temporary_ (&csv_list);
    if dsn='WORK.CLASS' then continue;
    i=whichc(dsn,of dsn_names{*});
    dsn_dummys{i}='Y';
  end;
  do i=1 to dim(dsn_dummys);
    if dsn_dummys{i}^='Y' then dsn_dummys{i}='N';
  end;
run;

And, as @Kurt_Bremser showed, you can use proc sql to create the macrovars. This, slightly modified code, could prefix the code above, replacing the %LET statements:

proc sql noprint;
  select quote(catx(".",libname,memname)) into :csv_list  separated by ","
    from dictionary.tables
    where libname = "WORK" and memname^='CLASS';
  select memname into :var_list separated by ' '
    from dictionary.tables
    where libname = "WORK" and memname^='CLASS';
quit;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Summary of ID presence across all the datasets in a library

Re: Summary of ID presence across all the datasets in a library

Re: Summary of ID presence across all the datasets in a library

Re: Summary of ID presence across all the datasets in a library

Summary of ID presence across all the datasets in a library

Re: Summary of ID presence across all the datasets in a library

Re: Summary of ID presence across all the datasets in a library

Re: Summary of ID presence across all the datasets in a library

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!