Hi all,
I'm trying to merge multiple sas file with a SAS file containing unique ID
libname have "C:\Users\fkvn\Desktop\SAS files"
libname want "C:\Users\fkvn\Desktop\overview"
In my have library I have these files: id, SMS, u10_clucose, u28_tgly, y7_toothdata, u28_toothdata, u28_chol - this folder expands over time so more tables will be added to the folder. So the code has to be automatic and generic? Probably some macro variables containing all datasets name my folder "Have".
I want an output table showing me which tables each id exists in? The new column name should be the name of the dataset with 0/1 values - where 1 indicates that the ID exist in the table and 0 if not.
Output table should look like this:
ID | u10_clucose | u28_tgly | y7_toothdata | u28_toothdata | y7_toothdata u28_chol |
1 | 1 | 1 | 0 | 0 | 1 |
2 | 1 | 1 | 1 | 1 | 0 |
3 | 1 | 1 | 1 | 0 | 0 |
Kind regards
Frank
Yet another approach (I dare to say even more generic) would be to "stack" datasets one -atop-another and then do the transpose (of course if you don't have problem having missing (".") instead zeros ("0") in the data.
Test data ( thanks @ErikLund_Jensen )
options dlcreatedir;
libname have "%sysfunc(pathname(work))/data";
/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 15 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 15 by 3; output; end; run;
Code:
/* set metadata */
proc contents NOPRINT
data=have._all_
out =work.output(where=(UPCASE(NAME) = "ID")); /* Add: `AND TYPE=1` to select numeric only */
run;
/* do the "stack" with names */
data _null_;
call execute('data work.all_id; set ');
do until(EOF);
set work.output end=EOF;
call execute( cats(libname, '.', memname, '(keep=id)') );
end;
call execute('indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;');
stop;
run;
/* output form call execute is:
1 + data work.all_id; set
2 + HAVE.U10_CLUCOSE(keep=id)
3 + HAVE.U28_TGLY(keep=id)
4 + HAVE.U28_TOOTHDATA(keep=id)
5 + indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;
*/
/* sort and transpose */
proc sort
data=work.all_id
;
by id indsname;
run;
proc transpose
data = work.all_id
out = work.want(drop=_name_)
;
by id;
id indsname;
var is_in;
run;
All the best
Bart
Didn't someone already ask this question?
Are the datasets already sorted by ID? If so you just need to run code like:
data want;
merge
u10_clucose(keep=id in=in1)
u28_tgly(keep=id in=in2)
...
;
by id;
if first.id;
u10_clucose=in1;
u28_tgly=in2;
...
run;
Which you can easily generate from the output of PROC CONTENTS;
proc contents data=have._all_ noprint out=contents; run;
data members;
set contents;
where upcase(name)='ID';
memnum + 1;
keep libname memname memnum ;
run;
filename code temp;
data _null_;
file code;
put @3 'merge';
do while (not eof1);
set members end=eof1;
put @5 libname +(-1) '.' memname '(keep=id in=in' memnum ')';
end;
put @3 ';'
/ @3 'by id;'
/ @3 'if first.id;'
;
do while (not eof2);
set members end=eof2;
put @3 memname '=in' memnum ';' ;
end;
put 'run;' ;
run;
%include code / source2;
Before we go very far down this path is the variable, ID, when present always of the same variable type in every single one of the data sets? If the ID variable is character is it always of the same length?
I ask because the best ways to create a report such as you describe requires appending values from data sets together which if the variable is not always the same in all data sets mean that this is likely not going to be easy to automate as you need address that issue with ID first. Second, if the length varies then you have a possibility of truncating data which means errors in the report.
Is the data from the data set ID to be in this report? How about SMC (you didn't show it in the report)?
I would use a different approach. Instead of merging the data sets i think it is easier to extract ID's from all datasets and use statistics to reform output. Here is a working example that handles one to many input tables, I think the only requirements is that all tables contains a variable named ID, and this variable is of the same type in all datasets.
libname have 'c:\temp\testdata';
/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 5 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 5 by 3; output; end; run;
/* Collect information from all datasets */
proc contents data=have._all_ noprint short out=x;
run;
proc sql noprint;
select memname into :memlist separated by ' '
from x;
quit;
data collect;
length id &memlist 8.;
stop;
run;
%macro m;
%do i = 1 %to &sqlobs;
%let thismem = %scan(&memlist,&i,%str( ));
data tmp; set have.&thismem (keep=ID);
&thismem = 1;
run;
data collect; set collect tmp;
run;
%end;
%mend;
%m;
/* Process information to report */
proc sort data=collect; by id;
run;
proc means data=collect noprint max;
output out=idlist (where=(_stat_='MAX') drop= _type_ _freq_);
by id;
var &memlist;
run;
And another idea, useful, if you want the result as report:
proc sql noprint;
select cats(LibName, '.', MemName, '(keep= Id)')
into :members separated by ' '
from sashelp.vtable
where LibName = 'HAVE'
;
quit;
data work.combined;
set &members. indsname= _dsName;
by Id;
Dataset = scan(_dsName, 2, '.');
run;
option missing = '0';
proc tabulate data= work.combined;
class Id Dataset;
table Id= '', Dataset*n='' / box= 'Id';
run;
options missing= '.';
Yet another approach (I dare to say even more generic) would be to "stack" datasets one -atop-another and then do the transpose (of course if you don't have problem having missing (".") instead zeros ("0") in the data.
Test data ( thanks @ErikLund_Jensen )
options dlcreatedir;
libname have "%sysfunc(pathname(work))/data";
/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 15 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 15 by 3; output; end; run;
Code:
/* set metadata */
proc contents NOPRINT
data=have._all_
out =work.output(where=(UPCASE(NAME) = "ID")); /* Add: `AND TYPE=1` to select numeric only */
run;
/* do the "stack" with names */
data _null_;
call execute('data work.all_id; set ');
do until(EOF);
set work.output end=EOF;
call execute( cats(libname, '.', memname, '(keep=id)') );
end;
call execute('indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;');
stop;
run;
/* output form call execute is:
1 + data work.all_id; set
2 + HAVE.U10_CLUCOSE(keep=id)
3 + HAVE.U28_TGLY(keep=id)
4 + HAVE.U28_TOOTHDATA(keep=id)
5 + indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;
*/
/* sort and transpose */
proc sort
data=work.all_id
;
by id indsname;
run;
proc transpose
data = work.all_id
out = work.want(drop=_name_)
;
by id;
id indsname;
var is_in;
run;
All the best
Bart
To print out results with "zeros" instead "missings" use @andreas_lds idea (thanks for reminding about option `missing`!):
option missing = '0';
proc print data=work.want;
run;
option missing = '.';
Bart
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.