Solved: Re: Merge multiple sas files from one folder with file containing ID

Frank_johannes · Posted 01-23-2023 09:45 AM

Hi all,

I'm trying to merge multiple sas file with a SAS file containing unique ID

libname have "C:\Users\fkvn\Desktop\SAS files"

libname want "C:\Users\fkvn\Desktop\overview"

In my have library I have these files: id, SMS, u10_clucose, u28_tgly, y7_toothdata, u28_toothdata, u28_chol - this folder expands over time so more tables will be added to the folder. So the code has to be automatic and generic? Probably some macro variables containing all datasets name my folder "Have".

I want an output table showing me which tables each id exists in? The new column name should be the name of the dataset with 0/1 values - where 1 indicates that the ID exist in the table and 0 if not.

Output table should look like this:

ID	u10_clucose	u28_tgly	y7_toothdata	u28_toothdata	y7_toothdata u28_chol
1	1	1	0	0	1
2	1	1	1	1	0
3	1	1	1	0	0

Kind regards

Frank

yabwon · Posted 01-24-2023 07:55 AM

Yet another approach (I dare to say even more generic) would be to "stack" datasets one -atop-another and then do the transpose (of course if you don't have problem having missing (".") instead zeros ("0") in the data.

Test data ( thanks @ErikLund_Jensen )

options dlcreatedir;
libname have "%sysfunc(pathname(work))/data";

/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 15 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 15 by 3; output; end; run;

Code:

/* set metadata */
proc contents NOPRINT
  data=have._all_ 
  out =work.output(where=(UPCASE(NAME) = "ID")); /* Add: `AND TYPE=1` to select numeric only */
run;

/* do the "stack" with names */
data _null_;
  call execute('data work.all_id; set ');
  do until(EOF);
    set work.output end=EOF;
    call execute( cats(libname, '.', memname, '(keep=id)') );
  end; 
  call execute('indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;');
  stop;
run;
/* output form call execute is:
1   + data work.all_id; set
2   + HAVE.U10_CLUCOSE(keep=id)
3   + HAVE.U28_TGLY(keep=id)
4   + HAVE.U28_TOOTHDATA(keep=id)
5   + indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;
*/

/* sort and transpose */
proc sort 
  data=work.all_id
;
  by id indsname;
run;
proc transpose 
  data = work.all_id 
  out = work.want(drop=_name_)
;
  by id;
  id indsname;
  var is_in;
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

View solution in original post

Tom · Posted 01-23-2023 09:58 AM

Didn't someone already ask this question?
Are the datasets already sorted by ID? If so you just need to run code like:

data want;
  merge 
    u10_clucose(keep=id in=in1)
    u28_tgly(keep=id in=in2)	
   ...
  ;
  by id;
  if first.id;
  u10_clucose=in1;
  u28_tgly=in2;
  ...
run;

Which you can easily generate from the output of PROC CONTENTS;

proc contents data=have._all_ noprint out=contents; run;
data members;
  set contents;
  where upcase(name)='ID';
  memnum + 1;
  keep libname memname memnum ;
run;
filename code temp;
data _null_;
  file code;
  put @3 'merge';
  do while (not eof1);
     set members end=eof1;
     put @5 libname +(-1) '.' memname '(keep=id in=in' memnum ')';
  end;
  put @3 ';' 
    / @3 'by id;'
    / @3 'if first.id;'
  ;
  do while (not eof2);
    set members end=eof2;
    put @3 memname '=in' memnum ';' ;
  end;
  put 'run;' ;
run;
%include code / source2;

ballardw · Posted 01-23-2023 10:28 AM

Before we go very far down this path is the variable, ID, when present always of the same variable type in every single one of the data sets? If the ID variable is character is it always of the same length?

I ask because the best ways to create a report such as you describe requires appending values from data sets together which if the variable is not always the same in all data sets mean that this is likely not going to be easy to automate as you need address that issue with ID first. Second, if the length varies then you have a possibility of truncating data which means errors in the report.

Is the data from the data set ID to be in this report? How about SMC (you didn't show it in the report)?

Frank_johannes · Posted 01-24-2023 01:56 AM

Hi Bella. Good point. ID is text with the same length across the different tables. SMC is also included. Sorry my bad.
Regards Frank

ErikLund_Jensen · Posted 01-24-2023 06:41 AM

Hi @Frank_johannes

I would use a different approach. Instead of merging the data sets i think it is easier to extract ID's from all datasets and use statistics to reform output. Here is a working example that handles one to many input tables, I think the only requirements is that all tables contains a variable named ID, and this variable is of the same type in all datasets.

libname have 'c:\temp\testdata';

/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 5 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 5 by 3; output; end; run;


/* Collect information from all datasets */
proc contents data=have._all_ noprint short out=x;
run;

proc sql noprint;
  select memname into :memlist separated by ' '
  from x;
quit;

data collect;
  length id &memlist 8.;
  stop;
run;

%macro m;
  %do i = 1 %to &sqlobs;
    %let thismem = %scan(&memlist,&i,%str( ));
    data tmp; set have.&thismem (keep=ID);
      &thismem = 1;
    run;

    data collect; set collect tmp;
    run;
  %end;
%mend;
%m;

/* Process information to report */
proc sort data=collect; by id;
run;

proc means data=collect noprint max;
  output out=idlist (where=(_stat_='MAX') drop= _type_ _freq_);
  by id;
  var &memlist;
run;

andreas_lds · Posted 01-24-2023 07:22 AM

And another idea, useful, if you want the result as report:

proc sql noprint;
   select cats(LibName, '.', MemName, '(keep= Id)')
      into :members separated by ' '
      from sashelp.vtable 
         where LibName = 'HAVE'
   ;
quit;


data work.combined;
   set &members. indsname= _dsName;
   by Id;
   
   Dataset = scan(_dsName, 2, '.');
run;

option missing = '0';

proc tabulate data= work.combined;
   class Id Dataset;
   table Id= '', Dataset*n='' / box= 'Id';
run;

options missing= '.';

yabwon · Posted 01-24-2023 07:55 AM

Yet another approach (I dare to say even more generic) would be to "stack" datasets one -atop-another and then do the transpose (of course if you don't have problem having missing (".") instead zeros ("0") in the data.

Test data ( thanks @ErikLund_Jensen )

options dlcreatedir;
libname have "%sysfunc(pathname(work))/data";

/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 15 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 15 by 3; output; end; run;

Code:

/* set metadata */
proc contents NOPRINT
  data=have._all_ 
  out =work.output(where=(UPCASE(NAME) = "ID")); /* Add: `AND TYPE=1` to select numeric only */
run;

/* do the "stack" with names */
data _null_;
  call execute('data work.all_id; set ');
  do until(EOF);
    set work.output end=EOF;
    call execute( cats(libname, '.', memname, '(keep=id)') );
  end; 
  call execute('indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;');
  stop;
run;
/* output form call execute is:
1   + data work.all_id; set
2   + HAVE.U10_CLUCOSE(keep=id)
3   + HAVE.U28_TGLY(keep=id)
4   + HAVE.U28_TOOTHDATA(keep=id)
5   + indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;
*/

/* sort and transpose */
proc sort 
  data=work.all_id
;
  by id indsname;
run;
proc transpose 
  data = work.all_id 
  out = work.want(drop=_name_)
;
  by id;
  id indsname;
  var is_in;
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

yabwon · Posted 01-24-2023 08:04 AM

To print out results with "zeros" instead "missings" use @andreas_lds idea (thanks for reminding about option `missing`!):

option missing = '0';
proc print data=work.want;
run;
option missing = '.';

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Registration is open

SAS Training: Just a Click Away