BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Frank_johannes
Calcite | Level 5

Hi all, 

 

I'm trying to merge multiple sas file with a SAS file containing unique ID 

 

libname have "C:\Users\fkvn\Desktop\SAS files"

libname want "C:\Users\fkvn\Desktop\overview"

 

In my have library I have these files: id, SMS, u10_clucose, u28_tgly, y7_toothdata, u28_toothdata, u28_chol - this folder expands over time so more tables will be added to the folder. So the code has to be automatic and generic? Probably some macro variables containing all datasets name my folder "Have". 

 

I want an output table showing me which tables each id exists in? The new column name should be the name of the dataset with 0/1 values - where 1 indicates that the ID exist in the table and 0 if not. 

 

Output table should look like this: 

ID u10_clucose u28_tglyy7_toothdatau28_toothdatay7_toothdata u28_chol
111001
211110
311100

 

Kind regards 

Frank 

1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Onyx | Level 15

Yet another approach (I dare to say even more generic) would be to "stack" datasets one -atop-another and then do the transpose (of course if you don't have problem having missing (".") instead zeros ("0") in the data. 

 

Test data ( thanks @ErikLund_Jensen )

options dlcreatedir;
libname have "%sysfunc(pathname(work))/data";

/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 15 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 15 by 3; output; end; run;

Code:

/* set metadata */
proc contents NOPRINT
  data=have._all_ 
  out =work.output(where=(UPCASE(NAME) = "ID")); /* Add: `AND TYPE=1` to select numeric only */
run;

/* do the "stack" with names */
data _null_;
  call execute('data work.all_id; set ');
  do until(EOF);
    set work.output end=EOF;
    call execute( cats(libname, '.', memname, '(keep=id)') );
  end; 
  call execute('indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;');
  stop;
run;
/* output form call execute is:
1   + data work.all_id; set
2   + HAVE.U10_CLUCOSE(keep=id)
3   + HAVE.U28_TGLY(keep=id)
4   + HAVE.U28_TOOTHDATA(keep=id)
5   + indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;
*/

/* sort and transpose */
proc sort 
  data=work.all_id
;
  by id indsname;
run;
proc transpose 
  data = work.all_id 
  out = work.want(drop=_name_)
;
  by id;
  id indsname;
  var is_in;
run;

 

All the best

Bart

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

Didn't someone already ask this question?
Are the datasets already sorted by ID?  If so you just need to run code like:

data want;
  merge 
    u10_clucose(keep=id in=in1)
    u28_tgly(keep=id in=in2)	
   ...
  ;
  by id;
  if first.id;
  u10_clucose=in1;
  u28_tgly=in2;
  ...
run;

Which you can easily generate from the output of PROC CONTENTS;

proc contents data=have._all_ noprint out=contents; run;
data members;
  set contents;
  where upcase(name)='ID';
  memnum + 1;
  keep libname memname memnum ;
run;
filename code temp;
data _null_;
  file code;
  put @3 'merge';
  do while (not eof1);
     set members end=eof1;
     put @5 libname +(-1) '.' memname '(keep=id in=in' memnum ')';
  end;
  put @3 ';' 
    / @3 'by id;'
    / @3 'if first.id;'
  ;
  do while (not eof2);
    set members end=eof2;
    put @3 memname '=in' memnum ';' ;
  end;
  put 'run;' ;
run;
%include code / source2;

 

ballardw
Super User

Before we go very far down this path is the variable, ID, when present always of the same variable type in every single one of the data sets? If the ID variable is character is it always of the same length?

 

I ask because the best ways to create a report such as you describe requires appending values from data sets together which if the variable is not always the same in all data sets mean that this is likely not going to be easy to automate as you need address that issue with ID first. Second, if the length varies then you have a possibility of truncating data which means errors in the report.

 

Is the data from the data set ID to be in this report? How about SMC (you didn't show it in the report)?

Frank_johannes
Calcite | Level 5
Hi Bella. Good point. ID is text with the same length across the different tables. SMC is also included. Sorry my bad.
Regards Frank
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Frank_johannes 

 

I would use a different approach. Instead of merging the data sets i think it is easier to extract ID's from all datasets and use statistics to reform output. Here is a working example that handles one to many input tables, I think the only requirements is that all tables contains a variable named ID, and this variable is of the same type in all datasets. 

 

libname have 'c:\temp\testdata';

/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 5 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 5 by 3; output; end; run;


/* Collect information from all datasets */
proc contents data=have._all_ noprint short out=x;
run;

proc sql noprint;
  select memname into :memlist separated by ' '
  from x;
quit;

data collect;
  length id &memlist 8.;
  stop;
run;

%macro m;
  %do i = 1 %to &sqlobs;
    %let thismem = %scan(&memlist,&i,%str( ));
    data tmp; set have.&thismem (keep=ID);
      &thismem = 1;
    run;

    data collect; set collect tmp;
    run;
  %end;
%mend;
%m;

/* Process information to report */
proc sort data=collect; by id;
run;

proc means data=collect noprint max;
  output out=idlist (where=(_stat_='MAX') drop= _type_ _freq_);
  by id;
  var &memlist;
run;
andreas_lds
Jade | Level 19

And another idea, useful, if you want the result as report:

 

proc sql noprint;
   select cats(LibName, '.', MemName, '(keep= Id)')
      into :members separated by ' '
      from sashelp.vtable 
         where LibName = 'HAVE'
   ;
quit;


data work.combined;
   set &members. indsname= _dsName;
   by Id;
   
   Dataset = scan(_dsName, 2, '.');
run;

option missing = '0';

proc tabulate data= work.combined;
   class Id Dataset;
   table Id= '', Dataset*n='' / box= 'Id';
run;

options missing= '.';
yabwon
Onyx | Level 15

Yet another approach (I dare to say even more generic) would be to "stack" datasets one -atop-another and then do the transpose (of course if you don't have problem having missing (".") instead zeros ("0") in the data. 

 

Test data ( thanks @ErikLund_Jensen )

options dlcreatedir;
libname have "%sysfunc(pathname(work))/data";

/* Test data */
data have.u10_clucose; do id = 2 to 20 by 7; output; end; run;
data have.u28_tgly; do id = 1 to 15 by 2; output; end; run;
data have.u28_toothdata; do id = 1 to 15 by 3; output; end; run;

Code:

/* set metadata */
proc contents NOPRINT
  data=have._all_ 
  out =work.output(where=(UPCASE(NAME) = "ID")); /* Add: `AND TYPE=1` to select numeric only */
run;

/* do the "stack" with names */
data _null_;
  call execute('data work.all_id; set ');
  do until(EOF);
    set work.output end=EOF;
    call execute( cats(libname, '.', memname, '(keep=id)') );
  end; 
  call execute('indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;');
  stop;
run;
/* output form call execute is:
1   + data work.all_id; set
2   + HAVE.U10_CLUCOSE(keep=id)
3   + HAVE.U28_TGLY(keep=id)
4   + HAVE.U28_TOOTHDATA(keep=id)
5   + indsname = _inds_; indsname = scan(_inds_, -1, "."); is_in=1; run;
*/

/* sort and transpose */
proc sort 
  data=work.all_id
;
  by id indsname;
run;
proc transpose 
  data = work.all_id 
  out = work.want(drop=_name_)
;
  by id;
  id indsname;
  var is_in;
run;

 

All the best

Bart

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

To print out results with "zeros" instead "missings" use @andreas_lds  idea (thanks for reminding about option `missing`!):

option missing = '0';
proc print data=work.want;
run;
option missing = '.';

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 797 views
  • 2 likes
  • 6 in conversation