BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cosmid
Lapis Lazuli | Level 10
 

Hi,

 

Sorry, the subject isn't very clear. I wasn't sure how to word it. So here is the scenario:

 

I have a bunch of datasets in two different directories on UNIX, like this:

/user/dir1/: example1.sas7bdat, example2.sas7bdat, example3.sas7bdat

/user/dir2/: example1.sas7bdat, example3.sas7bdat

 

dir1 datasets has ID column, VarA, VarB, etc.

dir2 datasets has ID column, ColA, ColB, etc.

 

My objective is to check the values of VarA against the values of ColA for each dataset that matches. So it will check only for example1.sas7bdat and example3.sas7bdat. 

 

Question 1: How do I do that if I have dozens of directories and each directory have dozens of datasets?

Question 2: If the above doesn't work, how do I do it with 2 directories at a time? It is just no efficient to manually type in the datasets because some directories have dozens of datasets.

 

Thanks!

 

 

 

 

 

 
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@cosmid 

As suggested by others first make things work for you for a single use case creating some fully functional static code. Only then amend the code and make it dynamic.

 

In the following some sample code which hopefully will give you some ideas.

 

1. Create some sample data

/* create sample Master table to compare against */
data work.master;
  set sashelp.class;
  idVar=_n_;
  colA=name;
run;

/* create sample ds for comparison agains master */
options dlcreatedir;
%let other_folder=%sysfunc(pathname(work))\other_folder;
libname sampleDs "&other_folder";
data sampleDs.ds1 
     sampleDs.ds2(drop=varA) 
     sampleDs.ds3;
  set master(rename=(colA=varA));
  output sampleDs.ds1;
  output sampleDs.ds2;
  if _n_=3 then varA='XXXX';
  output sampleDs.ds3;
run;

 

2. Code for comparison against a single table. Make this code work for you up to the point where you get exactly what you want. 

If you can't be sure that all tables will be pre-sorted by id then also already implement here the required logic to avoid any potential issues with not pre-sorted tables.

/* compare ds agains master */
proc compare 
  base=work.master
  comp=sampleDs.ds3
  novalues
  briefsummary
  ;
  var colA;
  with varA;
  id idVar;
run;
 

 

3. Only once you've got above 100% working start to make the code dynamic. You can see below how much this complicates the code so you don't want to have to debug issues in dynamic code which you can already resolve in the static code version.

/* compare ds agains master */
%macro comp_ds(
    masterDS=work.master, 
    masterCol=colA, 
    compDir=&other_folder, 
    compCol=varA,
    idVar=id
    );

  libname compdir "&other_folder" access=readonly;

  %local ds_list iter;
  %let ds_list=;
  %let iter=0;
  proc sql noprint;
    select 
      memname,
      put(count(*), 32. -l)
        into 
          :ds_list separated by ' ',
          :iter
    from dictionary.columns
    where 
      libname="COMPDIR"
      and upcase(name)=%upcase("&compCol")
      ;
  quit;
  %put &=iter;
  %put &=ds_list;

  %local compDS;
  %do i=1 %to &iter;
    %let compDS=%scan(&ds_list,&i);
    title1 "Base: &masterDS, Comp: compdir.&compDS";
    proc compare 
      base=&masterDS
      comp=compdir.&compDS
      novalues
      briefsummary
      ;
      var &masterCol;
      with &compCol;
      id &idVar;
    run;
    title;
  %end;
  libname compdir clear;
%mend;

%comp_ds(
    masterDS=work.master, 
    masterCol=colA, 
    compDir=&other_folder, 
    compCol=varA,
    idVar=idVar
    );

 

Besides of using Proc Compare there are also other ways how this could be done (i.e. using a hash lookup table). What's suitable for your case will depend on the details of what you have (data volumes included) and what you need.

View solution in original post

8 REPLIES 8
Reeza
Super User

My objective is to check the values of VarA against the values of ColA for each dataset that matches

How are you determining what matches?

 

In general, get it working for one manual step, show that code and we can help you automate it from there. 

 

 

 

Here's some references for you, not sure what level your macro skills are at:

 

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

 

cosmid
Lapis Lazuli | Level 10
Thank you for your reply Reeza. At the time of posting, I only have a half working code. Basically, step 2 from Ballard's reply. I appreciate the links you have attached. I will read them when I get a chance.
ballardw
Super User

1) assign a different LIBRARY to each folder or directory if you have duplicated data set names. If you "know beyond a any shadow of a doubt" that data set names are not repeated anywhere you can assign a single library name to multiple folders, check the documentation.

 

2) You can use DICTIONARY.Columns (proc sql) or sashelp.vcolumn (data step) to get lists of data set library and names that have the variables of interest from the libraries of interest.

Pseudo code:

Proc sql;
   create table variablelist as
   select libname, memname, name 
   from dictionary.columns
   where libname in ('THISLIB' 'THATLIB')
      and upcase(name) in ('THISVAR' 'THATVAR' 'OTHERVAR')
   ;
quit;

the LIBNAME value stored in the SAS metadata in the dictionary tables is stored in upper case. All of your currently assigned libraries will appear. The Name of the variables are not stored in any standard case so I upcase the variable to compare to the list of variables I may be interested in.

 

3) Proc Compare will compare explicit variables with others using the VAR and WITH statements.

When using proc compare it is a good idea to sort the data sets by common variables first. This is if you are comparing values by an idea. If you just need to compare the distinct values of any given variable without caring which record has the values you should say so. There are lots of ways to "compare" and the type of comparison is needed for specific suggestions.

cosmid
Lapis Lazuli | Level 10
Hi ballardw, thank you for the reply. I was able to get to step 2. It was step 3 that got me stuck. I wasn't sure how to loop through the datasets I have created. But after reading Patrick's code, I have the code working. Your code was also very helpful. I appreciate your help!
Patrick
Opal | Level 21

@cosmid 

As suggested by others first make things work for you for a single use case creating some fully functional static code. Only then amend the code and make it dynamic.

 

In the following some sample code which hopefully will give you some ideas.

 

1. Create some sample data

/* create sample Master table to compare against */
data work.master;
  set sashelp.class;
  idVar=_n_;
  colA=name;
run;

/* create sample ds for comparison agains master */
options dlcreatedir;
%let other_folder=%sysfunc(pathname(work))\other_folder;
libname sampleDs "&other_folder";
data sampleDs.ds1 
     sampleDs.ds2(drop=varA) 
     sampleDs.ds3;
  set master(rename=(colA=varA));
  output sampleDs.ds1;
  output sampleDs.ds2;
  if _n_=3 then varA='XXXX';
  output sampleDs.ds3;
run;

 

2. Code for comparison against a single table. Make this code work for you up to the point where you get exactly what you want. 

If you can't be sure that all tables will be pre-sorted by id then also already implement here the required logic to avoid any potential issues with not pre-sorted tables.

/* compare ds agains master */
proc compare 
  base=work.master
  comp=sampleDs.ds3
  novalues
  briefsummary
  ;
  var colA;
  with varA;
  id idVar;
run;
 

 

3. Only once you've got above 100% working start to make the code dynamic. You can see below how much this complicates the code so you don't want to have to debug issues in dynamic code which you can already resolve in the static code version.

/* compare ds agains master */
%macro comp_ds(
    masterDS=work.master, 
    masterCol=colA, 
    compDir=&other_folder, 
    compCol=varA,
    idVar=id
    );

  libname compdir "&other_folder" access=readonly;

  %local ds_list iter;
  %let ds_list=;
  %let iter=0;
  proc sql noprint;
    select 
      memname,
      put(count(*), 32. -l)
        into 
          :ds_list separated by ' ',
          :iter
    from dictionary.columns
    where 
      libname="COMPDIR"
      and upcase(name)=%upcase("&compCol")
      ;
  quit;
  %put &=iter;
  %put &=ds_list;

  %local compDS;
  %do i=1 %to &iter;
    %let compDS=%scan(&ds_list,&i);
    title1 "Base: &masterDS, Comp: compdir.&compDS";
    proc compare 
      base=&masterDS
      comp=compdir.&compDS
      novalues
      briefsummary
      ;
      var &masterCol;
      with &compCol;
      id &idVar;
    run;
    title;
  %end;
  libname compdir clear;
%mend;

%comp_ds(
    masterDS=work.master, 
    masterCol=colA, 
    compDir=&other_folder, 
    compCol=varA,
    idVar=idVar
    );

 

Besides of using Proc Compare there are also other ways how this could be done (i.e. using a hash lookup table). What's suitable for your case will depend on the details of what you have (data volumes included) and what you need.

cosmid
Lapis Lazuli | Level 10
Hi Patrick,

Thank you for your reply. I really appreciate you went through it step by step. I spent the entire weekend studying it. I learned more from your code than a SAS book. I still have some questions about the code.

1. put(count(*), 32. -l), this is same as count(*) except it defines it as a long integer with 32 digits?
2. Is there a way to use a %do loop outside a %macro definition? like just in a regular data step
3. In step 3, you used a macro variable for the where statement,
where
libname="COMPDIR"
and upcase(name)=%upcase("&compCol")
The macro variable worked here in this code. However, when I tried something similar it stopped working:( Here's the code I wrote:
data test1;
input x $ a;
datalines;
a 1
b 2
c 3
;

proc sql;
select name
from dictionary.columns
where libname='WORK' and upcase(memname) = 'TEST1';
quit;

The code is working so far, but as soon as I replace the TEST1 inside the PROC SQL with a macro variable such as &temp, it stopped working and says no rows were selected. Here's the code with &temp:

%let temp=TEST1;

proc sql;
select name
from dictionary.columns
where libname='WORK' and upcase(memname) = '&temp';
quit;

%put &temp;

The put statement even shows that temp is TEST1. So I'm not sure why this isn't working anymore.
Patrick
Opal | Level 21

@cosmid 

Great that this was so helpful to you.

 

To answer your questions:

1. put(count(*), 32. -l), this is same as count(*) except it defines it as a long integer with 32 digits?

It WRITES the numerical value as a left aligned string with as many digits as required for the number (which is certainly not 32)


2. Is there a way to use a %do loop outside a %macro definition? like just in a regular data step

No, it isn't. Very recent SAS versions allow you now to write simple %if %then %else statements in open code but not %do loops as far as I recall. 

As a rule of thumb: Only use macro code if you can't do it with regular SAS code.


3. In step 3, you used a macro variable for the where statement,

Macro variables in SAS code must be in double quotes to resolve. You would use single quotes if you would want to actually pass the string &temp to the where condition.
where libname='WORK' and upcase(memname) = "&temp";

 

Thanks,

Patrick

cosmid
Lapis Lazuli | Level 10
Thank you for answering my questions! I appreciate it. The "&temp" is also working now.

Thank you again!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2145 views
  • 4 likes
  • 4 in conversation