Data comparison

Reply
Occasional Contributor
Posts: 17

Data comparison

[ Edited ]

Hi,

 

I need to compare two datasets - every record in dataset-1 against every record in dataset-2. If some criteria match, I need to perform a computation and record it.

 

First data set contains 300000 records and second one contains 1 million records. They don't have common columns. 

 

My code is:

 

%Macro A;

     DATA _NULL_;

         SET DATASET1;

             %B(variable1,variable2,..., variable5); /* variables 1-5 are the variables in DATASET1 */

         RUN;

%Mend ;

 

%Macro B (variable1,....,variable5);

     DATA results;

          SET DATASET2;

               IF (variable1 = var1) then DO; <ACTION> ; END;  /* var1 is the variable in DATASET2 */

               IF (variable2 = var2) then DO; <ACTION> ; END;  /* var2 is the variable in DATASET2 */

               . . . . . . . .

               IF (variable5 = var5) then DO; <ACTION> ; END;  /* var5 is the variable in DATASET2 */

          RUN;

 %MEND;

 

However, I can't pass the values of columns variable1-variable5 to Macro B as B takes the strings 'variable1' etc as values. Can't use Macro variables as they will be activated after RUN statement in Macro A. Can't use PROC SQL SEPARATED BY to create Macro variables as the Macro variable length exceeds. Tried CALL EXECUTE, but no luck. 

 

Any help is appreciated.

Super User
Posts: 19,052

Re: Data comparison

I'm not following. Especially your macro step and your issue with passing a macro variable.  

 

You also realize thats 300,000 * 1,000,000 = 300, 000, 000, 000 + (300 billion) comparisons that will probably take a long time. I'm sincerely hoping you're on a server. 

 

Can you explain furtehr what you're trying to do?

Occasional Contributor
Posts: 17

Re: Data comparison

Hi Reeza,

 

I don't want to do the entire data in a single go. I plan to do subsets. However, the problem as I mentioned is - unable to pass the values of the data set variables to the Macro B. I can't do the comparison and subsequent action. 

 

 

Super User
Posts: 19,052

Re: Data comparison

A macro variable is a text string. That's it.

 

If you have a list of variables that you need to process you have to loop through it.

 

But it sounds like you really have multiple conditions - use in different places in different ways so you'll have to manipulate it to be the form you want. 

 

 

Super User
Posts: 11,113

Re: Data comparison

Your general approach would fail any way. The macro code generated would terminate the first data step afte Set dataset1 because the macro B starts with a data statement and is a boundary to end datasteps.

 

Are you sure you want a single data set with 300,000,000,000 records? You might want to estimate required storage before doing that.

 

Since what you say is a cartesian join then the "natural" way would be

 

proc sql;

   create table big as

   select dataset1.*, dataset2.*

   from dataset1 join dataset2

   ;

quit;

 

depending on what your "action" may be then you would have a group of CASE statements.

 

I would suggest working out your logic with smaller versions of your data say 300 rows from set 1 and 1000 from set 2 to see if you're getting close to what you want.

 

Valued Guide
Posts: 858

Re: Data comparison

ballardw, can you explain what you mean with this example?  I'm not familiar with what you are saying:

 

data one;
infile cards dsd;
input a$ b$ c$;
cards;
a,b,c
;run;

data two;
infile cards dsd;
input a$ b$ c$;
cards;
1,2,3
;run;


proc sql;
create table big as
select one.*,two.*
from one join two;

Super User
Posts: 5,382

Re: Data comparison

If you wish to work with and manipulate data stored in tables, macro is usually not the way forward. Macro can dynamically generate SAS code.

 

Can you define what <ACTION> is?

Also, you compare 5 variables. If non of them are equal, do you wish to store the result?

if not, you have at least join criteria, using OR. How large would the hit rate be for such a case?

Data never sleeps
Occasional Contributor
Posts: 17

Re: Data comparison

I don't want to do the entire data in a single go. I plan to do subsets. However, the problem as I mentioned is - unable to pass the values of the data set variables to the Macro B. I can't do the comparison and subsequent action. 

 

Super User
Posts: 5,382

Re: Data comparison

I Don't understand.

 

Please post some representative sample data, desired output, and fill in the <ACTION> part in your code.

Data never sleeps
Super User
Posts: 11,113

Re: Data comparison


KrisNori wrote:

I don't want to do the entire data in a single go. I plan to do subsets. However, the problem as I mentioned is - unable to pass the values of the data set variables to the Macro B. I can't do the comparison and subsequent action. 

 


Since you want to look at subsets then the first thing would be to define the subsets. Those conditions could go into Where clauses in either a data step or proc sql.

And as I mentioned before your basic approach doesn't work because of the data step boundary.

 

Before coding anything with a macro you really need to have code that works in base SAS. Show us one or two cases as done in base SAS and we may be able to provide help with a macro or non-macro approach. And example input data with output is very helpful.

Ask a Question
Discussion stats
  • 9 replies
  • 563 views
  • 0 likes
  • 5 in conversation