DATA Step, Macro, Functions and more

Data Step instead of Left Join?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

Data Step instead of Left Join?

Hey guys, 

Im a student from Germany (so sorry for my bad english) and Im having a Code with, containing a SQL step with Inner Join, but the program runs tooooo slowly. I guess there could be a faster version (e.g. with a data step) for the following code:

 

proc sql;

create table work.Result as select distinct a.*, b.Value from 

work.Table1 As a left join work.Table2 As B on (a.id = b.id and a.date = b.date);

quit;

 

The problem is that my program contains a macro and the code runs very often, so every second counts Smiley Happy

 

I´d appreciate it so much if you could help me with that.

 

Thanks in advance,

Alex


Accepted Solutions
Solution
‎04-07-2018 02:45 PM
Trusted Advisor
Posts: 1,831

Re: Data Step instead of Left Join?

Posted in reply to mrzlatan91

@mrzlatan91, I'm quoting few of your statemwnts:

Table 1 contains a lot of observations, table 2 has always a part of it, lets say less than 1% 
 Every Date/Value Combination of Table 2 is Unique.

Try next solution:

proc sort data=table1 out=temp1 nodupkey;
    by date id;
run;
/* if table1 is already sorted then replace above sort with:
       data temp1;
         set table1;
              by date value;
                   if last.value;  /* output distinct values */
      run;
******************************/
proc sort data=table2; by date id; run; /* verify that table is sorted */

data want;
merge table1(in=in1)
           table2(keep=value rename=(value=newvalue) in=in2);
  by date id;
       if in1;
run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,831

Re: Data Step instead of Left Join?

[ Edited ]
Posted in reply to mrzlatan91

1) The macro has no effect on run time.

 

2) If the relation between the 2 tables is many to many,

    then SQL is almost the only relevant to be used.

    Alternative way is using hash method - keeping one of the tables in memeory.

 

   if the relation is 1:N (or N:1)  it is better to sort both tables and MERGE them in a sas datastep.

 

3) Long time may be the result of big tables with a lot of variables.

    If you don't need all variables keep the relevant ones only.
    

Super User
Posts: 6,637

Re: Data Step instead of Left Join?

Posted in reply to mrzlatan91

Many ways ... a hash table would be faster, a format would be faster, it's even possible that sort+merge would be faster.  Here's a key related question that influences whether these other approaches would be suitable:

 

Does Table2 contain just a single observation for each combination of ID and DATE?

Occasional Contributor
Posts: 19

Re: Data Step instead of Left Join?

Posted in reply to Astounding

Wow, that sounds good. 

 

Table 1 contains a lot of observations, table 2 has always a part of it, lets say less than 1% (it depends on the iteration)

Unfortunately, I don´t know how to use the Hash Function to solve this problem.

Could anyone provide me a solution for my posted problem?

 

 

Super User
Posts: 13,338

Re: Data Step instead of Left Join?

Posted in reply to mrzlatan91

@mrzlatan91 wrote:

Wow, that sounds good. 

 

Table 1 contains a lot of observations, table 2 has always a part of it, lets say less than 1% (it depends on the iteration)

Unfortunately, I don´t know how to use the Hash Function to solve this problem.

Could anyone provide me a solution for my posted problem?

 

 


Provide some example data in the form of data steps if you want tested code.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

I realize that small data sets won't have the same time of execution issue but code suggestions have a much better chance of actually working if provided. Any sensitive data values should be masked to protect the guilty.

Occasional Contributor
Posts: 19

Re: Data Step instead of Left Join?

Sorry, I should have done this before. I´ve made a simplified version, because the size of my dataset is around 40GB.

Both Tables have the same structure, but Table 1 is much bigger than Table 2.

 

data table1;
input date $ value $ id $ fundid $;
datalines;
01OCT18 50 1234 10000
01OCT18 52 1234 10001
01OCT19 50 1234 10000
01OCT19 53 1234 10001
01OCT19 54 1234 10002
01NOV19 60 1236 10000
01NOV19 61 1236 10001
01NOV19 62 1236 10002
01OCT19 54 1234 10007
01NOV19 55 1236 10008
01NOV19 35 1236 10009
01NOV19 66 1236 10010
01DEC19 40 1236 10008
01DEC19 42 1236 10009
01DEC19 53 1236 10010
;
run;

data table2;
input date $ value $ id $ fundid $;
datalines;
01OCT18 50 1234 10000
01OCT18 52 1234 10000
01OCT19 50 1234 10000
;
run;

proc sql;
create table work.result as select distinct a.*, b.value as newValue from
work.table1 as a left join work.table2 as b on (a.id=b.id and a.date = b.date);
quit;

Super User
Posts: 9,919

Re: Data Step instead of Left Join?

Posted in reply to mrzlatan91

From your example data, I can see no double entries for combinations of id and date. If this is true for the whole dataset, you can use a sort + data step merge. And the distinct in the SQL would not be needed.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 19

Re: Data Step instead of Left Join?

Posted in reply to KurtBremser

Oh **bleep**, I think this was the reason why I´ve made the distinct clause.

 

What would be the solution if the distinct part is needed? Hash?

Super User
Posts: 9,919

Re: Data Step instead of Left Join?

Posted in reply to mrzlatan91

The next question would be the size of table2. If sizeof(id + date + value) * nobs can fit into memory, creating a format or loading into a hash table would make sorting unnecessary.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 19

Re: Data Step instead of Left Join?

Posted in reply to KurtBremser

It depends on the Iteration, because I analyze every fund in my dataset.

Some funds have 100,000 Observations, some funds only 100.

Could you provide me a Hash-Solution?

Super User
Posts: 6,637

Re: Data Step instead of Left Join?

Posted in reply to mrzlatan91

Take a look at the data you posted.  Both data sets contain 2 observations for this combination:

 

ID=1234 

date=01OCT2018

 

Looking at the data, how do you know which observations to match?  How do you know which value to use from Table 2?  Do you need to match on not just ID and DATE, but also on FUNDID?

Occasional Contributor
Posts: 19

Re: Data Step instead of Left Join?

Posted in reply to Astounding

Oh sorry,

this was a mistake. Every Date/Value Combination of Table 2 is Unique.

 

I think I should explain it a little bit. Table 1 contains Valuations (Value) for Stocks/Bonds (IDs), that are made by some Funds (Fundid) in a specific month (date).

Table 2 contains Valuations for Stocks of ONE specific Fund, so every Date/Value Combination is unique.

 

 

Solution
‎04-07-2018 02:45 PM
Trusted Advisor
Posts: 1,831

Re: Data Step instead of Left Join?

Posted in reply to mrzlatan91

@mrzlatan91, I'm quoting few of your statemwnts:

Table 1 contains a lot of observations, table 2 has always a part of it, lets say less than 1% 
 Every Date/Value Combination of Table 2 is Unique.

Try next solution:

proc sort data=table1 out=temp1 nodupkey;
    by date id;
run;
/* if table1 is already sorted then replace above sort with:
       data temp1;
         set table1;
              by date value;
                   if last.value;  /* output distinct values */
      run;
******************************/
proc sort data=table2; by date id; run; /* verify that table is sorted */

data want;
merge table1(in=in1)
           table2(keep=value rename=(value=newvalue) in=in2);
  by date id;
       if in1;
run;
Occasional Contributor
Posts: 19

Re: Data Step instead of Left Join?

Thank you very much!

Super User
Posts: 9,919

Re: Data Step instead of Left Join?

Posted in reply to mrzlatan91

Also decide if the distinct is really needed. It necessitates a sort by all variables contained in the output.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 203 views
  • 0 likes
  • 5 in conversation