Solved: Re: Simulate Proc sql nested query in Datastep

bnarang · Posted 01-11-2013 02:45 AM

Hi All

If we have two set of data let's say A and B. Both of them contains the list of ID. I would like to select list of accounts in A which are in B

To do so, I can write sql query like this:

Proc sql;

Create Table temp as

Select acct from

A

where acct in ( Select acc from B);

quit;

The problem is now, I do have an existing sas script which I need to modify where large number of nested conditions are specified in datasteps. How can I apply this condition in same Datastep.

1 possibility is to have all the list of accounts in B as in macro variable and can apply where acct in (&acc_list) but this will not suffice in my case since the number of accounts are in millions and macro variable value has some limitations.

Any suggestions.

LinusH · Posted 01-11-2013 03:49 AM

Since I don't have access to your SAS script, I'm not sure if I will give you the best advice.

If you want to mimic left joins in the data step (assuming tha ID in table B is unique), you can use MERGE BY-logic, combined with data set option IN=, like

data c;

merge a(in=a)

b(in=b);

by acc;

if a and b;

....

run;

Another option could by to import the ID:s from b into a hash table. Probably better performance, but requires some more programming.

A third option is to build a format on ID on b, with a specific label on "other", the us that in a where clause:

where put(acc,acc_fmt.) not = 'ACC does not exist';

In this case assumed ACC is numeric.

Data never sleeps

View solution in original post

LinusH · Posted 01-11-2013 03:49 AM

Since I don't have access to your SAS script, I'm not sure if I will give you the best advice.

If you want to mimic left joins in the data step (assuming tha ID in table B is unique), you can use MERGE BY-logic, combined with data set option IN=, like

data c;

merge a(in=a)

b(in=b);

by acc;

if a and b;

....

run;

Another option could by to import the ID:s from b into a hash table. Probably better performance, but requires some more programming.

A third option is to build a format on ID on b, with a specific label on "other", the us that in a where clause:

where put(acc,acc_fmt.) not = 'ACC does not exist';

In this case assumed ACC is numeric.

Data never sleeps

bnarang · Posted 01-29-2013 09:46 PM

You' re right. But so far I am not aware of Hash merging. The problem is with the existing code is already there and I would just have to enhance it without changing the structure really. So, I was looking for options basically.

NagendraKumarK · Posted 01-17-2013 11:01 AM

Hi bnarang,

I am not sure, what is u r request. If your request is select list of accounts from A and which are in B table. Please check this below..

proc sql;

create table res from

(select A.acct from A,B where A.acct=B.acct);

run;

Florent · Posted 01-30-2013 01:08 PM

Hi Bnarang,

What kind of nested conditions are specified in these datasteps that you want to include in your proc sql ?

I think you might simply add them under the 'where' or 'having' clause, depending whether their scope is simple or aggregated data.

Proc sql;

Create Table temp as

Select acct

from A

where acct in (Select acc from B where <condition on table B>)

and <condition on table A>;

Quit;

Kind regards,

Florent

Simulate Proc sql nested query in Datastep

Re: Simulate Proc sql nested query in Datastep

Re: Simulate Proc sql nested query in Datastep

Re: Simulate Proc sql nested query in Datastep

Re: Simulate Proc sql nested query in Datastep

Re: Simulate Proc sql nested query in Datastep

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away