
Question about Proc sql in merging two big dataset

Occasional Contributor
Posts: 7

I have a dataset a with a lot of variables (a, b, c, d, e, etc.). I want to combine it with dataset b on a variable named "X", but one row of b might match around 30,000 rows of a on "X".

My SAS code is:

proc sql noprint;
  create table ds as
  select a.*, b.momstatefips
  from a left join b
  on a.momstatefips1=b.momstatefips;
quit;

Is this code right? It always takes a long time to run and then gives me the note "out of resource". Does that make sense?


Accepted Solutions
Solution
06-28-2015 11:17 PM
Respected Advisor
Posts: 4,935

Re: Question about Proc sql in merging two big dataset

Maybe you are trying to subset dataset a to extract the rows that are mentioned in dataset b? If so, use:

proc sql;
  create table ds as
  select a.*
  from a inner join b
  on a.momstatefips1 = b.momstatefips;
quit;
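
If b can hold the same momstatefips value on more than one row, the inner join returns one output row per matching pair, so rows of a get repeated. A subquery is one way to avoid that. This is only a sketch, assuming the goal really is to subset a and that nothing from b is needed in the result:

```sas
proc sql;
  create table ds as
  select *
  from a
  /* keep rows of a whose key appears anywhere in b;
     each row of a is returned at most once */
  where momstatefips1 in (select momstatefips from b);
quit;
```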

PG


All Replies
Super User
Posts: 7,078

Re: Question about Proc sql in merging two big dataset

What are you actually trying to do? Your current step is not adding any information, since the only variable you select from b is the one that already matches the existing variable in a.

If you are doing a one-to-many merge, then use a DATA step with a MERGE statement instead.
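
Whether the merge is really one-to-many depends on b having each key only once. One way to check, sketched here with hypothetical column aliases (n_rows, n_nonmissing, n_keys):

```sas
proc sql;
  /* n_nonmissing > n_keys means some key value repeats in b,
     so a join on that key would multiply the matching rows of a */
  select count(*)                     as n_rows,
         count(momstatefips)          as n_nonmissing,
         count(distinct momstatefips) as n_keys
  from b;
quit;
```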

Below is the DATA step equivalent of your left join, plus I added the variable FOUND to indicate whether there was a match.

Note that it will only work if both datasets are sorted by the BY variable.


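data DS;
  /* A is the large dataset; its key is renamed to match the key in B */
  merge A (in=in1 rename=(MOMSTATEFIPS1=MOMSTATEFIPS))
        B (keep=MOMSTATEFIPS in=in2);
  by MOMSTATEFIPS;
  if in1;        /* keep every row of A, as in the left join */
  FOUND = in2;   /* 1 if the key was also found in B, otherwise 0 */
run;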
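
The sorting requirement could be met with PROC SORT before the merge. A minimal sketch, where a_sorted and b_keys are hypothetical output names that would replace A and B in the MERGE statement:

```sas
/* Sort the large dataset by its key (still named MOMSTATEFIPS1 here) */
proc sort data=a out=a_sorted;
  by momstatefips1;
run;

/* Keep only the key from B; NODUPKEY leaves one row per key value,
   so the merge stays many-to-one */
proc sort data=b(keep=momstatefips) out=b_keys nodupkey;
  by momstatefips;
run;
```

Writing to separate output datasets leaves the originals untouched, at the cost of extra disk space for the sorted copy of a.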

Contributor
Posts: 21

Re: Question about Proc sql in merging two big dataset

Hi Experts,

Creating an index on the join column momstatefips1 of table a is one option for reducing the processing time.

An index can reduce the time required to locate a set of rows, especially in a large data file.


proc sql noprint;
  /* a simple index must have the same name as the column it indexes */
  create index momstatefips1 on work.a(momstatefips1);

  create table ds as
  select a.*
  from a inner join b
  on a.momstatefips1 = b.momstatefips;
quit;

Br, Amit
