Proc sql = select distinct from multiple datasets ?

Reply
Occasional Contributor
Posts: 10

Proc sql = select distinct from multiple datasets ?

[ Edited ]

I currently have a proc sql code that I have been using for one year of data. In the first step, I select the distinct pairs from "DatasetA2012" that I need. Following, I run two proc sql codes to create tables (one from "DatasetA2012 and one from "DatasetB2012" with just the demographic variables I need to (left) join back up to the distinct pairs down the line.

 

Now, I need to update this code so it includes the prior 2 years of data in all steps (so 3 years total) in the final table. The only thing that needs to change in the code, aside from including the new data, is I would like to add a year indicator. I thought about running the code below 3 times  and then merging, but this could potentially count some "distinct" pairs multiple times so I would be grateful for any advice. I have pasted the proc sql code that works for 2012 below.

 

proc sql;

create table distinct as

select distinct ID, NUM

from oar.datasetA2012

order by NUM;

quit;

 

/*Create tables with just the demographic variables.*/

proc sql;

create table dem1 as

select ID, NUM, DOB, GNDR

from oar.datasetA2012;

quit;

 

proc sql;

create table dem2 as

select ID, MOS

from oar.datasetB2012;

quit;

Grand Advisor
Posts: 17,406

Re: Proc sql = select distinct from multiple datasets ?

How big are your datasets for each year?

Occasional Contributor
Posts: 10

Re: Proc sql = select distinct from multiple datasets ?

~6-7 million observations each

Esteemed Advisor
Posts: 5,198

Re: Proc sql = select distinct from multiple datasets ?

Not sure I understand the logic with a separate distinct table, and then join back variables from the originating table.

 

Never mind, I think a macro would be suitable for this. The you would create a %do loop per year. By doing so you could easily assign a column the current value of  the %do loop variable:

 

select &LoopYear. as Year...

At the en of each loop insert into/proc append to the final table.

Data never sleeps
Ask a Question
Discussion stats
  • 3 replies
  • 407 views
  • 1 like
  • 3 in conversation