SQL Join with Two Tables, Need to Pull in Extra Data

emmytheduck · Posted 07-15-2015 05:11 PM

New to SAS. I'm working on an attrition table for membership. I need to see which members stayed, left, or were new from 2014 to 2015. I have a table that is sorted by their ID in year 14 and year 15. I've been able to get it to a point where there is a flag for NEW, TERM, and RETAIN. However, the only data that I can see is the ID and their flagged status.

So I thought, I'll SQL a table to join actual data (i.e. name, dob, address, etc.) to the table above. For some reason, when I pull member data into the table above, it deletes their info for about 10,000 lines.

My code for the first table looks like this:

proc sql;

create table combine as

select a.id as id14, b.idas id15, coalesce(a.id,b.id) as id_all

from _2014active a

full join _2015active b on a.id= b.id;

quit;

data everything14to15;

set combine;

if id14=id15then flag = 'Ret';

if id14='' then flag = 'New';

if id15='' then flag = 'Ter';

run;

THIS WORKS.

To join "everything14to15" I have the following, and it does not work....

proc summary data = sas.memberdata nway missing;

class name dob address;

output out = everything1 (drop = _type_ _freq_);

run;

proc sql;

create table everything2 as

select a.*, b.*

from everything14to15 a

left join everything1 b on a.id15=b.id;

quit;

I apologize if this is confusing, but any help is appreciated. I need the first table and the second table to join together to show member information - not just their ID and status.

PGStats · Posted 07-15-2015 05:33 PM

Seems to me that merging your datasets would be simpler :

data members;

length status $3;

merge _2014active (in=in2014) _2015active (in=in2015);

by id;

if in2014 and in2015 then status="Ret";

else if in2014 then status="Ter";

else status="New";

run;

(untested)

PG

emmytheduck · Posted 07-16-2015 11:33 AM

PG - this was very helpful. I've never used merge, and at our firm we typically stick with PROC SQL to put our tables together...to keep everything consistent we all use it. I'm going to have to look more into this. It worked perfectly!

emmytheduck · Posted 07-16-2015 01:45 PM

I do have a couple of questions. The (in=in2014) - is that naming a field associated with id? Because 'in2014' is not a field in the _2014active table. I'm just trying to understand the step. It works so well and I'd like to use it in the future!

PGStats · Posted 07-16-2015 04:23 PM

I use SQL a lot myself, but for some purposes the datastep simply works better. IN=in2015 is a dataset option that creates a boolean variable named in2015 which indicates if _2015active contributed to the merge. Thus, in the statement merge _2014active (in=in2014) _2015active (in=in2015), in2015 tells you if id was present in dataset _2015active.

By default, automatic variables created with the in= option are not kept in output datasets.

PG

emmytheduck · Posted 07-16-2015 05:03 PM

Thank you very much for the clarification!

PGStats · Posted 07-16-2015 10:18 PM

If you really need to do this with SQL then try:

proc sql;

create table members as

select * from

(select *,

"Ter" as status

from _2014active

where id not in (select id from _2015active) )

union all

(select *,

case when id in (select id from _2014active) then "Ret" else "New" end as status

from _2015active )

order by id;

quit;

PG

emmytheduck · Posted 07-17-2015 10:12 AM

Yeah, I was able to get most of it done with SQL, but I prefer the merge in this scenario. So thank you again for the knowledge!

SQL Join with Two Tables, Need to Pull in Extra Data

Re: SQL Join with Two Tables, Need to Pull in Extra Data

Re: SQL Join with Two Tables, Need to Pull in Extra Data

Re: SQL Join with Two Tables, Need to Pull in Extra Data

Re: SQL Join with Two Tables, Need to Pull in Extra Data

Re: SQL Join with Two Tables, Need to Pull in Extra Data

Re: SQL Join with Two Tables, Need to Pull in Extra Data

Re: SQL Join with Two Tables, Need to Pull in Extra Data

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away