07-15-2015 05:11 PM
New to SAS. I'm working on an attrition table for membership. I need to see which members stayed, left, or were new from 2014 to 2015. I have a table that is sorted by their ID in year 14 and year 15. I've been able to get it to a point where there is a flag for NEW, TERM, and RETAIN. However, the only data that I can see is the ID and their flagged status.
So I thought, I'll SQL a table to join actual data (i.e. name, dob, address, etc.) to the table above. For some reason, when I pull member data into the table above, it deletes their info for about 10,000 lines.
My code for the first table looks like this:
create table combine as
select a.id as id14, b.idas id15, coalesce(a.id,b.id) as id_all
from _2014active a
full join _2015active b on a.id= b.id;
if id14=id15then flag = 'Ret';
if id14='' then flag = 'New';
if id15='' then flag = 'Ter';
To join "everything14to15" I have the following, and it does not work....
proc summary data = sas.memberdata nway missing;
class name dob address;
output out = everything1 (drop = _type_ _freq_);
create table everything2 as
select a.*, b.*
from everything14to15 a
left join everything1 b on a.id15=b.id;
I apologize if this is confusing, but any help is appreciated. I need the first table and the second table to join together to show member information - not just their ID and status.
07-15-2015 05:33 PM
Seems to me that merging your datasets would be simpler :
length status $3;
merge _2014active (in=in2014) _2015active (in=in2015);
if in2014 and in2015 then status="Ret";
else if in2014 then status="Ter";
07-16-2015 11:33 AM
PG - this was very helpful. I've never used merge, and at our firm we typically stick with PROC SQL to put our tables together...to keep everything consistent we all use it. I'm going to have to look more into this. It worked perfectly!
07-16-2015 01:45 PM
I do have a couple of questions. The (in=in2014) - is that naming a field associated with id? Because 'in2014' is not a field in the _2014active table. I'm just trying to understand the step. It works so well and I'd like to use it in the future!
07-16-2015 04:23 PM
I use SQL a lot myself, but for some purposes the datastep simply works better. IN=in2015 is a dataset option that creates a boolean variable named in2015 which indicates if _2015active contributed to the merge. Thus, in the statement merge _2014active (in=in2014) _2015active (in=in2015), in2015 tells you if id was present in dataset _2015active.
By default, automatic variables created with the in= option are not kept in output datasets.
07-16-2015 10:18 PM
If you really need to do this with SQL then try:
create table members as
select * from
"Ter" as status
where id not in (select id from _2015active) )
case when id in (select id from _2014active) then "Ret" else "New" end as status
from _2015active )
order by id;
07-17-2015 10:12 AM
Yeah, I was able to get most of it done with SQL, but I prefer the merge in this scenario. So thank you again for the knowledge!