New to SAS. I'm working on an attrition table for membership. I need to see which members stayed, left, or were new from 2014 to 2015. I have a table that is sorted by their ID in year 14 and year 15. I've been able to get it to a point where there is a flag for NEW, TERM, and RETAIN. However, the only data that I can see is the ID and their flagged status.
So I thought, I'll SQL a table to join actual data (i.e. name, dob, address, etc.) to the table above. For some reason, when I pull member data into the table above, it deletes their info for about 10,000 lines.
My code for the first table looks like this:
proc sql;
create table combine as
select a.id as id14, b.idas id15, coalesce(a.id,b.id) as id_all
from _2014active a
full join _2015active b on a.id= b.id;
quit;
data everything14to15;
set combine;
if id14=id15then flag = 'Ret';
if id14='' then flag = 'New';
if id15='' then flag = 'Ter';
run;
THIS WORKS.
To join "everything14to15" I have the following, and it does not work....
proc summary data = sas.memberdata nway missing;
class name dob address;
output out = everything1 (drop = _type_ _freq_);
run;
proc sql;
create table everything2 as
select a.*, b.*
from everything14to15 a
left join everything1 b on a.id15=b.id;
quit;
I apologize if this is confusing, but any help is appreciated. I need the first table and the second table to join together to show member information - not just their ID and status.
Seems to me that merging your datasets would be simpler :
data members;
length status $3;
merge _2014active (in=in2014) _2015active (in=in2015);
by id;
if in2014 and in2015 then status="Ret";
else if in2014 then status="Ter";
else status="New";
run;
(untested)
PG
PG - this was very helpful. I've never used merge, and at our firm we typically stick with PROC SQL to put our tables together...to keep everything consistent we all use it. I'm going to have to look more into this. It worked perfectly!
I do have a couple of questions. The (in=in2014) - is that naming a field associated with id? Because 'in2014' is not a field in the _2014active table. I'm just trying to understand the step. It works so well and I'd like to use it in the future!
I use SQL a lot myself, but for some purposes the datastep simply works better. IN=in2015 is a dataset option that creates a boolean variable named in2015 which indicates if _2015active contributed to the merge. Thus, in the statement merge _2014active (in=in2014) _2015active (in=in2015), in2015 tells you if id was present in dataset _2015active.
By default, automatic variables created with the in= option are not kept in output datasets.
PG
Thank you very much for the clarification!
If you really need to do this with SQL then try:
proc sql;
create table members as
select * from
(select *,
"Ter" as status
from _2014active
where id not in (select id from _2015active) )
union all
(select *,
case when id in (select id from _2014active) then "Ret" else "New" end as status
from _2015active )
order by id;
quit;
PG
Yeah, I was able to get most of it done with SQL, but I prefer the merge in this scenario. So thank you again for the knowledge!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.