BookmarkSubscribeRSS Feed
Weberian
Fluorite | Level 6

Here comes a question I cant figure out. To be classified as retired, an individual needs to have atleast 80% of the annual income from pension - two years in a row.

 

I have a longitudinal dataset (short format) with the percentages every year, variables: 2005, 2006, 2007.... 2014. Lets use Frank and Anna as an example:

 

Frank: 2005 (82%), 2006 (76%), 2007 (81%), 2008 (98%), 2009 (100%), 2010 (100%)...  which could be witten as [1 0 1 1 1 1]. In other words: Frank should be classified as retired 2007 since both 2007 and 2008 are above 80%. 

 

Anna: 2005 (0%), 2006 (0%), 2007 (0%), 2008 (0%), 2009 (100%), 2010 (100%) [0 0 0 0 1 1]. Anna should be classified as retired 2009. 

 

How do I handle this? The problem is the "two years in a row" thing. Thanks!

 

8 REPLIES 8
Astounding
PROC Star

You haven't exactly described your data in a way that makes sense as a SAS data set.  I'll illustrate how to go about this, but on some pretty stiff assumptions.  If your data doesn't match this, you may need to adjust the program considerably.

 

First, assume you have three fields for each observation:  name, year, and percent.

 

Second, assume your percent variable is on a scale of 0 to 100, not 0 to 1.  So a value of 50 = 50%.

 

Third, assume your observations are in sorted order BY NAME YEAR.

 

Finally, assume you want to add a fourth variable to each observation, with a value of Y or N to indicate retirement year.

 

data want;

set have;

by name;

set have (firstobs=2 keep=percent rename=(percent=next_percent))

have (keep=percent rename=(percent=next_percent));

if first.name then ever_retired='N';

retain ever_retired;

if ever_retired='N' and percent >= 80 and next_percent >= 80 and last.name=0 then do;

   ever_retired='Y';

   retired='Y';

end;

drop next_percent ever_retired;

run;

 

It's untested code, but looks about right.

Weberian
Fluorite | Level 6

Thank you Astounding. Let me try and I will come back ASAP with more details if it works or not!


 

Thansk again!

s_lassen
Meteorite | Level 14

@Weberian:

Astounding's solution is "about right", but I think it drops the final observation.

Here is another suggestion;

data want;
  set have;
  by name;
  if first.name then
    retired='N';
  retain retired;
  if percent>80 then do;
    if retired='N' and not last.name then do;
      _N_=_N_+1;
      set have (keep=percent rename=(percent=next_percent)) point=_N_;
      if next_percent>80 then
        retired='Y';
      end;
    end;
  else 
    retired='N';
  drop next_percent;
run;

This solution is setting the RETIRED variable to 'Y' for all years that are in a series of years with more than 80% of their income coming from pensions. (meaning that if somebody is retired, and then has a year of less than 80% of income coming from pensions, that person switches back to RETIRED='N'). Is that what you want?

Or are you just looking for the first year in such a series for each person, meaning that retirement is a once in a lifetime event? then something like this may work:

data want;
  set have;
  by name;
  if percent>80 and lag(percent)>80 and not first.name then do;
    retirement_year=year-1;
    output;
    end;
  keep name retirement_year;
run;

Like the previous example, it assumes that the data are sorted by NAME and YEAR, and that the years are consecutive.

Weberian
Fluorite | Level 6

Thank you s_lassen for your help. Very appreciated!!!

 

Since I am about to do survival analysis, I am interesting in duration (time to retirement) and event (retired y/n). So it is as you wrote in last sentence - I am only interested in one-life event.

 

I am using several datasets and merge them into one, currently in short-format-style (see attached illustrative Excel). Since my goal is to only have 1 variable with retirement year, for example. ID 4 become retired 2004, retirement_year=2004. Then I may drop the other information and only use the exact retirement date as in the Excel-example.

 

  • Since I only have one variable for retirement year, it doesnt matter wether I use long or short format?
  • But maybe I have do make it LONG-format to be able to sort by year (percentage) and use the program you sent me?

 

 

Thanks!

 

If the Excel doesn't work: https://imgur.com/a/31naK

Astounding
PROC Star

While my solution is untested, I think it should be fine.  It won't drop the final observation because the second SET statement mentions HAVE twice.  So the first SET statement determines when the DATA step ends.  At any rate, all of that assumes there is nothing else I overlooked (such as failing to set retired to "N" which would be a possible desired outcome instead of blank vs. "Y").

 

@s_lassen you might want to take a look at what happens if you have three observations in a row with percent > 80.  I'm not sure your logic is complete for that.  Looks like a viable approach, though.

Weberian
Fluorite | Level 6

Thank you both for helping me! I've tried both of your codes and the only I get to work is the last proposed by @s_lassen.

 

And it works! it only select years with >80% with the next year also >80%. BUT!

 

It do not select the FIRST year so in cases were individuals have many years according to the criteria it can look as this below: 

 

IDReirement_event
122782005
122782006
122942002
122942003
122942004
123172002
123172003
123172004
123172005
123172006

 

 

Can I add some "Select minimum year" for every ID into the program below? 

This also happens in cases were people 2003 have 81%, then 79% in 2004 and then 82% in 2005 and 2006. Were there becomes a gap (but this is not a problem)

IDReirement_event
123222003
123222005
123222006

 

 

data LIB1.Test2;
set LIB1.Test1;
by ID;
if percent>80 and lag(percent)>80 and not first.ID then do;
retirement_event=Retirement_year-1;
output;
end;
keep ID retirement_event ;
run; 

 

mkeintz
PROC Star

Any year that is part of consecutive percent >=80 is retire='Y',  otherwise retire='N'.

 

 

data want (drop=nxt_pct);
  set have (keep=name);
  by name;

  merge have
        have (firstobs=2 keep=percent rename=(percent=nxt_pct));

  retire=  ifc(min(lag(percent),percent)>=80 and first.name=0,'Y','N');
  if min(percent,nxt_pct)>=80 then retire='Y';

run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Weberian
Fluorite | Level 6

Thank you all for helping me. It works now.

 

@Astounding @s_lassen @mkeintz 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 764 views
  • 2 likes
  • 4 in conversation