I never expected a simple procedure with RETAIN and FIRST. to be this difficult.
My data look like this:
id | jobnum | farm_ever | startyear | endyear |
1004 | 1 | 0 | 1955 | 1959 |
1004 | 2 | 0 | 1959 | 1962 |
1004 | 3 | 1 | 1962 | 1977 |
1008 | 1 | 0 | 1951 | 1978 |
1008 | 2 | 0 | 1978 | 1981 |
1011 | 1 | 1 | 1954 | 1998 |
1012 | 1 | 0 | 1965 | 1966 |
1012 | 2 | 1 | 1966 | 1968 |
1012 | 3 | 0 | 1968 | 1972 |
1012 | 4 | 1 | 1972 | 1975 |
1012 | 5 | 1 | 1975 | 2000 |
1014 | 1 | 0 | 1959 | 1963 |
1014 | 2 | 1 | 1963 | 1965 |
1014 | 3 | 0 | 1965 | 1967 |
1014 | 4 | 1 | 1967 | 1970 |
1014 | 5 | 1 | 1970 | 1977 |
1014 | 6 | 1 | 1978 | 1998 |
1014 | 7 | 1 | 1998 | 1999 |
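To try the answers below, the table above can be loaded with a DATA step like this one. It is a sketch: the data set name jobhist comes from the question. Reading the values as blank-separated DATALINES, without the pipes, is my own choice.

```sas
/* Load the sample job history shown above into WORK.JOBHIST. */
/* Values are blank-separated; the table's pipes are omitted.  */
data jobhist;
  input id jobnum farm_ever startyear endyear;
  datalines;
1004 1 0 1955 1959
1004 2 0 1959 1962
1004 3 1 1962 1977
1008 1 0 1951 1978
1008 2 0 1978 1981
1011 1 1 1954 1998
1012 1 0 1965 1966
1012 2 1 1966 1968
1012 3 0 1968 1972
1012 4 1 1972 1975
1012 5 1 1975 2000
1014 1 0 1959 1963
1014 2 1 1963 1965
1014 3 0 1965 1967
1014 4 1 1967 1970
1014 5 1 1970 1977
1014 6 1 1978 1998
1014 7 1 1998 1999
;
run;
```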
I need to calculate the duration (endyear - startyear) when farm_ever = 1. When farm_ever = 1 for consecutive jobs, I need the duration across all of those jobnum values. For example, for id=1014, farm_ever = 1 at jobnum 4, 5, 6 and 7, so the duration should be 1999 - 1967 = 32. I tried to use the RETAIN statement and FIRST. variables:
proc sort data = jobhist ; by id jobnum farm_ever ; run ;
data jobhist ;
set jobhist ;
by id jobnum farm_ever ;
retain start ;
if first.farm_ever then start = startyear ;
run ;
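But the start year was not retained, because first.farm_ever is 1 for every row (jobnum is unique within each id, so every row starts a new BY group). Then I tried PROC SORT without jobnum, but the farm_ever = 0 rows were moved ahead of the farm_ever = 1 rows within each id, so the wrong year got retained.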
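A quick way to see the problem is to copy the automatic FIRST.FARM_EVER flag into an ordinary variable and print it. This sketch uses only the three rows for ID 1004; the data set names demo and flags and the variable first_fe are hypothetical.

```sas
/* Three rows for ID 1004 from the sample data. */
data demo;
  input id jobnum farm_ever startyear endyear;
  datalines;
1004 1 0 1955 1959
1004 2 0 1959 1962
1004 3 1 1962 1977
;
run;

/* Same BY statement as in the question. Because jobnum is unique */
/* within each id, every row starts a new jobnum group, so         */
/* FIRST.FARM_EVER is also 1 on every row.                         */
data flags;
  set demo;
  by id jobnum farm_ever;
  first_fe = first.farm_ever;  /* FIRST. variables are not written out, so copy it */
run;

proc print data=flags noobs;
  var id jobnum farm_ever first_fe;
run;
```

Every row shows first_fe = 1, so the IF FIRST.FARM_EVER test overwrites START on every row.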
Could anyone please help me out? Is there any function to fix this, or is there any other statement to do this apart from retain?
Thank you so much!
The sorted order is correct. RETAIN is also correct. Here's how you might add to those pieces:
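data want;
  set jobhist;                  /* already sorted by id jobnum */
  by id farm_ever notsorted;    /* consecutive equal farm_ever values form one group */
  retain start;                 /* keep the run's start year across rows */
  if farm_ever=1 then do;
    if first.farm_ever then start = startyear;          /* a run of 1s begins */
    if last.farm_ever then duration = endyear - start;  /* a run of 1s ends */
  end;
  drop start;
run;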
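NOTSORTED allows the BY statement even though the data are not sorted by FARM_EVER: each run of consecutive rows with the same FARM_EVER value is treated as its own BY group. From there, the program uses FIRST. and LAST. to mark where each run starts and ends.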
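To watch FIRST. and LAST. at work, this sketch runs the same logic on the seven rows for ID 1014 and copies both flags into ordinary variables. The data set names demo1014 and want1014 and the variables first_fe and last_fe are hypothetical, added only for display.

```sas
/* The seven rows for ID 1014 from the sample data. */
data demo1014;
  input id jobnum farm_ever startyear endyear;
  datalines;
1014 1 0 1959 1963
1014 2 1 1963 1965
1014 3 0 1965 1967
1014 4 1 1967 1970
1014 5 1 1970 1977
1014 6 1 1978 1998
1014 7 1 1998 1999
;
run;

data want1014;
  set demo1014;
  by id farm_ever notsorted;   /* each consecutive run of farm_ever is a BY group */
  retain start;
  first_fe = first.farm_ever;  /* copies kept only for display */
  last_fe  = last.farm_ever;
  if farm_ever=1 then do;
    if first.farm_ever then start = startyear;
    if last.farm_ever then duration = endyear - start;
  end;
  drop start;
run;

proc print data=want1014 noobs;
  var jobnum farm_ever first_fe last_fe duration;
run;
```

The one-job run at jobnum 2 gets a duration of 2, and the run at jobnum 4 to 7 gets 1999 - 1967 = 32, the value the question asks for. DURATION is missing on the other rows. Because the duration is set at LAST.FARM_EVER rather than at the end of the ID, a run that is followed by a farm_ever = 0 row (jobnum 2 here) is still measured.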
What about ID 1008 where farm_ever is never 1?
What if the LAST value of farm_ever is 0 after it has been 1 for earlier jobnum values?
Depending on your answers to those questions, this may get close to the duration you need:
data want ;
  set jobhist ;
  by id jobnum farm_ever ;
  retain start ;
  if first.id then start= -999;                         /* -999 means "no open run" */
  if farm_ever=1 and start=-999 then start=startyear;   /* a run of 1s begins */
  else if farm_ever=0 then start=-999;                  /* a 0 closes the run */
  if last.id and start ne -999 then duration = endyear-start;
run ;
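For comparison, this sketch runs the step above on the five rows for ID 1012, where farm_ever goes 0, 1, 0, 1, 1. Because DURATION is computed only at LAST.ID, only the final run is reported (2000 - 1972 = 28); the one-job run at jobnum 2 is never measured. The data set names demo1012 and want1012 are hypothetical.

```sas
/* The five rows for ID 1012 from the sample data. */
data demo1012;
  input id jobnum farm_ever startyear endyear;
  datalines;
1012 1 0 1965 1966
1012 2 1 1966 1968
1012 3 0 1968 1972
1012 4 1 1972 1975
1012 5 1 1975 2000
;
run;

/* Same logic as the step above; -999 means "no open run". */
data want1012;
  set demo1012;
  by id jobnum farm_ever;
  retain start;
  if first.id then start = -999;
  if farm_ever=1 and start=-999 then start = startyear;
  else if farm_ever=0 then start = -999;
  if last.id and start ne -999 then duration = endyear - start;
run;

proc print data=want1012 noobs;
  var jobnum farm_ever start duration;
run;
```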
BTW, I recommend extreme caution with the
Data Jobhist;
set Jobhist;
code. It is entirely too easy to end up with bad data if you recode any of your original variables, because each rerun of the step recodes values that were already recoded.
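The risk is easy to reproduce. In this sketch, a hypothetical recode (subtracting 1900 from STARTYEAR) is run in place twice, as happens when a program is resubmitted, and then once more the safer way, writing to a new data set. The data set names years_raw, years and years_rec are made up for the illustration.

```sas
data years_raw;
  input id startyear;
  datalines;
1004 1955
;
run;

/* Risky pattern: each step overwrites its own input. */
data years;
  set years_raw;
run;

data years;                      /* first pass: 1955 -> 55 */
  set years;
  startyear = startyear - 1900;
run;

data years;                      /* accidental rerun: 55 -> -1845 */
  set years;
  startyear = startyear - 1900;
run;

/* Safer pattern: read the untouched input and write a new name. */
/* Rerunning this step always gives 55.                          */
data years_rec;
  set years_raw;
  startyear = startyear - 1900;
run;
```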
Thank you for your suggestion about data jobhist; set jobhist;. I'll be more careful about using it.
For IDs like 1008, where farm_ever has never been 1, there is no need to compute anything. The tricky part is when the LAST value of farm_ever is 0 after it has been 1 for earlier jobnum values.
The code you provided handled most of the calculations, but it missed some of the tricky cases. It was pretty close, though. Thank you for sharing your insight!