@Deester Good morning, Just to clarify a couple of points before I modify the code,
-->"I realized that for some locations in my dataset, park_area values are only available for some of the years but not for all years within 2000-2005.
My assumption: This could mean the base year may not be year 2000 and the code basically needs to check and pick the first available value as the base year in the sequence and compare value with subsequent years? is my assumption right?
Is there a way to revise the code so that the values of park_area in later years are set equal to the first year that is available for that location (if values(in plural) in later years are greater than the first year)?"
My assumption: Can values increase or decrease, so for example, for year 2000 the value may be 250, then 280, then dropping to 240? This may not show a clear increasing trend
If you have a few mins, can you modify your sample HAVE and show an expected OUTPUT(WANT) sample covering the scenarios please?
Good morning!!
To answer your questions:
1. Yes your assumption is correct.
2. I want to compare the first two values at each location. So the data looks like this:
My dataset looks like:
year location park_area
2001 1 1002
2003 1 1021
2004 1 1020
2000 2 990
2002 2 890
2003 2 892
2005 2 905
So for location 1, I want to compare 2001 (first available year) with 2003 (second available year). If park_area in 2001< park_area in 2003, then 2003 and 2004 park_area are set to be equal to the 2001 park_area value. If park_area in 2001> park_area in 2003, 2004 value is set to be equal to the 2003 value.
For location2, I want to compare 2000 and 2002. If park_area in 2000< park_area in 2002, then 2002, 2003, and 2005 park_area are set to be equal to the 2000 value. If park_area in 2000> park_area in 2002, then 2003 and 2005 park_area are set to be equal to the 2002 value.
Hope this clarifies things. Thank you!
data have ;
infile datalines dlm=" ";
input year 4. location park_area;
datalines;
2001 1 1002
2003 1 1021
2004 1 1020
2000 2 990
2002 2 890
2003 2 892
2005 2 905
;
run;
proc sort data= have out=_have;
by location year;
run;
data want;
set _have;
by location year;
retain _p;
if first.location then _p=park_area;
else _p=ifn(park_area>_p,_p,park_area);
if _p then park_area=_p;
drop _p;
run;
Or a better bet:
data want;
call missing(of __f,__p);
do _n_=1 by 1 until(last.location);
set _have;
by location year;
if _n_=1 then __p=park_area;
else if _n_=2 and not __f then do;
__p=ifn(park_area>__p,__p,park_area);
__f=1;
end;
if __p then park_area=__p;
output;
end;
drop __:;
run;
Thank you!! The first code worked. It's very similar to the one I had but I made some small mistakes.
Hope this didn't take too much of your time! I really appreciate your help!!
No biggie. You are welcome!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.