I have a panel dataset of park area in 3000 different locations from 2000 to 2005. I want to find out which locations have an increase in park area after 2000. i.e., park_area2001 > park_area2000.
If park_area2001>park_area2000, I want to assign values of park_area in 2001-2005 to be the same as park_area2000;
if park_area2001<park_area2000, then values of park_area in 2001-2005 are the same as value of park_area2001.
My dataset looks like:
year location park_area
2000 1 1000
2001 1 1002
2002 1 1010
2003 1 1021
2004 1 1020
2005 1 1020
2000 2 990
2001 2 890
2002 2 890
2003 2 892
2004 2 900
2005 2 905
What are some ways for me to code this?
Thank you very much.
data have ;
infile datalines dlm=" ";
input year 4. location park_area;
datalines;
2000 1 1000
2001 1 1002
2002 1 1010
2003 1 1021
2004 1 1020
2005 1 1020
2000 2 990
2001 2 890
2002 2 890
2003 2 892
2004 2 900
2005 2 905
;
run;
data want;
set have;
by location year;
retain _p;
if first.location then call missing(_p);
if year=2000 then _p=park_area;
else do;
if year=2001 then _p=ifn(park_area>lag(park_area),_p,park_area);
if _p then park_area=_p;end;
drop _:;
run;
Answer for this depends on how your data looks like. Please provide some sample data. Here is a link for creating a sample data https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Check this out:
data have ;
infile datalines dlm=" ";
input year 4. location park_area;
datalines;
2000 1 1000
2001 1 1002
2002 1 1010
2003 1 1021
2004 1 1020
2005 1 1020
2000 2 990
2001 2 890
2002 2 890
2003 2 892
2004 2 900
2005 2 905
;
run;
proc sort data=have;
by location year ;
data want(drop=lag);
retain park_area;
set have(rename=(park_area=area));
lag=lag(area);
if year=2000 then park_area=area;
if year=2001 and area>lag then park_area=lag ;
if year=2001 and area<lag then park_area=area;
run;
Thank you SuryaKiran!! Now I learned how to use lag. 🙂
data have ;
infile datalines dlm=" ";
input year 4. location park_area;
datalines;
2000 1 1000
2001 1 1002
2002 1 1010
2003 1 1021
2004 1 1020
2005 1 1020
2000 2 990
2001 2 890
2002 2 890
2003 2 892
2004 2 900
2005 2 905
;
run;
data want;
set have;
by location year;
retain _p;
if first.location then call missing(_p);
if year=2000 then _p=park_area;
else do;
if year=2001 then _p=ifn(park_area>lag(park_area),_p,park_area);
if _p then park_area=_p;end;
drop _:;
run;
Thank you!!
Hi novinosrin, I am trying to replicate your code but I have some questions. What does "retain _p" do here? What does the underscore "_" do?
I also have trouble to understand this line:
if first.location then call missing(_p)
Thank you!
_p is a flag or temp variable to hold the value from the previous record. _ is a fancy naming convention i am used to. Don't take much notice.
The call missing erases the held value of _p when sas processes a new by group to avoid any ambiguity. Hope that helps.
PS I will prolly leave a nicer well explained notes tomorrow morning(Chicago time) when i get back to my college. I really do apologize for NOT so great explanation as I am racing against time to complete a personal assignment of mine that's due in tonight with just 2 hours to go. Hang in there. Cheers!
Thank you very much. The explanation you provided already helps a ton.
Sorry for taking your time when you have things due that are much more important than this. Hope you get it finished in time!
Good morning, Here are the Notes:
1. What does "retain _p" do here?
SAS initializes all new variables created using an assignment statement to missing when it processes every record of a dataset. However we need to hold the value of year 2000 to look up when we process the record with year 2001. Therefore we use retain to override the default behavior to hold the record across observations.
2. Since we retained the value across observations, we need some or other way to let go of that value when we begin process a new by group set which in your case location happens to be the by group. So when SAS reads the first record of the new by group, the call missing overwrites the held value of _p from the previous by group assigning it with a missing value as mentioned earlier to avoid any ambiguity.
3. The flag variable is eventually dropped once our look up operation is complete at the end of the data-step.
Hi novinosrin,sorry for bugging you again but I am running into a new problem.
I realized that for some locations in my dataset, park_area values are only available for some of the years but not for all years within 2000-2005.
Is there a way to revise the code so that the values of park_area in laters years are set equal to the first year that is available for that location (if values in later years are greater than the first year)?
Here is the code I revised but it is not working:
data have;
set want;
by location year;
retain _p;
if first.location then call missing(_p);
if year=first.year then _p=park_area;
else do;
if year ne first.year then _p=ifn(park_area>lag(park_area), _p, park_area);
if _p then park_area=_p; end;
drop _:;
run;
Thank you!
@Deester Sure. Is it very urgent or can I give you the solution tomorrow chicago time at around 11am from my college lab?. If urgent i may have to sit just after supper. It's 9:10pm here in chicago right now.
Not urgent at all. I actually just work it out myself, but I'd love to see how you approach it if you don't mind spending time on it.
What I did was:
1. I selected two top variables from each location group.
2. Created a flag variable indicating if the second observation is greater than the first observation.
3. Joining this new dataset with the original dataset and assigned values to later years.
Thank you again!!
Absolutely no problem. Will respond to you in the morning without fail soon as I get to college. Enjoy your weekend and take care.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.