BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Deester
Obsidian | Level 7

I have a panel dataset of park area in 3000 different locations from 2000 to 2005. I want to find out which locations have an increase in park area after 2000. i.e., park_area2001 > park_area2000. 

If park_area2001>park_area2000, I want to assign values of park_area in 2001-2005 to be the same as park_area2000;

if park_area2001<park_area2000, then values of park_area in 2001-2005 are the same as value of park_area2001.

 

My dataset looks like:

year      location     park_area

2000      1                  1000

2001      1                  1002

2002      1                  1010

2003      1                  1021

2004      1                  1020

2005      1                  1020

2000      2                   990

2001      2                   890

2002      2                   890

2003      2                   892

2004      2                   900

2005      2                   905

 

What are some ways for me to code this?

 

Thank you very much.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have ;
infile datalines dlm=" ";
input year 4. location     park_area;
datalines;
2000 1 1000
2001 1 1002
2002 1 1010
2003 1 1021
2004 1 1020
2005 1 1020
2000 2 990
2001 2 890
2002 2 890
2003 2 892
2004 2 900
2005 2 905
;
run;

data want;
set have;
by location year;
retain _p;
if first.location then call missing(_p);
if year=2000 then _p=park_area;
else do;
if year=2001 then _p=ifn(park_area>lag(park_area),_p,park_area);
if _p then park_area=_p;end;
drop _:;
run;

View solution in original post

20 REPLIES 20
SuryaKiran
Meteorite | Level 14

Answer for this depends on how your data looks like. Please provide some sample data. Here is a link for creating a sample data https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Thanks,
Suryakiran
SuryaKiran
Meteorite | Level 14

Check this out:

data have ;
infile datalines dlm=" ";
input year 4. location     park_area;
datalines;
2000 1 1000
2001 1 1002
2002 1 1010
2003 1 1021
2004 1 1020
2005 1 1020
2000 2 990
2001 2 890
2002 2 890
2003 2 892
2004 2 900
2005 2 905
;
run;
proc sort data=have;
by location year ;

data want(drop=lag);
retain park_area;
set have(rename=(park_area=area));
lag=lag(area);
if year=2000 then park_area=area;
if  year=2001 and area>lag then park_area=lag ;
if year=2001 and area<lag then park_area=area;
run;
Thanks,
Suryakiran
Deester
Obsidian | Level 7

Thank you SuryaKiran!! Now I learned how to use lag. 🙂

novinosrin
Tourmaline | Level 20
data have ;
infile datalines dlm=" ";
input year 4. location     park_area;
datalines;
2000 1 1000
2001 1 1002
2002 1 1010
2003 1 1021
2004 1 1020
2005 1 1020
2000 2 990
2001 2 890
2002 2 890
2003 2 892
2004 2 900
2005 2 905
;
run;

data want;
set have;
by location year;
retain _p;
if first.location then call missing(_p);
if year=2000 then _p=park_area;
else do;
if year=2001 then _p=ifn(park_area>lag(park_area),_p,park_area);
if _p then park_area=_p;end;
drop _:;
run;
Deester
Obsidian | Level 7

Thank you!!

Deester
Obsidian | Level 7

Hi 

 

 

Thank you! 

novinosrin
Tourmaline | Level 20

_p is a flag or temp variable to hold the value from the previous record. _ is a fancy naming convention i am used to. Don't take much notice. 

 

The call missing erases the held value of _p when sas processes a new by group to avoid any ambiguity. Hope that helps. 

 

PS I will prolly leave a nicer well explained notes tomorrow morning(Chicago time) when i get back to my college. I really do apologize for NOT so great explanation as I am racing against time to complete a personal assignment of mine that's due in tonight with just 2 hours to go. Hang in there. Cheers!

Deester
Obsidian | Level 7

Thank you very much. The explanation you provided already helps a ton. 

Sorry for taking your time when you have things due that are much more important than this. Hope you get it finished in time!

 

novinosrin
Tourmaline | Level 20

Good morning, Here are the Notes:

1. What does "retain _p" do here?

SAS initializes all new variables created using an assignment statement to missing when it processes every record of a dataset. However we need to hold the value of year 2000 to look up when we process the record with year 2001. Therefore we use retain to override the default behavior to hold the record across observations.

2. Since we retained the value across observations, we need some or other way to let go of that value when we begin process a new by group set which in your case location happens to be the by group. So when SAS reads the first record of the new by group, the call missing overwrites the held value of _p from the previous by group assigning it with a missing value as mentioned earlier to avoid any ambiguity.

3. The flag variable is eventually dropped once our look up operation is complete at the end of the data-step.

Deester
Obsidian | Level 7
Thank you very much! Now I understand every step!
Deester
Obsidian | Level 7

Hi novinosrin

 

 

 

novinosrin
Tourmaline | Level 20

@Deester Sure. Is it very urgent or can I give you the solution tomorrow chicago time at around 11am from my college lab?. If urgent i may have to sit just after supper. It's 9:10pm here in chicago right now.

Deester
Obsidian | Level 7

Not urgent at all.  I actually just work it out myself, but I'd love to see how you approach it if you don't mind spending time on it.

What I did was:

1. I selected two top variables from each location group.

2. Created a flag variable indicating if the second observation is greater than the first observation.

3. Joining this new dataset with the original dataset and assigned values to later years.

 

Thank you again!!

 

novinosrin
Tourmaline | Level 20

Absolutely no problem. Will respond to you in the morning without fail soon as I get to college. Enjoy your weekend and take care.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 1253 views
  • 5 likes
  • 3 in conversation