Need some help writing a program that selects the smallest and largest values when there are multiple ID entries.
My data looks a bit like this (actual sample attached)
ID Date Value
1 1/1/2007 100
1 1/8/2007 80
1 3/9/2008 70
1 2/2/2010 50
2 3/3/2008 60
2 6/3/2009 65
3 1/1/2010 150
3 1/2/2011 120
3 3/6/2012 80
For each ID, I want to pick up the values on the first date and the last date. For example, for loan ID 1, I want to identify the value 100 as the earliest value and dump this into a table. My aim is to calculate the change in value between the first and last dates. I need to preserve the dates for reference.
Many Thanks for any help!
Attached is some sample data, I'm on verion 7.12
proc sort data=have;
by id date;
run;
data want;
set have;
by id;
if first.id or last.id;
run;
data change;
set want;
by id;
value_prev=lag(value);
if last.id then do;
diff = value - value_prev;
end;
run;
OR
Does the following do what you are after-
data have;
input ID Date : mmddyy10. Value;
format date mmddyy10.;
datalines;
1 1/1/2007 100
1 1/8/2007 80
1 3/9/2008 70
1 2/2/2010 50
2 3/3/2008 60
2 6/3/2009 65
3 1/1/2010 150
3 1/2/2011 120
3 3/6/2012 80
;
proc sql;
create table want as
select *
from have
group by id
having date=min(date) or date=max(date);
quit;
proc sort data=have;
by id date;
run;
data want;
set have;
by id;
if first.id or last.id;
run;
data change;
set want;
by id;
value_prev=lag(value);
if last.id then do;
diff = value - value_prev;
end;
run;
OR
Hi,
I modified the Rezza's code and put the information in a row by id:
proc sort data=have;
by id date;
run;
data change;
retain id firstdate firstvalue lastdate lastvalue diff;
format firstdate lastdate ddmmyy10.;
set have;
by id;
if first.id then
do;
firstdate = date;
firstvalue = value;
diff= value;
end;
if last.id then
do;
lastdate = date;
lastvalue = value;
diff = diff - value;
output;
end;
drop date value;
run;
Here's another way to get the min and max dates and the value associated. Another data step on the output would allow the calculation be the min and max values.
proc summary data=have nway; class id; var date; id value; output out=want (drop= _:) max(date)=date_max idgrp(max(date) out(value)=maxvalue) min(date)=date_min idgrp(min(date) out(value)=minvalue); run;
Minor advantage, no sort needed.
Great !, but needs one more step to create the difference!
@sotojcr wrote:
Great !, but needs one more step to create the difference!
Agreed and mentioned. However by the time you get to having 8 or 9 variables that you want the id for a mix of min and max (or median) the number of variables to keep and reset in a data step approach can get cumbersome.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.