Hi, Need help solving this-
My dataset looks like this:
data have;
input id value1 value2 value3 value4 value5 dt1 :date9. dt2 :date9. dt3 :date9. dt4 :date9. dt5 :date9.;
format dt1 dt2 dt3 dt4 dt5 date9.;
datalines;
1 300 400 60 . 40 01feb2015 03mar2016 21apr2016 . 21may2016
2 1200 300 40 40 . 05jan2012 04feb2012 15apr2013 23apr2014 .
3 30 60 100 . 200 09dec2011 10jan2011 15feb2011 . 16apr2013
4 . . 40 30 200 . . 12feb2011 16mar2011 17apr2012
5 200 200 200 20 20 10sep2010 18oct2010 13dec2010 23apr2013 26oct2014
;
run;
dt1 corresponds to value1, date2 corresponds to value2 etc..
I need to pull last two non-missing 'value' variables if they are under a certain value -say 100 and then their corresponding dates and calculate difference period in days between them
This is what I want:
For ids 3 and 4: they don't qualify as atleast one of the last two non-missing values is greater than 100.
id lastnonmiss last2ndnonmiss dt_lastnonmiss dt_last2ndnonmiss timediffindays
1 40 60 21may2016 21apr2016 30
2 40 40 23apr2014 15apr2013 373
5 20 20 26oct2014 23apr2013 551
I tried using Coalesce function (which helped me pick the last non-missing and last 2nd nonmissing values) but unable to pick corresponding dates. Also, needed to include the initial condition that the last and 2nd last non-missing values should be less than 100. Please help ASAP! Much appreciated!!
Keep your VALUES in one array.
Keep your DATES in another array.
Have an array sized to 2 for holding the selected VALUES.
Similarly use an array sized to 2 to hold the corresponding DATES.
data have;
input id value1 value2 value3 value4 value5 dt1 :date9. dt2 :date9. dt3 :date9.
dt4 :date9. dt5 :date9.;
format dt1 dt2 dt3 dt4 dt5 date9.;
datalines;
1 300 400 60 . 40 01feb2015 03mar2016 21apr2016 . 21may2016
2 1200 300 40 40 . 05jan2012 04feb2012 15apr2013 23apr2014 .
3 30 60 100 . 200 09dec2011 10jan2011 15feb2011 . 16apr2013
4 . . 40 30 200 . . 12feb2011 16mar2011 17apr2012
5 200 200 200 20 20 10sep2010 18oct2010 13dec2010 23apr2013 26oct2014
;
run;
data want;
set have;
array v value1 - value5;
array d dt1 - dt5;
array tv[2] _temporary_;
array td[2] _temporary_;
count = 0;
do i = dim(v) to 1 by -1;
if v[i] = . then continue;
if v[i] Ge 100 then leave;
count + 1;
tv[count] = v[i];
td[count] = d[i];
if count = 2 then do;
datediff = td[1] - td[2];
output;
leave;
end;
end;
drop count i;
run;
The output is:
Can you show us the code you have tried so far?
Here is a brute force approach
data want;
format lastnonmiss last2ndnonmiss dt_lastnonmiss dt_last2ndnonmiss timediffindays;
set have;
array v {5} value5-value1;
array d {5} dt5-dt1;
lastnonmiss=coalesce(of v[*]);
dt_lastnonmiss=d[whichn(lastnonmiss, of v[*])];
v[whichn(lastnonmiss, of v[*])]=.;
last2ndnonmiss=coalesce(of v[*]);
dt_last2ndnonmiss=d[whichn(last2ndnonmiss, of v[*])];
if lastnonmiss <= 100 & last2ndnonmiss <= 100;
timediffindays=dt_lastnonmiss-dt_last2ndnonmiss;
format dt: date9.;
keep lastnonmiss last2ndnonmiss dt_lastnonmiss dt_last2ndnonmiss timediffindays;
run;
Result:
lastnonmiss last2ndnonmiss dt_lastnonmiss dt_last2ndnonmiss timediffindays 40 60 21MAY2016 21APR2016 30 40 40 23APR2014 15APR2013 373 20 20 26OCT2014 23APR2013 551
Keep your VALUES in one array.
Keep your DATES in another array.
Have an array sized to 2 for holding the selected VALUES.
Similarly use an array sized to 2 to hold the corresponding DATES.
data have;
input id value1 value2 value3 value4 value5 dt1 :date9. dt2 :date9. dt3 :date9.
dt4 :date9. dt5 :date9.;
format dt1 dt2 dt3 dt4 dt5 date9.;
datalines;
1 300 400 60 . 40 01feb2015 03mar2016 21apr2016 . 21may2016
2 1200 300 40 40 . 05jan2012 04feb2012 15apr2013 23apr2014 .
3 30 60 100 . 200 09dec2011 10jan2011 15feb2011 . 16apr2013
4 . . 40 30 200 . . 12feb2011 16mar2011 17apr2012
5 200 200 200 20 20 10sep2010 18oct2010 13dec2010 23apr2013 26oct2014
;
run;
data want;
set have;
array v value1 - value5;
array d dt1 - dt5;
array tv[2] _temporary_;
array td[2] _temporary_;
count = 0;
do i = dim(v) to 1 by -1;
if v[i] = . then continue;
if v[i] Ge 100 then leave;
count + 1;
tv[count] = v[i];
td[count] = d[i];
if count = 2 then do;
datediff = td[1] - td[2];
output;
leave;
end;
end;
drop count i;
run;
The output is:
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.