Hi guys! I have a new question.
I have a dataset that looks like (observation=108):
var1 var1_date num2 num2_date cd4_3 cd4_3date var4 date4 ....... var16 date16
200 08/09/16 300 05/02/16 400 06/10/15 500 01/20/15 ....... 350 02/15/05
250 07/06/16 200 04/10/14
100 01/02/17 150 06/01/15 550 02/10/15
.
.
.
I used the min function to find the lowest numerical value for each observation. Now, I need to find the date that correspond to minumum numerical value (e.g for obs 1 it would be 08/09/16, obs 2 would be 04/10/14...). How can I proceed to do that??
Thank you!!
use VVALUEX() funtion.
data have;
input var1 var1_date : mmddyy10. num2 num2_date : mmddyy10. cd4_3 cd4_3date : mmddyy10.;
format var1_date num2_date cd4_3date mmddyy10.;
cards;
200 08/09/16 300 05/02/16 400 06/10/15 500 01/20/15 ....... 350 02/15/05
250 07/06/16 200 04/10/14 . .
100 01/02/17 150 06/01/15 550 02/10/15
;
run;
data want;
set have;
array x{*} var1 num2 cd4_3;
idx=whichn(min(of x{*}),of x{*});
min_date=vvaluex(cats(vname(x{idx}),'_date'));
run;
If at any point earlier in your code you can keep the variables named with numeric suffixes, this will make the problem much easier.
var1 vardate1 var2 vardate2, etc.
Any chance you can easily make that happen?
Also, can there be ties in there and how would you want to treat them? That is, what if the min appears twice but the date value is different?
Yes I can create new variables and just make them equal to the values in the original variables.
For ties, I would want to count both the date values if they are different. Is that possible?
Thanks!
@michan22 wrote:
Yes I can create new variables and just make them equal to the values in the original variables.
For ties, I would want to count both the date values if they are different. Is that possible?
Thanks!
Count? How would that look? Please post sample data and output that reflects your situation.
And how many ties could occur? Could you have the same value for the var with different dates in every case?
My thought was a new array to ouput the dates that match, but realistically how many possible ties would you need to account for?
Sorry, I meant that I would want the new variable to return the date (that correspond to the lowest numerical value) for each observation, even if there are multiple observations with the same lowest numerical value. I don't expect there to be a lot of ties, but there will be a few observations with the same (lowest) numerical values.
The solution offered here for max will work for min as well, change the function to MIN instead.
array vars(*) var1-var16;
array dates(*) date1-date16;
index_largest = whichn(max(of dates(*)), of dates(*));
value = vars(index_largest);
Any thought on ties? This will return the first, if I'm reading the documentation correctly.
Here's one possibility on how to deal with ties. It assumes that you could have a constant value for your "var" array, so all would be ties.
data have;
format date1 - date5 date9.;
informat date1 - date5 date9.;
input var1 - var5 date1 - date5;
datalines;
1 3 0 4 5 01JAN2017 02JAN2017 03JAN2017 04JAN2017 05JAN2017
1 3 0 0 5 01JAN2017 02JAN2017 03JAN2017 04JAN2017 05JAN2017
0 3 0 4 5 01JAN2017 02JAN2017 03JAN2017 04JAN2017 05JAN2017
;
data want;
set have;
array var{*} var:;
array date{*} date:;
/* With no ties, this is an ideal solution. */
reeza = date(whichn(min(of var(*)), of var(*)));
array match{5} match1 - match5;
do i = 1 to 5;
if var(i) = min(of var(*)) then match(i) = date(i);
end;
/* This sorts the matches ascending, so . comes first. */
call sortn(of match{*});
/* This reverses the sort, so you end up with desceding sort. */
array rev{*} match5 - match1;
call sortn(of rev{*});
format reeza match1 - match5 date9.;
/* Drop extraneous variables as as necessary... */
run;
@collinelliot A variable named reeza? I'm flattered
Credit where credit is due!
Also, see @Astounding 's solution for the reordering of columns in this post:
https://communities.sas.com/t5/SAS-Procedures/Ordering-Dates-Horizontal-Data/m-p/346838#M63589
Here a combination of @Reeza'a and your code which also identifies ties.
data smallest ties;
row_id=_n_;
format date1 - date5 date9.;
informat date1 - date5 date9.;
input var1 - var5 date1 - date5;
array vars{*} var:;
array dates{*} date:;
/* first lowest value */
index_smallest = whichn(min(of vars(*)), of vars(*));
format date date9.;
date = dates(index_smallest);
value = vars(index_smallest);
output;
/* ties? */
do _i=index_smallest+1 to dim(vars);
if value=vars(_i) then
do;
date = dates(_i);
output ties;
end;
end;
drop _i;
datalines;
4 3 0 1 5 01JAN2017 02JAN2017 03JAN2017 04JAN2017 05JAN2017
1 3 0 0 5 01JAN2017 02JAN2017 03JAN2017 04JAN2017 05JAN2017
3 0 0 4 5 01JAN2017 02JAN2017 03JAN2017 04JAN2017 05JAN2017
;
run;
Thank you for the advice!
I am relatively new to SAS and can you explain what this line does?
reeza = date(whichn(min(of var(*)), of var(*)));
In order to find the date with the lowest value, I have to first tell SAS that var1 is linked to date1 and var2 linked to date2 and so on right? is that what the array match does?
That's not correct.
You tell SAS var1 is linked to date1 by having a consistent index. The array definitions ensure that the variables line up.
The rest is functions that are nested.
Min/max -> find the minimum or maximum value, excluding missing values.
WHICHN -> searches through an array for a specified value, and returns the first index of where it's found.
Of var(*) -> short cut notation to reference an array.
Date() -> returns the value from the date array at specified value.
Now combine that all to solve your problem. You may want to break it out step by step.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.