I was using the answer to this question, in order to return the column name with the latest date. My data is in this format:
DATA have;
INPUT id away :DATE9. home :DATE9. travel :DATE9.;
FORMAT away home travel DATE9.;
DATALINES;
1 . . 01JAN2020
2 03JAN2020 . 03FEB2020
3 04FEB2020 01JAN2020 12FEB2020
;
Using the solution to the other question I have this code (I didn't want to include the ID in the array, so I had to drop it, then bring it back in):
DATA want;
SET have(drop=id);
ARRAY dates[*] _numeric_;
SET have;
latest = max(of dates[*]);
maxv = vname(dates[whichn(latest, of dates(*))]);
RUN;
Everything runs fun if I leave out the line maxv = vname..., but leaving it in gives me an error:
ERROR: Array subscript out of range at line 1068 column 18.
What is causing this error to come up in the last line of the code?
HI @MB_Analyst Your code successfully ran when i tested. Here is the log
1716 DATA have;
1717 INPUT id away :DATE9. home :DATE9. travel :DATE9.;
1718 FORMAT away home travel DATE9.;
1719 DATALINES;
NOTE: The data set WORK.HAVE has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
1723 ;
1724
1725 DATA want;
1726 SET have(drop=id);
1727 ARRAY dates[*] _numeric_;
1728 SET have;
1729 latest = max(of dates[*]);
1730 maxv = vname(dates[whichn(latest, of dates(*))]);
1731 RUN;
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 3 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Btw, this makes me wonder whether your sample data is a good representative of your real?
Also does any of your records have all missings that eventually makes the MAX value also as missing? Please verify
I initially did this so I could use _NUMERIC_ to define the array, without using the ID variable. I still need the ID variable - is there a better way to achieve this? The data set I'm using has many variables for the array, and are likely to change. Should I force the ID as a character instead?
HI @MB_Analyst Your code successfully ran when i tested. Here is the log
1716 DATA have;
1717 INPUT id away :DATE9. home :DATE9. travel :DATE9.;
1718 FORMAT away home travel DATE9.;
1719 DATALINES;
NOTE: The data set WORK.HAVE has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
1723 ;
1724
1725 DATA want;
1726 SET have(drop=id);
1727 ARRAY dates[*] _numeric_;
1728 SET have;
1729 latest = max(of dates[*]);
1730 maxv = vname(dates[whichn(latest, of dates(*))]);
1731 RUN;
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 3 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Btw, this makes me wonder whether your sample data is a good representative of your real?
Also does any of your records have all missings that eventually makes the MAX value also as missing? Please verify
Good and I am glad. You are not alone. I have had that issue many times. The thumb rule is always to be very cognizant of missing values and its propagation.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.