excel formula is doing the same thing
=IF(G4<3,IFERROR(INDEX($B$1:$F$1,MATCH(3,B4:F4,0)),IF(G4<2,IFERROR(INDEX($B$1:$D$1,MATCH(2,B4:E4,0)),""),"")),"")
Is this correct:
Considering only months may to september. If all states are constant 1 1 1 1 1 or 2 2 2 2 2, then the month should be blank.
If not, then the month should be the month where the state has the highest value. If there are multiple states that has the same value, month should be the first month where this value occurs.
Correct?
yes you are right..
This implements that logic
data have;
input id Stage Date :date9.;
format Date date9.;
datalines;
22313 2 31MAY2019
22314 1 31MAY2019
22315 3 31MAY2019
22316 3 31MAY2019
22317 1 31MAY2019
22318 2 31MAY2019
22313 2 30JUN2019
22314 1 30JUN2019
22315 3 30JUN2019
22316 3 30JUN2019
22317 1 30JUN2019
22318 2 30JUN2019
22313 2 31JUL2019
22314 2 31JUL2019
22315 3 31JUL2019
22316 2 31JUL2019
22317 1 31JUL2019
22318 2 31JUL2019
22313 1 31AUG2019
22314 2 31AUG2019
22315 2 31AUG2019
22316 2 31AUG2019
22317 1 31AUG2019
22318 2 31AUG2019
22313 1 30SEP2019
22314 1 30SEP2019
22315 2 30SEP2019
22316 2 30SEP2019
22317 1 30SEP2019
22318 2 30SEP2019
22313 1 31OCT2019
22314 1 31OCT2019
22315 2 31OCT2019
22316 2 31OCT2019
22317 2 31OCT2019
22318 2 31OCT2019
;
data temp;
set have;
month_name=put(Date, monname. -l);
run;
proc sort data=temp;
by id;
run;
proc transpose data=temp out=temp2(drop=_:);
by id;
id month_name;
var Stage;
run;
data want;
set temp2;
array m{*} may--September;
month=vname(m[whichn(max(of m[*]), of m[*])]);
if range(of m[*])=0 & m[1] in (1, 2) then call missing(month);
run;
Result
id May June July August September October month 22313 2 2 2 1 1 1 May 22314 1 1 2 2 1 1 July 22315 3 3 3 2 2 2 May 22316 3 3 2 2 2 2 May 22317 1 1 1 1 1 2 22318 2 2 2 2 2 2
last piece of code i got below error.
ERROR: Array subscript out of range at line 27 column 17.
id=xxxxx May=. June=. July=. August=. September=. October=1 month= _ERROR_=1 _N_=1380
i could see from the source data...this particular ID started from 31OCT2019 itself.
could you please help how to overcome such scenarios
Try this
data have;
input id Stage Date :date9.;
format Date date9.;
datalines;
22313 2 31MAY2019
22314 1 31MAY2019
22315 3 31MAY2019
22316 3 31MAY2019
22317 1 31MAY2019
22318 2 31MAY2019
22313 2 30JUN2019
22314 1 30JUN2019
22315 3 30JUN2019
22316 3 30JUN2019
22317 1 30JUN2019
22318 2 30JUN2019
22313 2 31JUL2019
22314 2 31JUL2019
22315 3 31JUL2019
22316 2 31JUL2019
22317 1 31JUL2019
22318 2 31JUL2019
22313 1 31AUG2019
22314 2 31AUG2019
22315 2 31AUG2019
22316 2 31AUG2019
22317 1 31AUG2019
22318 2 31AUG2019
22313 1 30SEP2019
22314 1 30SEP2019
22315 2 30SEP2019
22316 2 30SEP2019
22317 1 30SEP2019
22318 2 30SEP2019
22313 1 31OCT2019
22314 1 31OCT2019
22315 2 31OCT2019
22316 2 31OCT2019
22317 2 31OCT2019
22318 2 31OCT2019
22319 1 31OCT2019
;
data temp;
set have;
month_name=put(Date, monname. -l);
run;
proc sort data=temp;
by id;
run;
proc transpose data=temp out=temp2(drop=_:);
by id;
id month_name;
var Stage;
run;
data want;
set temp2;
array m{*} may--September;
if nmiss(of m[*])=dim(m) then call missing (month);
else do;
month=vname(m[whichn(max(of m[*]), of m[*])]);
if range(of m[*])=0 & m[1] in (1, 2) then call missing(month);
end;
run;
i am sorry no luck
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
29:11
NOTE: Invalid numeric data, 'May' , at line 29 column 11.
id=xxxxx May=1 June=1 July=1 August=1 September=1 October=1 month=. _ERROR_=1 _N_=1
@kishoresamineni wrote:
i am sorry no luck
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
29:11
NOTE: Invalid numeric data, 'May' , at line 29 column 11.
id=xxxxx May=1 June=1 July=1 August=1 September=1 October=1 month=. _ERROR_=1 _N_=1
Please post the whole log of that step, at least from the data statement up to first the line with the variable listing.
Use the {i} button for posting logs. This is mandatory.
32 data getmonth; 33 set have; 34 by id; 35 retain change month; 36 length month $9; ERROR: Character length cannot be used with numeric variable Month. 37 l_stage = lag(stage); 38 if first.id 39 then do; 40 change = 0; 41 month = ''; 42 end; 43 else if l_stage ne stage then change = 1; 44 if stage in (2,3) and not last.id and (l_stage not in (2,3) or first.id) then month = idvar; 45 if last.id; 46 if change = 0 then month = ''; 47 keep id month; 48 run; NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column). 41:11 46:28 WARNING: The variable id in the DROP, KEEP, or RENAME list has never been referenced. NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.GETMONTH may be incomplete. When this step was stopped there were 0 observations and 1 variables. WARNING: Data set WORK.GETMONTH was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 604.40k OS Memory 22364.00k Timestamp 20/11/2019 03:20:42 PM Step Count 178 Switch Count 0
Did you put a variable called month into the "have" dataset?
Because my code as posted works, see this log:
73 data have; 74 input id :$5. stage date :date9.; 75 format date yymmddd10.; 76 idvar = put(date,monname.); 77 datalines; NOTE: The data set WORK.HAVE has 36 observations and 4 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds 114 ; 115 116 proc sort data=have; 117 by id date; 118 run; NOTE: There were 36 observations read from the data set WORK.HAVE. NOTE: The data set WORK.HAVE has 36 observations and 4 variables. NOTE: Verwendet wurde: PROZEDUR SORT - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds 119 120 data getmonth; 121 set have; 122 by id; 123 retain change month; 124 length month $9; 125 l_stage = lag(stage); 126 if first.id 127 then do; 128 change = 0; 129 month = ''; 130 end; 131 else if l_stage ne stage then change = 1; 132 if stage in (2,3) and not last.id and (l_stage not in (2,3) or first.id) then month = idvar; 133 if last.id; 134 if change = 0 then month = ''; 135 keep id month; 136 run; NOTE: There were 36 observations read from the data set WORK.HAVE. NOTE: The data set WORK.GETMONTH has 6 observations and 2 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds 137 138 proc transpose data=have out=trans (drop=_name_); 139 by id; 140 var stage; 141 id idvar; 142 run; NOTE: There were 36 observations read from the data set WORK.HAVE. NOTE: The data set WORK.TRANS has 6 observations and 7 variables. NOTE: Verwendet wurde: PROZEDUR TRANSPOSE - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds 143 144 data want; 145 merge 146 trans 147 getmonth 148 ; 149 by id; 150 run; NOTE: There were 6 observations read from the data set WORK.TRANS. NOTE: There were 6 observations read from the data set WORK.GETMONTH. NOTE: The data set WORK.WANT has 6 observations and 8 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds 151 152 proc print data=want noobs; 153 run; NOTE: There were 6 observations read from the data set WORK.WANT. NOTE: Verwendet wurde: PROZEDUR PRINT - (Gesamtverarbeitungszeit): real time 0.04 seconds cpu time 0.03 seconds
sorry i modified the variable names now you code is working fine, unfortunately it is giving me the same results what i got earlier
results what i got from you program
e.g., 1 2 2 2 3 2 June (this should be September Month)
1 1 1 1 2 2 September (Month should be blank but we are getting September)
even this should be blank but we got June month
1 2 2 2 2 2 June
Hi KurtBremser,
thanks for helping me out in sorting out this issue.
below is a sample list that i extracted for validation most of them reported correctly after using your code but some of them
thrown errors. might be because of a messy formula =IF(G311<3,IFERROR(INDEX($B$1:$F$1,MATCH(3,B311:F311,0)),IF(G311<2,IFERROR(INDEX($B$1:$D$1,MATCH(2,B311:E311,0)),""),"")),"") business is using (B-F column names in Excel are May to September) 'G' Column October and 'H' is the Month column we are looking for:
could you please consider my request in fixing this issues
e.g.,
1 2 2 2 3 2 June (but it should be september Month, stage 3 is the
hightest one last 5 months out of six months)
3 3 2 2 2 2 May (reported correctly)
2 2 2 1 1 1 May (reported correctly)
3 3 3 2 2 2 May (reported correctly)
1 2 2 2 2 2 June (should be blank, may be because excel <2 condition)
3 2 2 2 2 2 May (reported correctly)
2 2 1 1 1 1 May (reported correctly)
2 3 3 3 3 3 May (should be blank, beacuse May-Sep ❤️ and
May - July < 2 this is what my understanding from excel formula)
3 3 2 2 2 2 May (reported correctly)
3 3 3 1 2 2 September (should be May, higher stage is in May)
2 2 2 2 3 3 May (should be blank)
2 2 2 2 1 1 May (reported correctly)
1 1 2 2 2 2 July (should be blank)
1 2 1 1 1 1 June (reported correctly)
1 1 2 1 2 2 September (should be blank)
3 3 3 3 3 2 May (reported correctly)
2 2 1 1 1 1 May (reported correctly)
2 2 3 2 3 2 May (should be July, higher stage started in July)
1 1 2 2 2 2 July (should be blank)
2 2 2 1 1 1 May (reported correctly)
Hi,
and one more request, every month i don't want to change the Month names in Array.
I am looking for something like automatic process where it picks last six months data and enables the conditions for that Monhs
Infact, i was using the macros to pull the last six months of data. Only issue is with the Month conditions as we discussing.
And i have seen scenarios where the ID might have started last month or two months before.
@kishoresamineni wrote:
Hi,
and one more request, every month i don't want to change the Month names in Array.
I am looking for something like automatic process where it picks last six months data and enables the conditions for that Monhs
Infact, i was using the macros to pull the last six months of data. Only issue is with the Month conditions as we discussing.
And i have seen scenarios where the ID might have started last month or two months before.
Share the structure if not some example data of what you have before you turn macros loose on the data to "pick the last 6 months".
It may well be that part of your problem is that step.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.