I am cleaning a huge longitudinal dataset (~1.5 million observations, ~400 thousand individuals). The data is unbalanced and looks something like table below. Please note: - I am using SAS version 9.4 - Visit # ranges from 1 to 561 - Status should always be the same value For each subject ID, I need to check that 'status' is consistent throughout (same value always). If different, I need to flag it. I will later need to impute the missing values. For example subject ID#1, I'll need to check and decide if the status is 1 or 4 and then impute the missing; ID #2 is fine; ID#4, I'll later need to impute the missing; ID#5 needs to flagged. Subject ID Visit # Status 1 1 . 1 2 . 1 3 4 1 4 1 1 5 1 2 1 4 2 2 4 3 1 4 4 1 2 4 2 2 4 3 . 5 1 2 5 2 2 5 3 3 5 4 2 I tried transposing and several combinations of arrays and do loops, permutations, combinations etc. An example of one (of many) attempts is below: PROC TRANSPOSE DATA=dups_a prefix=vis out=transdups_a (drop=_name_); by Subject_ID; id visit; var status; RUN; DATA transone; set transdups_a; array visit {*} vis:; n = dim(visit); k = 2; ncomb = comb(n,k); DO i=1 TO ncomb; Call allcomb (i,k, of visit[*]); If visit{1} NE visit{2} AND visit[1] ^=. then flag=1; END; If flag=1 then output; RUN; ERROR: The ALLCOMB routine cannot permute more than 33 variables, but 549 variables were specified. ERROR: Internal error detected in function ALLCOMB. The DATA step is terminating during the EXECUTION phase. NOTE: The SAS System stopped processing this step because of errors. Thanks!
... View more