I am cleaning a huge longitudinal dataset (~1.5 million observations, ~400 thousand individuals). The data is unbalanced and looks something like table below.
Please note:
- I am using SAS version 9.4
- Visit # ranges from 1 to 561
- Status should always be the same value
For each subject ID, I need to check that 'status' is consistent throughout (same value always). If different, I need to flag it. I will later need to impute the missing values.
For example subject ID#1, I'll need to check and decide if the status is 1 or 4 and then impute the missing; ID #2 is fine; ID#4, I'll later need to impute the missing; ID#5 needs to flagged.
Subject ID | Visit # | Status |
1 | 1 | . |
1 | 2 | . |
1 | 3 | 4 |
1 | 4 | 1 |
1 | 5 | 1 |
2 | 1 | 4 |
2 | 2 | 4 |
3 | 1 | 4 |
4 | 1 | 2 |
4 | 2 | 2 |
4 | 3 | . |
5 | 1 | 2 |
5 | 2 | 2 |
5 | 3 | 3 |
5 | 4 | 2 |
I tried transposing and several combinations of arrays and do loops, permutations, combinations etc. An example of one (of many) attempts is below:
PROC TRANSPOSE DATA=dups_a prefix=vis out=transdups_a (drop=_name_);
by Subject_ID;
id visit;
var status;
RUN;
DATA transone;
set transdups_a;
array visit {*} vis:;
n = dim(visit);
k = 2;
ncomb = comb(n,k);
DO i=1 TO ncomb;
Call allcomb (i,k, of visit[*]);
If visit{1} NE visit{2} AND visit[1] ^=. then flag=1;
END;
If flag=1 then output;
RUN;
ERROR: The ALLCOMB routine cannot permute more than 33 variables, but 549 variables were
specified.
ERROR: Internal error detected in function ALLCOMB. The DATA step is terminating during the
EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
Thanks!
That seems overly complex, 1.5 million rows is trivial in SAS.
I think you can identify the number of status using count distinct. Once you have the n_statuses you can make your rules.
Proc sql;
Create table combined as
Select *, count(distinct Status) as n_status
From have
Group by Id;
Quit;
That seems overly complex, 1.5 million rows is trivial in SAS.
I think you can identify the number of status using count distinct. Once you have the n_statuses you can make your rules.
Proc sql;
Create table combined as
Select *, count(distinct Status) as n_status
From have
Group by Id;
Quit;
That did it! Thanks.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.