longitudinal data cleaning: consistency checks for static variable

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

longitudinal data cleaning: consistency checks for static variable

I am cleaning a huge longitudinal dataset (~1.5 million observations, ~400 thousand individuals). The data is unbalanced and looks something like table below.

Please note:

- I am using SAS version 9.4

- Visit # ranges from 1 to 561

- Status should always be the same value

 

For each subject ID, I need to check that 'status' is consistent throughout (same value always). If different, I need to flag it. I will later need to impute the missing values.

For example subject ID#1, I'll need to check and decide if the status is 1 or 4 and then impute the missing; ID #2 is fine; ID#4, I'll later need to impute the missing; ID#5 needs to flagged.

 

Subject ID

Visit #

Status

1

1

.

1

2

.

1

3

4

1

4

1

1

5

1

2

1

4

2

2

4

3

1

4

4

1

2

4

2

2

4

3

.

5

1

2

5

2

2

5

3

3

5

4

2

 

I tried transposing and several combinations of arrays and do loops, permutations, combinations etc. An example of one (of many) attempts is below:

 

PROC TRANSPOSE DATA=dups_a prefix=vis out=transdups_a (drop=_name_);

 by Subject_ID; 

   id visit;

   var status;

RUN;

 

DATA transone;

 set transdups_a;

  array visit {*} vis:;

    n = dim(visit);

    k = 2;

    ncomb = comb(n,k); 

DO i=1 TO ncomb;

   Call allcomb (i,k, of visit[*]);

    If visit{1} NE visit{2} AND visit[1] ^=. then flag=1;

END;

    If flag=1 then output;

RUN;

 

ERROR: The ALLCOMB routine cannot permute more than 33 variables, but 549 variables were

specified.

ERROR: Internal error detected in function ALLCOMB. The DATA step is terminating during the

EXECUTION phase.

NOTE: The SAS System stopped processing this step because of errors.

 

Thanks!


Accepted Solutions
Solution
‎07-09-2017 09:08 PM
Super User
Posts: 17,784

Re: longitudinal data cleaning: consistency checks for static variable

That seems overly complex, 1.5 million rows is trivial in SAS. 

 

 

I think you can identify the number of status using count distinct. Once you have the n_statuses you can make your rules. 

Proc sql;
Create table combined as 
Select *, count(distinct Status) as n_status
From have
Group by Id;
Quit;

View solution in original post


All Replies
Solution
‎07-09-2017 09:08 PM
Super User
Posts: 17,784

Re: longitudinal data cleaning: consistency checks for static variable

That seems overly complex, 1.5 million rows is trivial in SAS. 

 

 

I think you can identify the number of status using count distinct. Once you have the n_statuses you can make your rules. 

Proc sql;
Create table combined as 
Select *, count(distinct Status) as n_status
From have
Group by Id;
Quit;
New Contributor
Posts: 2

Re: longitudinal data cleaning: consistency checks for static variable

That did it! Thanks.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 92 views
  • 0 likes
  • 2 in conversation