Solved: Count number of changes in variables over time

RyanSimmons · Posted 01-28-2016 10:44 AM

I have a feeling that there is a fairly simple programming solution to this problem I have, but I just have not been able to come up with it.

I have a dataset that looks something like this:

DATA example;
	input subject month pp01 pp02 pp03;
	datalines;
	1 0 0 0 2
	1 6 2 0 3
	2 0 2 0 1
	2 6 1 1 2
	3 0 0 2 0
	3 6 0 1 3
;
run;

My real dataset is formatted similarly, but with dozens of variables and thousands of subjects. The variables pp01-pp03 are Likert scale measurements that can take on values of 0, 1, 2, or 3.

In any case, what I need to do is count the number of times that a subject transitioned between states over time. That is, I want to count the number of times an individual who had a value of 0 or 1 at month 0 and then a value of 2 or 3 at month 6, AND the number of times an individual who had a value of 2 or 3 at month 0 then had a value of 0 or 1 at month 6.

To make things a bit simpler, we are going to define an individual who got "worse" over time as an individual who had a value of 0 or 1 at month 0 and then a value of 2 or 3 at month 6, and an individual who got "better" over time as an individual with the opposite pattern.

So, using the example dataset above, I would eventually want output that looks something like this:

DATA want;
input variable $ worse better;
datalines;
pp01 1 1
pp02 0 1
pp03 2 0
;
run;

So, on variable pp01 one person got worse (subject 1) and one person got better (subject 2). On variable pp02 one person got better (subject 3). And on variable pp03 two people got worse (subject 2 and subject 3).

Any ideas?

FreelanceReinh · Posted 01-28-2016 11:09 AM

Before I saw your second post, my suggestion would have been this:

proc transpose data=example out=trans;
by subject;
id month;
var pp:;
run;

proc sql;
create table want as
select _name_ as variable label=' ',
       sum((_0 in (0,1) & _6 in (2,3))) as worse,
       sum((_0 in (2,3) & _6 in (0,1))) as better
from trans
group by _name_;
quit;

View solution in original post

Reeza · Posted 01-28-2016 10:52 AM

Do you only have two records per person?

RyanSimmons · Posted 01-28-2016 10:58 AM

In my actual dataset, the number of observations per person can vary from 1 to 5. That is, some people were only observed at month 0, others were followed-up at month 6, month 12, month 18, up to month 24. At the moment, the primary outcome of interest is in change at month 6, which restricts the dataset to the subset of individuals who were observed at both month 0 and month 6, thus I kept my OP pretty simple. But, ultimately I will be looking at the other time points ... but by the nature of the comparison of interest, we automatically have to throw out anybody who wasn't observed at each time point.

Haikuo · Posted 01-28-2016 11:08 AM

So if you have more than 2 time points, what happens when the conditions goes back and forth, such as 'better-worse-better"? Or all you care is to compare starting point and ending point?

RyanSimmons · Posted 01-28-2016 11:12 AM

Well, as I said, right now our primary outcome is change at the first follow-up visit (month 6), so people can only either improve, get worse, or stay the same. Once we start looking at the further follow-up time points, we will be looking at the counts for every possible transition pattern.

RW9 · Posted 01-28-2016 10:58 AM

Well, this is a bit long winded, but works (will see if I can make better code when I get some time):

DATA example;
	input subject month pp01 pp02 pp03;
	datalines;
	1 0 0 0 2
	1 6 2 0 3
	2 0 2 0 1
	2 6 1 1 2
	3 0 0 2 0
	3 6 0 1 3
;
run;

proc transpose data=example out=have;
  by subject month;
  var pp01-pp03;
run;

proc sort data=have;
  by subject _name_ month;
run;

proc transpose data=have out=t_have prefix=month;
  by subject _name_;
  var col1;
  id month;
  idlabel month;
run;

proc sql;
  create table WANT as
  select  COALESCE(A._NAME_,B._NAME_) as VARIABLE,
          A.WORSE,
          B.BETTER
  from    ( select  _NAME_,
                    count(distinct SUBJECT) as WORSE
            from    (select * from T_HAVE where MONTH0 in (0,1) and MONTH6 in (2,3))
            group by _NAME_) A
  full join (select _NAME_,
                    count(distinct SUBJECT) as BETTER
             from   (select * from T_HAVE where MONTH0 in (2,3) and MONTH6 in (0,1))
             group by _NAME_) B
  on      A._NAME_=B._NAME_;
quit;

RyanSimmons · Posted 01-28-2016 11:03 AM

Thanks, RW9! That is better than the approach I had been working on, which was using array statements to convert all the variables to binary and then using PROC COMPARE and a dataset to format the output from that. Yours works a lot better. I will accept it as the answer if nobody ends up one-upping you 😉

FreelanceReinh · Posted 01-28-2016 11:09 AM

Before I saw your second post, my suggestion would have been this:

proc transpose data=example out=trans;
by subject;
id month;
var pp:;
run;

proc sql;
create table want as
select _name_ as variable label=' ',
       sum((_0 in (0,1) & _6 in (2,3))) as worse,
       sum((_0 in (2,3) & _6 in (0,1))) as better
from trans
group by _name_;
quit;

RyanSimmons · Posted 01-28-2016 11:15 AM

That solution works perfectly, actually, even with the unbalanced data.

BMiller · Posted 01-28-2016 12:37 PM

Another option.

1)Sort by subject / month
2)A data step containing
By subject
For each month create an array "month0", "month6", "month12", etc and store values pp01 - pp100 (or however many). This can be done using a macro loop.
At last.subject, compare the arrays for each sample month and loop through values (1-100), comparing the results and classifying (into better / worse / same) . Creat variables for start month, end month, test (1-100), result (better / worse /same) and output before looping to next pp<n> in the arrays.

Doing from my phone so tricky to code in full. 😉

FreelanceReinh · Posted 01-28-2016 02:20 PM

This might be a first draft for the general case with four follow-up visits. It assumes that all five month values (0, 6, 12, 18, 24) occur in the data (at least once, possibly with missing values for PP01 - PP03). If this was not the case, one could temporarily insert data from a "dummy patient" (e.g. subject=999999999) who has all five visits (with missing data) and then delete their records from dataset TRANS2 after the PROC TRANSPOSE step.

data example2;
	input subject month pp01 pp02 pp03;
	datalines;
	1  0 0 0 2
	1  6 1 . 0
	1 12 3 1 0
	1 18 2 2 2
	2  0 2 3 .
	2  6 1 3 2
	2 18 2 2 3
	2 24 . 1 1
	3  6 . . 3
	3 12 2 . 0
	3 18 2 0 2
	3 24 1 3 0
	4  0 1 1 1
	5  0 1 . 2
	5  6 0 . 0
	6  0 0 1 2
	6  6 1 2 3
;
run;

proc transpose data=example2 out=trans2;
by subject;
id month;
var pp:;
run;

proc format;
invalue likert
'0', '1' = 1   /* good */
'2', '3' = 0   /* bad  */
other    = .;
run;

data temp;
set trans2;
length cp $5; /* change pattern */
array v[5] _0 _6 _12 _18 _24;
array e[5];
retain lnm;   /* last non-missing value */
do i=1 to 5;
  e[i]=input(put(v[i], 1.), likert.);
  if i=1 then substr(cp,1,1)=ifc(e1>.,'B','.'); /* 'B' = "has baseline value" */
  else select;
    when(e[i]=.)   substr(cp,i,1)='.'; /* change missing, as current value is missing */
    when(lnm=.)    substr(cp,i,1)='*'; /* first non-missing value if not baseline */
    when(lnm=e[i]) substr(cp,i,1)='0'; /* no signif. change compared to last non-missing (lnm) value */
    when(lnm<e[i]) substr(cp,i,1)='+'; /* lnm value 2 or 3, current value 0 or 1 (better) */
    when(lnm>e[i]) substr(cp,i,1)='-'; /* lnm value 0 or 1, current value 2 or 3 (worse) */
    otherwise put 'CAUTION: Unexpected values!' _all_;
  end;
  if e[i]>. then lnm=e[i];
end;
lnm=.;        /* reset LNM after processing the last visit */
keep _name_ cp; /* for checking purposes you may want to comment this line out */
proc sort;
by _name_;
run;

/* Count change patterns separately for PP01, PP02 and PP03 */

proc freq data=temp noprint;
by _name_;
tables cp / out=want2(rename=(_name_=variable));
run;

proc print data=want2;
run;

Ksharp · Posted 01-28-2016 09:30 PM

This is very complicated problem since you have so many rules.

The following code is just for your simple data.

DATA example;
	input subject month pp01 pp02 pp03;
	datalines;
	1 0 0 0 2
	1 6 2 0 3
	2 0 2 0 1
	2 6 1 1 2
	3 0 0 2 0
	3 6 0 1 3
;
run;
data temp;
 merge example example(firstobs=2 keep=subject pp01-pp03 rename=(subject=_subject pp01-pp03=_pp01-_pp03));
array x{*} pp01-pp03;
array y{*} _pp01-_pp03;
array z1{*} n1-n3;
array z2{*} m1-m3;

do i=1 to dim(x);
 if subject=_subject and x{i} in (0 1) and y{i} in (2 3) then z1{i}=1;
 if subject=_subject and x{i} in (2 3) and y{i} in (0 1) then z2{i}=1;
end;
drop i;
run;
proc sql;
select 'pp01' as variable,coalesce(sum(n1),0) as better,coalesce(sum(m1),0) as worse from temp
union
select 'pp02' as variable,coalesce(sum(n2),0) as better,coalesce(sum(m2),0) as worse from temp
union
select 'pp03' as variable,coalesce(sum(n3),0) as better,coalesce(sum(m3),0) as worse from temp ;
quit;

RyanSimmons · Posted 01-29-2016 11:00 AM

Thank you everyone for your help! All the answers are great. Wish I could accept them all!

Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Re: Count number of changes in variables over time

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away