DATA Step, Macro, Functions and more

count consecutive variables, with replacement

Reply
Occasional Contributor
Posts: 19

count consecutive variables, with replacement

Hello all,

I have a SAS problem that is sending me down a black hole.

 

I have medication duplication data that looks like this:

data have;
	input scrssn drugname $ day1-day48;
	datalines ;
	99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 3 3 0 0 0
	99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 0 0 0 0 0
	;
run;

 

Each day that is >=2 should be counted as a medication duplication.  However, we only want to count a duplication where it lasts >10 (consecutive) days.  So, the only duplication should be from day 11-24 for aspirin (since it's >10).  What I ideally want is for my data to look like this, where all duplications <=10 days are set to "1":

 

data want;
	input scrssn drugname $ iday1-iday48;
	datalines ;
	99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
	99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
	;
run;

 

 

Any help/ideas would be most welcome--thank you!

Kelly

Super User
Super User
Posts: 6,502

Re: count consecutive variables, with replacement

Not sure I understand what you want.  Are looking for RUNS of non-negative values of length 10 or more?  Why did you change the values from 2 or 3 to 1 for BETA?  Why didn't you do the same for the ASPIRIN records?

 

Occasional Contributor
Posts: 19

Re: count consecutive variables, with replacement

Thanks--I want the data/values to be retained for duplications lasting >10 days but I want all duplications <=10 days to be set to "1" (the value for that day).  The beta duplications (as well as the additional aspirin duplications <10 days) would be set to 1.  Does that make sense? 

Super User
Super User
Posts: 6,502

Re: count consecutive variables, with replacement

You need to define the term 'duplicates' more fully as it is not clear what you want.

 

Why did you convert this run of 19 non-zero values from the "beta" row to all ones? Is it because only 5 in a row are greater than 1?  is it because the longest run of a single value is the last 10 days of 1's?

1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1

But not convert this run of 21 non-zero values from the "aspirin" row to all ones?  It looks to me like the longest run of any single value is only 3 days.  Are you ignoring it because there are 15 days in a row that are greater than 1?

 

1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1
Occasional Contributor
Posts: 19

Re: count consecutive variables, with replacement

Yes, I converted the run of 19 non-zero values from the "beta" row to all ones  because only 5 in a row are greater than 1.  Basically I want to retain the values of the duplication (so, keeping the 2, 3, 4, etc. values as in the aspirin example) but only if the duplication (a consecutive string of values >=2) are more than 10 days; otherwise I would ideally like to set the values of >=2 to 1. 

Super User
Super User
Posts: 6,502

Re: count consecutive variables, with replacement

[ Edited ]

Easier to work with it in tall skinny format. You can use PROC TRANSPOSE or other methods to convert.  Make a new variable DUPLICATE to indicate if the count is greater than 1 for this record. 

 

data have;
  input scrssn drugname $ @ ;
  do day=1 to 48 ;
      input count @;
      duplicate = (count > 1);
      output;
  end;
cards;
99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 3 3 0 0 0
99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 0 0 0 0 0
;

Then you could use a double DOW loop over the runs.  First will count the run length. Second can apply your transformation rule.

data want ;
  do until (last.duplicate);
    set have ;
    by scrssn drugname duplicate notsorted;
    runlength=sum(runlength,duplicate);
  end;
  do until (last.duplicate);
    set have ;
    by scrssn drugname duplicate notsorted;
    if runlength < 10 then count=min(count,1);
    output;
  end;  
run;

 

Super User
Posts: 17,936

Re: count consecutive variables, with replacement

Do you only have 48 days to work with?

You need to provide more information on your data and the issues you're facing.

 

There's a term called 'medicine cabinet' when looking at drug analysis. 

Usually you break it out to each day and what drugs would be in a persons 'cabinet' and then do analysis from that point on. Typically, the structure is a long format rather than wide, which makes it easier to do analysis as to how many drugs per day and durations of drugs.

Occasional Contributor
Posts: 19

Re: count consecutive variables, with replacement

Yes, thanks, I have more data than 48 days but wanted to simplify for the example.  Previously I had counted a duplication as any day >=2 (and I go on to flip the data to wide and identify episodes, etc.) but I need to go back now and eliminate any "duplication" <=10 days.  I was trying to preserve the dataset so I could use my old code (but eliminate any string of days of duplication <=10 days and set the value of each such days to 1).

Super User
Posts: 17,936

Re: count consecutive variables, with replacement

You haven't identified how you identified episodes >=2? 

Perhaps that code can be modified to fit the >=10 situation. 

 

We still don't have enough information to provide any answers to you.

Occasional Contributor
Posts: 19

Re: count consecutive variables, with replacement

Thanks, yes, maybe? My starting dataset is dispensed med-level data and I use an array to create the day1-day48 using "dispensed date" and "days supply".  Then I use proc means to sum the days (day1-day48) by ssn and drugname to get a count of the number of drugs on each day (so any day >=2 would be a medication duplication).  Except now the problem is I don't want to count it as a duplication unless there are more than 10 consecutive days (i.e. more than 10 days of >=2 when looking at the summed day1-day48 data--the "have" dataset).  There could be multiple duplication episodes within the same drug (as in the aspirin example) but the only one that would "count" (now) would be that one episode lasting >10 days.  

Super User
Posts: 5,096

Re: count consecutive variables, with replacement

Since the changes all involve the data values that comprise runs of shorter than 10 days, it's probably a good approach to identify those:

 

data want;

set have;

array day {48};

do k=1 to 48;

   if day{k} > 1 then do;

      if begin_at=. then begin_at = k;

      end_at=k;

      if k=48 and (end_at - begin_at <= 10) then do j=begin_at to end_at;

         day{j}=1;

      end;

   end;

   else if begin_at > . then do;

      if (end_at - begin_at) <= 10 then do j=begin_at to end_at;

         day{j}=1;

      end;

      begin_at=.;

      end_at=.;

   end;

end;

drop j k begin_at end_at;

run;

 

It's untested, but looks like it should work.

Respected Advisor
Posts: 3,124

Re: count consecutive variables, with replacement

Here is one try,

 

data have;
	input scrssn drugname $ day1-day48;
	datalines;
	99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 3 3 0 0 0
	99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 0 0 0 0 0
	;
run;

data want;
	set have;
	array d(48) day:;
	do i=1 to dim(d);
		if d(i)>=2 then
			flag+1;
		else
			do;
				if 0<flag<=10 then
					do j=i-flag to i-1;
						d(j)=1;
					end;

				flag=0;
			end;
	end;

	drop i j flag;
run;
Occasional Contributor
Posts: 19

Re: count consecutive variables, with replacement

Thank you (all)!  This is exactly what I was hoping to do (but didn't know how to do).  Thanks for bearing with me through my muddled question asking, too.

Super User
Posts: 9,691

Re: count consecutive variables, with replacement

Similar with Tom 's code.


data have;
	input scrssn drugname $ day1-day48;
	datalines ;
	99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 3 3 0 0 0
	99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 0 0 0 0 0
	;
run;
proc transpose data=have out=temp;
by scrssn drugname;
var day:;
run;
data temp;
 set temp;
 flag=(col1 ge 2);
run;
data temp1;
n=0;
 do until(last.flag);
  set temp;
  by scrssn drugname flag notsorted;
  n+1;
 end;
 _flag=flag;
 do until(last.flag);
  set temp;
  by scrssn drugname flag notsorted;
  if _flag=1 then do;
     if n gt 10 then do;date=col1;output;end;
      else do;date=flag;output;end;
  end;
  else do;date=col1;output;end;
 end;
run;
proc transpose data=temp1 out=want;
by scrssn drugname;
var date;
id _name_;
run;


Ask a Question
Discussion stats
  • 13 replies
  • 415 views
  • 4 likes
  • 6 in conversation