BookmarkSubscribeRSS Feed
kstolzmann
Obsidian | Level 7

Hello all,

I have a SAS problem that is sending me down a black hole.

 

I have medication duplication data that looks like this:

data have;
	input scrssn drugname $ day1-day48;
	datalines ;
	99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 3 3 0 0 0
	99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 0 0 0 0 0
	;
run;

 

Each day that is >=2 should be counted as a medication duplication.  However, we only want to count a duplication where it lasts >10 (consecutive) days.  So, the only duplication should be from day 11-24 for aspirin (since it's >10).  What I ideally want is for my data to look like this, where all duplications <=10 days are set to "1":

 

data want;
	input scrssn drugname $ iday1-iday48;
	datalines ;
	99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
	99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
	;
run;

 

 

Any help/ideas would be most welcome--thank you!

Kelly

13 REPLIES 13
Tom
Super User Tom
Super User

Not sure I understand what you want.  Are looking for RUNS of non-negative values of length 10 or more?  Why did you change the values from 2 or 3 to 1 for BETA?  Why didn't you do the same for the ASPIRIN records?

 

kstolzmann
Obsidian | Level 7

Thanks--I want the data/values to be retained for duplications lasting >10 days but I want all duplications <=10 days to be set to "1" (the value for that day).  The beta duplications (as well as the additional aspirin duplications <10 days) would be set to 1.  Does that make sense? 

Tom
Super User Tom
Super User

You need to define the term 'duplicates' more fully as it is not clear what you want.

 

Why did you convert this run of 19 non-zero values from the "beta" row to all ones? Is it because only 5 in a row are greater than 1?  is it because the longest run of a single value is the last 10 days of 1's?

1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1

But not convert this run of 21 non-zero values from the "aspirin" row to all ones?  It looks to me like the longest run of any single value is only 3 days.  Are you ignoring it because there are 15 days in a row that are greater than 1?

 

1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1
kstolzmann
Obsidian | Level 7

Yes, I converted the run of 19 non-zero values from the "beta" row to all ones  because only 5 in a row are greater than 1.  Basically I want to retain the values of the duplication (so, keeping the 2, 3, 4, etc. values as in the aspirin example) but only if the duplication (a consecutive string of values >=2) are more than 10 days; otherwise I would ideally like to set the values of >=2 to 1. 

Tom
Super User Tom
Super User

Easier to work with it in tall skinny format. You can use PROC TRANSPOSE or other methods to convert.  Make a new variable DUPLICATE to indicate if the count is greater than 1 for this record. 

 

data have;
  input scrssn drugname $ @ ;
  do day=1 to 48 ;
      input count @;
      duplicate = (count > 1);
      output;
  end;
cards;
99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 3 3 0 0 0
99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 0 0 0 0 0
;

Then you could use a double DOW loop over the runs.  First will count the run length. Second can apply your transformation rule.

data want ;
  do until (last.duplicate);
    set have ;
    by scrssn drugname duplicate notsorted;
    runlength=sum(runlength,duplicate);
  end;
  do until (last.duplicate);
    set have ;
    by scrssn drugname duplicate notsorted;
    if runlength < 10 then count=min(count,1);
    output;
  end;  
run;

 

Reeza
Super User

Do you only have 48 days to work with?

You need to provide more information on your data and the issues you're facing.

 

There's a term called 'medicine cabinet' when looking at drug analysis. 

Usually you break it out to each day and what drugs would be in a persons 'cabinet' and then do analysis from that point on. Typically, the structure is a long format rather than wide, which makes it easier to do analysis as to how many drugs per day and durations of drugs.

kstolzmann
Obsidian | Level 7

Yes, thanks, I have more data than 48 days but wanted to simplify for the example.  Previously I had counted a duplication as any day >=2 (and I go on to flip the data to wide and identify episodes, etc.) but I need to go back now and eliminate any "duplication" <=10 days.  I was trying to preserve the dataset so I could use my old code (but eliminate any string of days of duplication <=10 days and set the value of each such days to 1).

Reeza
Super User

You haven't identified how you identified episodes >=2? 

Perhaps that code can be modified to fit the >=10 situation. 

 

We still don't have enough information to provide any answers to you.

kstolzmann
Obsidian | Level 7

Thanks, yes, maybe? My starting dataset is dispensed med-level data and I use an array to create the day1-day48 using "dispensed date" and "days supply".  Then I use proc means to sum the days (day1-day48) by ssn and drugname to get a count of the number of drugs on each day (so any day >=2 would be a medication duplication).  Except now the problem is I don't want to count it as a duplication unless there are more than 10 consecutive days (i.e. more than 10 days of >=2 when looking at the summed day1-day48 data--the "have" dataset).  There could be multiple duplication episodes within the same drug (as in the aspirin example) but the only one that would "count" (now) would be that one episode lasting >10 days.  

Astounding
PROC Star

Since the changes all involve the data values that comprise runs of shorter than 10 days, it's probably a good approach to identify those:

 

data want;

set have;

array day {48};

do k=1 to 48;

   if day{k} > 1 then do;

      if begin_at=. then begin_at = k;

      end_at=k;

      if k=48 and (end_at - begin_at <= 10) then do j=begin_at to end_at;

         day{j}=1;

      end;

   end;

   else if begin_at > . then do;

      if (end_at - begin_at) <= 10 then do j=begin_at to end_at;

         day{j}=1;

      end;

      begin_at=.;

      end_at=.;

   end;

end;

drop j k begin_at end_at;

run;

 

It's untested, but looks like it should work.

Haikuo
Onyx | Level 15

Here is one try,

 

data have;
	input scrssn drugname $ day1-day48;
	datalines;
	99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 3 3 0 0 0
	99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 0 0 0 0 0
	;
run;

data want;
	set have;
	array d(48) day:;
	do i=1 to dim(d);
		if d(i)>=2 then
			flag+1;
		else
			do;
				if 0<flag<=10 then
					do j=i-flag to i-1;
						d(j)=1;
					end;

				flag=0;
			end;
	end;

	drop i j flag;
run;
kstolzmann
Obsidian | Level 7

Thank you (all)!  This is exactly what I was hoping to do (but didn't know how to do).  Thanks for bearing with me through my muddled question asking, too.

Ksharp
Super User
Similar with Tom 's code.


data have;
	input scrssn drugname $ day1-day48;
	datalines ;
	99999999 aspirin 0 0 0 0 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 3 3 3 2 2 2 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 3 3 0 0 0
	99999999 beta 0 0 0 0 0 0 0 0 0 0 1 1 1 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 2 2 1 1 1 1 1 1 1 1 1 0 0 0 0 0
	;
run;
proc transpose data=have out=temp;
by scrssn drugname;
var day:;
run;
data temp;
 set temp;
 flag=(col1 ge 2);
run;
data temp1;
n=0;
 do until(last.flag);
  set temp;
  by scrssn drugname flag notsorted;
  n+1;
 end;
 _flag=flag;
 do until(last.flag);
  set temp;
  by scrssn drugname flag notsorted;
  if _flag=1 then do;
     if n gt 10 then do;date=col1;output;end;
      else do;date=flag;output;end;
  end;
  else do;date=col1;output;end;
 end;
run;
proc transpose data=temp1 out=want;
by scrssn drugname;
var date;
id _name_;
run;


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1830 views
  • 4 likes
  • 6 in conversation