BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Raza_M
Obsidian | Level 7

Hi,

I need help with a code. I am working with medication purchase data. Variables in the dataset are id, year, week number, medication type 1 (med1), medication type 2 (med2), and medication type 3 (med3). A purchase is considered a case if a person has not bought any type of medication 24 weeks before the current purchase. For example for med1 to be a case no other type of medication should be bought 24 weeks before the purchase and if there is a purchase in either of the medication types then it is not a case. I need to create these cases for each medication type in separate variables while considering the washout period.  However, after fulfilling the washout requirement if different types of medications are bought in the same week then it should be considered a case for each type. 

Here is a sample dataset 

data meds;
input id $ year $ weeknumber $ med1 $ med2 $ med3 ;
datalines;
001 2008 6 1 0 0
001 2008 8 1 0 0
001 2008 15 1 0 0
001 2016 14 1 0 0
001 2016 14 0 0 1
001 2016 23 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2017 17 0 1 0
001 2017 17 0 0 1
001 2017 17 1 0 0
001 2017 21 1 0 0
001 2017 38 0 0 1
001 2019 1  1 0 0
001 2019 5  1 0 0
001 2019 5  0 1 0
002 2006 7 0 1 0
002 .    . 0 0 0
003 2015 50 1 0 0
003 2015 50 0 0 1
003 2016 1  1 0 0
003 2016 4  1 0 0
003 2016 4  0 0 1
004 2006 51 0 0 1
004 2008 3 0 0 1
004 2008 52 0 0 1
004 2009 24 0 0 1
005 2001 13 0 0 1
005 2013 37 0 1 0
005 2014 3 0 1 0
005 2014 8 0 1 0
006 2008 5 1 0 0
006 2008 18 1 0 0
006 2008 20 0 0 1
;

For a coding person, it might be an easy task but quite difficult for me.

Can you kindly help and write a code? 

 

I am happy to answer if something is not clear.

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

You didn't show the output data you want, but this might be close.

I removed the row with missing year and weeknumber, and read the variables in as numeric rather than character.

 

data meds;
input id $ year  weeknumber  med1  med2  med3 ;
datalines;
001 2008 6 1 0 0
001 2008 8 1 0 0
001 2008 15 1 0 0
001 2016 14 1 0 0
001 2016 14 0 0 1
001 2016 23 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2017 17 0 1 0
001 2017 17 0 0 1
001 2017 17 1 0 0
001 2017 21 1 0 0
001 2017 38 0 0 1
001 2019 1  1 0 0
001 2019 5  1 0 0
001 2019 5  0 1 0
002 2006 7 0 1 0
002 .    . 0 0 0
003 2015 50 1 0 0
003 2015 50 0 0 1
003 2016 1  1 0 0
003 2016 4  1 0 0
003 2016 4  0 0 1
004 2006 51 0 0 1
004 2008 3 0 0 1
004 2008 52 0 0 1
004 2009 24 0 0 1
005 2001 13 0 0 1
005 2013 37 0 1 0
005 2014 3 0 1 0
005 2014 8 0 1 0
006 2008 5 1 0 0
006 2008 18 1 0 0
006 2008 20 0 0 1
;

data want ;
  set meds(where=(nmiss(year,weeknumber)=0) ) ;
  by id year weeknumber ;

  array meds{*} med1-med3 ;
  array cases{*} casemed1-casemed3 ;

  *make a date variable ;
  date=input(cats(year,'W',put(weeknumber,z2.)), weekw7.);

  retain LastMed ;

  if first.id then LastMed="01Jan1960"d ;

  if sum(of meds{*}) > 0 then do ;
    if intck('week',LastMed,date)>=24 then do i=1 to dim(meds) ;
      cases{i}=meds{i} ;
    end ;
    else do i=1 to dim(meds) ;
      cases{i}=0 ;
    end ;

    LastMed=date ;
  end ;
  else do i=1 to dim(meds) ;
    cases{i}=0 ;
  end ;

  drop i lastmed;
  format date weekw11.;
run ;

proc print ;
run ;
BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

7 REPLIES 7
ballardw
Super User

Do you have data with actual DATES?

SAS has functions that can determine intervals between dates which is going to be a lot easier than trying to manually cobble something together that works across calendar year boundaries with that year / week structure.

Why do you bother to include this repeated information 5 times? From your description it wouldn't have any affect on the outcome if only one record were included.

001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0

And instead of

001 2016 14 1 0 0
001 2016 14 0 0 1

why not have as a single record?

001 2016 14 1 0 1

 

What should the resulting data set look like? I have no idea what you expect the output variables should look like either name or value.

sbxkoenk
SAS Super FREQ

No time to write the full code now , but here is way to get a proper SAS date.

proc sort data=meds;
 by id year weeknumber;
run;

data have;
 set meds;
 weeknummer=input(weeknumber,2.);
 if strip(year) NE '' and strip(weeknumber) NE '' then do;
  datum=input(put(year,4.)!!'W'!!put(weeknummer,z2.),weekw7.);
  end;
 else datum=.;
*format datum date9.;
 format datum weekw11.;
run;

Note : I have used weekw. informat.
Of course the weekv. or weeku. informat might be better suited (although that does not influence the number of weeks in between two 'datum'-s of course).

 

Use the INTCK function to return the number of interval boundaries of a given kind ('WEEK') that lie between two dates.

 

BR, Koen

Raza_M
Obsidian | Level 7

Thank you for the reply!

I dont have dates in my data because of privacy issues. Only have a week number when a medication was purchased. 

Regarding repeated measurements, this is from register I got it like that. These are multiple purchases in a week, may be purchased on different dates.

Quentin
Super User

You didn't show the output data you want, but this might be close.

I removed the row with missing year and weeknumber, and read the variables in as numeric rather than character.

 

data meds;
input id $ year  weeknumber  med1  med2  med3 ;
datalines;
001 2008 6 1 0 0
001 2008 8 1 0 0
001 2008 15 1 0 0
001 2016 14 1 0 0
001 2016 14 0 0 1
001 2016 23 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2017 17 0 1 0
001 2017 17 0 0 1
001 2017 17 1 0 0
001 2017 21 1 0 0
001 2017 38 0 0 1
001 2019 1  1 0 0
001 2019 5  1 0 0
001 2019 5  0 1 0
002 2006 7 0 1 0
002 .    . 0 0 0
003 2015 50 1 0 0
003 2015 50 0 0 1
003 2016 1  1 0 0
003 2016 4  1 0 0
003 2016 4  0 0 1
004 2006 51 0 0 1
004 2008 3 0 0 1
004 2008 52 0 0 1
004 2009 24 0 0 1
005 2001 13 0 0 1
005 2013 37 0 1 0
005 2014 3 0 1 0
005 2014 8 0 1 0
006 2008 5 1 0 0
006 2008 18 1 0 0
006 2008 20 0 0 1
;

data want ;
  set meds(where=(nmiss(year,weeknumber)=0) ) ;
  by id year weeknumber ;

  array meds{*} med1-med3 ;
  array cases{*} casemed1-casemed3 ;

  *make a date variable ;
  date=input(cats(year,'W',put(weeknumber,z2.)), weekw7.);

  retain LastMed ;

  if first.id then LastMed="01Jan1960"d ;

  if sum(of meds{*}) > 0 then do ;
    if intck('week',LastMed,date)>=24 then do i=1 to dim(meds) ;
      cases{i}=meds{i} ;
    end ;
    else do i=1 to dim(meds) ;
      cases{i}=0 ;
    end ;

    LastMed=date ;
  end ;
  else do i=1 to dim(meds) ;
    cases{i}=0 ;
  end ;

  drop i lastmed;
  format date weekw11.;
run ;

proc print ;
run ;
BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Raza_M
Obsidian | Level 7
data have;
input id $ year weeknumber med1 med2 med3 casemed1 casemed2 casemed3 date;
datalines;
001 2008 6 1 0 0 1 0 0 2008-W06-01
001 2008 8 1 0 0 0 0 0 2008-W08-01
001 2008 15 1 0 0 0 0 0 2008-W15-01
001 2016 14 1 0 0 1 0 0 2016-W14-01
001 2016 14 0 0 1 0 0 0 2016-W14-01
001 2016 23 1 0 0 0 0 0 2016-W23-01
001 2016 42 1 0 0 0 0 0 2016-W42-01
001 2016 42 1 0 0 0 0 0 2016-W42-01
001 2016 42 1 0 0 0 0 0 2016-W42-01
001 2016 42 1 0 0 0 0 0 2016-W42-01
001 2016 42 1 0 0 0 0 0 2016-W42-01
001 2017 17 0 1 0 0 1 0 2017-W17-01
001 2017 17 0 0 1 0 0 0 2017-W17-01
001 2017 17 1 0 0 0 0 0 2017-W17-01
001 2017 21 1 0 0 0 0 0 2017-W21-01
001 2017 38 0 0 1 0 0 0 2017-W38-01
001 2019 1 1 0 0 1 0 0 2019-W01-01
001 2019 5 1 0 0 0 0 0 2019-W05-01
001 2019 5 0 1 0 0 0 0 2019-W01-01
002 2006 7 0 1 0 0 1 0 2006-W07-01
003 2015 50 1 0 0 1 0 0 2015-W50-01
003 2015 50 0 0 1 0 0 0 2015-W50-01
003 2016 1 1 0 0 0 0 0 2016-W01-01
003 2016 4 1 0 0 0 0 0 2016-W04-01
003 2016 4 0 0 1 0 0 0 2016-W04-01
004 2006 51 0 0 1 0 0 1 2006-W51-01
004 2008 3 0 0 1 0 0 1 2008-W03-01
004 2008 52 0 0 1 0 0 1 2008-W52-01
004 2009 24 0 0 1 0 0 1 2009-W24-01
005 2001 13 0 0 1 0 0 1 2001-W13-01
005 2013 37 0 1 0 0 1 0 2013-W37-01
005 2014 3 0 1 0 0 0 0 2014-W03-01
005 2014 8 0 1 0 0 0 0 2014-W08-01
006 2008 5 1 0 0 1 0 0 2008-W05-01
006 2008 18 1 0 0 0 0 0 2008-W18-01
006 2008 20 0 0 1 0 0 0 2008-W20-01
;

After applying Quentin's code, I have the above dataset. My goal is to bring my data in a case-crossover format where cases serve as their own controls. However, I need to clean the data first. Using the Quentin's code I am able to identify medication purchase as a case after full filling 24 weeks wash out period.
In the next step, I need to identify first medication purchase per person. This should be separate for each medication type. Variables to use to identify first purchase for each type separately are casemed1, casemed2, and casemed3. This first purchase of each medication type is the final case.
In the final step, I need to select a control for each case. Control is the same week as the first purchase week but one year earlier. To be clear, id 001's first purchase of med1 was in week 6 year 2008. Control for this is id 001 in week 6 in the year 2007. The tricky part here is 24 weeks washout period. Washout period of 24 weeks also applies on controls. if there was a purchase in the control week or 24 weeks before the control week of any medication type (med1 or med2 or med3) than the control is invalid and so is the case.
For a case (first purchase as case) to be valid there should not be any medication type bought in the control week or 24 weeks before.

The final data (supposed, not extracted from the above) should somewhat look like;

 

data final;
input id $ year weeknumber case firt_purchasemed1 firt_purchasemed2 firt_purchasemed3 ;
datalines;
001 2008 6 1 1 0 0
001 2007 6 0 1 0 0
001 2017 17 1 0 1 0
001 2016 17 0 0 1 0
002 2006 7 1 0 1 0
002 2005 7 0 0 1 0
004 2006 51 1 0 0 1
004 2006 51 0 0 0 1
005 2001 13 1 0 0 1
005 2000 13 0 0 0 1
005 2013 37 1 0 1 0
005 2012 37 0 0 1 0
006 2008 5 1 1 0 0
006 2007 5 0 1 0 0
run;
Quentin
Super User

Hi,

I found a mistake in my first solution.  I didn't account for the fact that you have multiple rows of data with the same date.

 

I took a new approach below, but I don't really like this code much, so hopefully someone else can help.  And I haven't fully tested it either.  So I wouldn't trust this, but hopefully it helps you think about one way you could maybe do this.

 

I start with your starting data and transpose it into a long format:

 

data meds;
input id $ year  weeknumber  med1  med2  med3 ;
datalines;
001 2008 6 1 0 0
001 2008 8 1 0 0
001 2008 15 1 0 0
001 2016 14 1 0 0
001 2016 14 0 0 1
001 2016 23 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2016 42 1 0 0
001 2017 17 0 1 0
001 2017 17 0 0 1
001 2017 17 1 0 0
001 2017 21 1 0 0
001 2017 38 0 0 1
001 2019 1  1 0 0
001 2019 5  1 0 0
001 2019 5  0 1 0
002 2006 7 0 1 0
002 .    . 0 0 0
003 2015 50 1 0 0
003 2015 50 0 0 1
003 2016 1  1 0 0
003 2016 4  1 0 0
003 2016 4  0 0 1
004 2006 51 0 0 1
004 2008 3 0 0 1
004 2008 52 0 0 1
004 2009 24 0 0 1
005 2001 13 0 0 1
005 2013 37 0 1 0
005 2014 3 0 1 0
005 2014 8 0 1 0
006 2008 5 1 0 0
006 2008 18 1 0 0
006 2008 20 0 0 1
;

data medslong (keep=id date MedID Med);
  set meds(keep=id year weeknumber med1  med2  med3
           where=(nmiss(year,weeknumber)=0) 
           ) ;
  by id year weeknumber ;

  date=input(cats(year,'W',put(weeknumber,z2.)), weekw7.);

  array meds{*} med1 med2 med3 ;

  do MedID=1 to dim(meds) ;
    Med=meds{MedID} ;
    output ;
  end ;

  format date weekw11.;
run ;

Then I find all cases.  This should match the results from my earlier solution, but it doesn't because of my mistake in the earlier solution.

*find the Cases.  A case means used a medication, if they used no medications for the 24 weeks prior ;

proc sort data=medslong ;
  by id date med ;
run ;

data allcases ;
  set medslong ;
  by id date med;
  retain LastMed ;
  if first.id then LastMed="01Jan1960"d ;
  if Med=1 then do ;
    if intck('week',LastMed,date)>=24 then Case=1 ;
    else Case=0 ;   
    if last.date then LastMed=date ;
  end ;
  else Case=0 ;
  drop lastmed ;
run ;

Now find the first case per id for each medication, these are the cases you want:

proc sort data=allcases ;
  by id medid descending case date;
run ;

data firstcase ;
  set allcases ;
  by id medid descending case date;
  if first.medid and case=1 ;
run ;

Now read in the cases, and for each case, read through the entire medslong dataset to see if there were any medications taken 52 to 76 weeks before.  If there were not, then you have a control record.

data CaseControls (keep=id medid firstcasedate controldate);
  set firstcase (keep=id medid date rename=(date=firstcasedate)) ;
  do point=1 to nobs ;
    set medslong (keep=id date med rename=(id=_id)) nobs=nobs point=point;
    if (_id=id) and (med=1) and ( -76 <= intck('week',firstcasedate,date) <= -52) then _HaveMedInWash=1 ;
  end ;
  if NOT _HaveMedInWash then do ;
    ControlDate=intnx('week',firstcasedate,-52) ;
    output ;
  end ;
  format controldate weekw11.;
run ;

when I run that, I get CaseControls similar to what you want:

                        Med
id     firstcasedate     ID    ControlDate

001     2008-W06-01      1     2007-W06-07
001     2016-W14-01      3     2015-W13-07
002     2006-W07-01      2     2005-W06-07
003     2015-W50-01      1     2014-W49-07
003     2015-W50-01      3     2014-W49-07
004     2006-W51-01      3     2005-W50-07
005     2013-W37-01      2     2012-W37-07
005     2001-W13-01      3     2000-W12-07
006     2008-W05-01      1     2007-W05-07

Again, I don't like my code much, and there may be some off-by-one errors , but I hope the approach might be useful.  And hopefully someone else will share better ideas.  

I suggest you unmark my first answer as correct.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Raza_M
Obsidian | Level 7

Thanks a lot for the code!

I highly appreciate that you took time you write it. I have tested it on small dataset and it seems to work fine. I need to run it on the whole dataset. I am hopeful that it will work fine.

Thank you!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 711 views
  • 1 like
  • 4 in conversation