Re: Count all instances of flips in a binary variable

tuckjosh · Posted 07-30-2021 01:50 PM

Hi there!

I'm trying to count the number of times coverage for various drugs (all of which have 0s for 'not covered' and 1 for 'covered) over time in each state (which its own categorical variable). I also have two date variables (mmddyyyy). Each date represents a new entry for coverage status for each drug in each state. There is no pattern to the entries for each state; instead, every time a drug's coverage changes, a new entry is made. There could be an entry on 10/01/2020 on which Montana's Medicaid covers say, alprazolam (i.e., alprazolman_var=1), but on 10/21/2020, alprazolam_var=0. This happens for each drug in the dataset and for each state. I'm just trying to count (1) the number of 'additions,' (i.e., drug_var=1 after drug_var was 0 in a previous entry for the state) and (2) subtractions (i.e., drug_var=0 after drug_var was 1 in a previous entry for the state) over time. I've tried using a lag function. I'd really like to be able to code for this rather than count manually!

J. E. Tucker, MPH (he.him.his)

Reeza · Posted 07-30-2021 02:05 PM

Depends a bit on your data structure - please show a sample of your data. For example, is it structured as:

ID Drug Date Flag
1 Drug1 20140104 1
1 Drug1 20140403 0

Or you could have multiple variables for each flag?

If you want to count switches use the, first sort by DRUG, DATE.

Use the NOTSORTED option on the BY statement in a data step and include the FLAG in your BY statement in the data step. Every time you hit the first.FLAG in the group you'll have a switch so increment that and output it at the last record.

If you want a code example please provide a data example.

ballardw · Posted 07-30-2021 02:08 PM

One suspects that a State variable should be included and probably the first sort order variable.

Astounding · Posted 07-30-2021 04:25 PM

Also please confirm ... how should we handle the first observation for each drug? Does it count as a switch, or should it be ignored.

tuckjosh · Posted 07-30-2021 04:27 PM

It should be ignored.

J. E. Tucker, MPH (he.him.his)

mkeintz · Posted 07-30-2021 05:12 PM

Unless you want only generalized answers, show a sample of your data, in the form of a working data step. I'm especially curious about your statement that you have two date variables. What would be the purpose of the second data variable? And show what you want the results to look like.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

tuckjosh · Posted 07-30-2021 05:14 PM

I definitely will on Monday. I won't forget!

J. E. Tucker, MPH (he.him.his)

tuckjosh · Posted 08-03-2021 05:12 PM

data have;
input State Edate Vthrudate MATBupie MATMeth MATNal;
datalines;
Alabama 08012017 09302017 0 0 1
Alabama 10012017 10012017 1 0 1
Alabama 10022017 01012020 1 0 0
Alaska 09162018 01012019 1 0 0
Alaska 01022019 10032019 1 1 1
Alaska 10042019 01012020 1 0 1
Alaska 01022019 07032020 1 1 1
Michigan 11032020 11042020 1 1 0
Michigan 11052020 04172021 0 1 1
;

run;

What I'm looking for is a way to count up the number of times a drug's (MATBupie - MATNal) coverage is added (i.e., 0->1) or removed (i.e., 1->0) for each drug per day according to the date of entry. For example, I'm looking to track the frequency in which MATMeth's coverage in Alaska goes from 0 to 1 according to the date of entry. I'm going to calculate something like an incidence rate for coverage change. Does this all make sense? It's not as simple as counting the 0s and 1s at the most recent entry date; doing so will provide me only a cross-section of coverage rather than a method of prediction, per se. Furthermore, I need a way to do all this for drugs as individuals; not a blanket indication across all drugs per entry per state.

J. E. Tucker, MPH (he.him.his)

mkeintz · Posted 08-03-2021 07:03 PM

I think that you want the number of changes per state, which would mean you want an output dataset, or report, with one observation, or one line, per state.

Your data step is not a working data step, so the program below is untested:

proc sort data=have;
  by state edate;
run;

data want (keep=state flip:);
  set have;
  by state;

  array drug    matbupie      metmeth      matnal;
  array flips   flip_matbupie flip_matmeth flip_matnal;

  if first.state then do over flips;
    flips=0;
  end;

  do over drug;
    flips+ (first.state=0 and (dif(drug) in (1,-1)));
  end;

  if last.state;
run;

The DIF function is defined as DIF(x) = X - LAG(x), which for your sequence of dummy variables will return a 1, 0, or -1. (Except for the first observation where LAG(x) is missing, so the first DIF(x) is also missing).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

tuckjosh · Posted 08-04-2021 04:03 PM

Question: for Connecticut, the output data has a value of 2 for the effective date (1/1/2020). Does this mean that there were two total flips, regardless of direction, over the course of data collection? Or is the code erroneously summing two '1s'? Additionally, when I in (1, 0, -1), the count for each drug flip goes much higher. Why is that?

J. E. Tucker, MPH (he.him.his)

tuckjosh · Posted 08-09-2021 04:17 PM

@mkeintz Following up on this.

J. E. Tucker, MPH (he.him.his)

mkeintz · Posted 08-10-2021 01:03 AM

I see no sample data for Connecticut, so I have no way of answering your question.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Tom · Posted 08-11-2021 12:44 PM

So you want one number per state per drug?

First let's fix your data step to create an actual SAS dataset.

data have;
  input State :$20. Edate :mmddyy. Vthrudate :mmddyy. MATBupie MATMeth MATNal;
  format edate vthrudate yymmdd10.;
datalines;
Alabama 08012017 09302017 0 0 1
Alabama 10012017 10012017 1 0 1
Alabama 10022017 01012020 1 0 0
Alaska 09162018 01012019 1 0 0
Alaska 01022019 10032019 1 1 1
Alaska 10042019 01012020 1 0 1
Alaska 01022019 07032020 1 1 1
Michigan 11032020 11042020 1 1 0
Michigan 11052020 04172021 0 1 1
;

Then let's transpose it so that the drug name gets out of the NAME of the variable into the VALUE of a variable instead.

And then re-order the results.

proc transpose data=have out=tall name=drug;
  by state edate vthrudate notsorted;
  var mat: ;
run;

proc sort data=tall;
  by state drug edate ;
run;

Now we can use the FIRST flag to count how many times the value changes per state per drug.

data want;
  do until (last.drug);
    set tall;
    by state drug col1 notsorted;
    switch=sum(switch,first.col1 and not first.drug);
  end;
  keep state drug switch ;
run;

Results:

Obs     State        drug      switch

 1     Alabama     MATBupie       1
 2     Alabama     MATMeth        0
 3     Alabama     MATNal         1
 4     Alaska      MATBupie       0
 5     Alaska      MATMeth        2
 6     Alaska      MATNal         1
 7     Michigan    MATBupie       1
 8     Michigan    MATMeth        0
 9     Michigan    MATNal         1

Astounding · Posted 08-03-2021 05:49 PM

Here's one guess, could be refined if the data is not as expected:

proc sort data=have;
   by state drug date;
run;
data want;
  set have;
   by state drug notsorted;;
   if first.drug then do;
      switch_to_1=0;
      switch_to_0=0;
   end;
   if first.drug and not first.state then do;
      if coverage=1 then switch_to_1 + 1;
      else if coverage=0 then switch_to_0 + 1;
   end;
   if last.drug;
run;

You may have to adjust for the actual variable names, and for any changes you would prefer to see for the structure of the results. Here, there will be 1 observation per state/drug combination. You would probably want to limit the variables you keep, since the observation still contains the DATE and COVERAGE information from the final observation.

tuckjosh · Posted 08-04-2021 03:17 PM

Is there a way to code all of this without having to run a copy for each drug? I have >20.

J. E. Tucker, MPH (he.him.his)

Catch up on SAS Innovate 2026