Convert from Events-Trials to binary form

Most procedures that fit binary response models accept data in "Events/Trials" format, including Proc Logistic, Proc GLIMMIX and Proc BGLIMM. For example, events could be the number of runners who cross the finish line, and trials the total number of runners.
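
As a minimal sketch of the syntax, the events/trials form puts both variables on the left side of the MODEL statement, separated by a slash. The example assumes the bytrials data set created further down and fits an intercept-only model purely for illustration; it is not part of the original workflow.

/*Sketch: events/trials syntax with an intercept-only model (illustration only)*/
proc logistic data=bytrials;
	model Event_count/Total = ;
run;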

Sometimes it is useful to represent the data in binary format instead, where every trial gets a row of its own and a 1/0 (or yes/no) flag records whether that trial was an event. I wanted to do this so that I could merge by-observation predictions into a corresponding binary table, a la this example.

/*This data is in events-trials format*/
data bytrials;
	input Lot$ Event_count Total;
	datalines;
A 1 6
B 3 6
C 0 6
D 5 6
;

/*Flag yes events in a YesEvents data set*/
data YesEvents;
	set bytrials;
	do Yes_Flag=1 to Event_count;
		output;
	end;
run;

/*Flag nonevents in a NoEvents data set*/
data NoEvents;
	set bytrials;
	do No_Flag=1 to Total-Event_count;
		output;
	end;
run;

/*Append YesEvents and NoEvents and create Binary Event Flag*/
data byevent;
	set YesEvents NoEvents;
	if Yes_Flag >0 then
		Event=1;
	if No_Flag >0 then
		Event=0;
	drop Yes_Flag No_Flag Event_count Total;
run;

/*Evaluate outcome by sorting and printing*/
proc sort data=byevent;
	by lot;
run;

proc print data=byevent;
run;

Binary data

While this code is not simple or elegant, it did get me to the binary data set I needed. Hopefully it will prove useful for others searching for the same thing. Comments with improvements are most welcome!
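
One way to confirm that nothing was lost in the conversion is to collapse the binary table back to one row per lot and compare it with the original. The following is a sketch under that assumption; the data set name recount is hypothetical.

/*Sketch: collapse byevent back to events-trials form*/
proc sql;
	create table recount as
	select Lot,
		sum(Event) as Event_count,
		count(*) as Total
	from byevent
	group by Lot
	order by Lot;
quit;

/*Report any differences from the original table, matching rows by Lot*/
proc compare base=bytrials compare=recount;
	id Lot;
run;

The ID statement expects both tables to be sorted by Lot, which holds here because bytrials was entered in Lot order and recount is sorted by the ORDER BY clause.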

mkeintz · ‎05-02-2022

Are the intermediate data sets needed for some purpose other than producing the final BYEVENT data set?

If not, this may be a more maintainable approach:

data byevent (drop=event_count total i);
  set bytrials;
  do i=1 to total;
    if i<=event_count then event=1;
    else event=0;
    output;
  end;
run;

I don't see a need for the YESEVENTS and NOEVENTS data sets, since they could be trivially replicated by applying a WHERE filter to the BYEVENT data set (i.e. where event=1; or where event=0;).
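
As a rough sketch of that filtering, assuming BYEVENT was built as above, either a WHERE statement or a WHERE= data set option would do:

/*Sketch: recover the event-only rows with a WHERE statement*/
data YesEvents;
  set byevent;
  where event=1;
run;

/*Or filter on the fly with a data set option, without creating a new data set*/
proc print data=byevent(where=(event=0));
run;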

Rick_SAS · ‎05-03-2022

This technique is known as "unrolling the data." A similar technique can be used when there is a FREQ variable in a data set, but you want to unroll the data so that each subject appears on a row by itself.
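
A minimal sketch of the FREQ-variable case, assuming a hypothetical data set COUNTS with a category variable Color and a frequency variable Freq (none of these names come from the original post):

/*Hypothetical input: one row per category with a frequency count*/
data counts;
  input Color $ Freq;
  datalines;
Red 3
Blue 2
;

/*Sketch: write each row Freq times so that every subject has its own row*/
data unrolled (drop=Freq i);
  set counts;
  do i=1 to Freq;
    output;
  end;
run;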

In addition to the reasons you mention, unrolling the data can be useful if you are trying to obtain a subsample of the data. You can also use the unrolled data for a bootstrap analysis or another resampling technique.
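
For instance, bootstrap resamples of the unrolled BYEVENT data could be drawn with PROC SURVEYSELECT. This is only a sketch; the output name bootsamp, the seed, and the number of replicates are arbitrary assumptions.

/*Sketch: draw bootstrap resamples from the unrolled data*/
proc surveyselect data=byevent out=bootsamp
    seed=12345     /*arbitrary seed for reproducibility*/
    method=urs     /*unrestricted random sampling, i.e. with replacement*/
    samprate=1     /*each resample has as many rows as byevent*/
    outhits        /*write one output row per selection*/
    reps=100;      /*number of resamples (arbitrary)*/
run;

Each resample is identified by the Replicate variable in the output, so a later analysis can be run with BY Replicate.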

jozgot · ‎05-03-2022

Thanks @mkeintz - the code worked well and quickly, and will prove much more maintainable.

Thanks @Rick_SAS as always for adding your perspective!
