
Convert from Events-Trials to binary form

Started ‎05-02-2022 by
Modified ‎05-02-2022 by

Most SAS procedures that fit binary response models, such as PROC LOGISTIC, PROC GLIMMIX, and PROC BGLIMM, accept data in "Events/Trials" format. For example, events could be the number of runners who cross the finish line, and trials could be the total number of runners.
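As a rough illustration of the runner example, here is a minimal sketch of the events/trials syntax in PROC LOGISTIC. The RACES data set, its values, and the Temperature predictor are hypothetical and exist only to make the example runnable.

```sas
/* Hypothetical toy data: one row per race (not from the article) */
data races;
	input Race$ Finishers Runners Temperature;
	datalines;
R1 40 50 12
R2 35 50 18
R3 28 50 24
R4 20 50 30
;

/* Events/trials syntax: events (Finishers) over trials (Runners) */
proc logistic data=races;
	model Finishers/Runners = Temperature;
run;
```

Each row carries the counts for one race, so the procedure treats the row as Runners binomial trials with Finishers events.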

 

Sometimes it is useful to represent the same data in binary format instead, where every trial gets a row of its own with a 1/0 (yes/no) event flag. I wanted to do this so that I could merge by-observation predictions into a corresponding binary table, a la this example.

 

/*This data is in events-trials format*/
data bytrials;
	input Lot$ Event_count Total;
	datalines;
A 1 6
B 3 6
C 0 6
D 5 6
;

/*Flag yes events in a YesEvents data set*/
data YesEvents;
	set bytrials;
	do Yes_Flag=1 to Event_count;
		output;
	end;
run;

/*Flag nonevents in a NoEvents data set*/
data NoEvents;
	set bytrials;
	do No_Flag=1 to Total-Event_count;
		output;
	end;
run;

/*Append YesEvents and NoEvents and create Binary Event Flag*/
data byevent;
	set YesEvents NoEvents;
	if Yes_Flag >0 then
		Event=1;
	if No_Flag >0 then
		Event=0;
	drop Yes_Flag No_Flag Event_count Total;
run;

/*Evaluate outcome by sorting and printing*/
proc sort data=byevent;
	by lot;
run;

proc print data=byevent;
run;

[Image: Binary data]

While this code is not simple or elegant, it did get me to the binary data set I needed. Hopefully it will prove useful for others searching for the same thing. Comments with improvements are most welcome!

Comments

Are the intermediate data sets needed for some purpose other than producing the final BYEVENT data set?

 

If not, this may be a more maintainable approach:

 

data byevent (drop=event_count total i);
  set bytrials;
  do i=1 to total;
    if i<=event_count then event=1;
    else event=0;
    output;
  end;
run;

I don't see a need for the YESEVENTS and NOEVENTS data sets, since they could be trivially replicated by applying a WHERE filter to the BYEVENT data set (i.e., where event=1; or where event=0;).
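A minimal sketch of that WHERE filter, assuming the BYEVENT data set (columns Lot and Event) created by the DATA step above already exists in the WORK library. The subsets produced this way contain only Lot and Event, not the counter and count columns of the original intermediate data sets.

```sas
/* Assumes BYEVENT was created by the single DATA step above */

/* Rebuild the event and nonevent subsets only if copies are really needed */
data YesEvents;
	set byevent(where=(event=1));
run;

data NoEvents;
	set byevent(where=(event=0));
run;

/* Often no copy is needed: filter directly in the procedure call */
proc print data=byevent(where=(event=1));
run;
```

With the BYTRIALS data from the article, the event subset would have 9 rows (1+3+0+5) and the nonevent subset 15 rows.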

This technique is known as "unrolling the data." A similar technique can be used when there is a FREQ variable in a data set, but you want to unroll the data so that each subject appears on a row by itself. 

 

In addition to the reasons you mention, unrolling the data can be useful if you are trying to obtain a subsample of the data. You can also use the unrolled data for a bootstrap analysis or another resampling technique.
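For illustration, here is a sketch of unrolling by a frequency variable and then drawing bootstrap resamples from the unrolled rows. The FREQDATA data set, its values, and the names UNROLLED and BOOTSAMP are hypothetical, and the PROC SURVEYSELECT settings are just one reasonable choice.

```sas
/* Hypothetical data: Count says how many subjects share each row's values */
data freqdata;
	input Group$ Outcome Count;
	datalines;
A 1 3
A 0 2
B 1 1
B 0 4
;

/* Unroll: write each row Count times so every subject has its own row */
data unrolled (drop=Count i);
	set freqdata;
	do i=1 to Count;
		output;
	end;
run;

/* Bootstrap resamples of the unrolled data:
   METHOD=URS samples with replacement, SAMPRATE=1 keeps the original size,
   OUTHITS writes one row per selection, REPS= sets the number of resamples */
proc surveyselect data=unrolled out=bootsamp
	method=urs samprate=1 reps=100 seed=12345 outhits noprint;
run;
```

The unrolled data set has 10 rows (3+2+1+4), and BOOTSAMP identifies each resample through the Replicate variable that PROC SURVEYSELECT adds.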

Thanks @mkeintz - the code worked well and quickly, and will prove much more maintainable.

 

Thanks @Rick_SAS as always for adding your perspective!

Version history
Last update:
‎05-02-2022 09:01 PM
