Solved: enumeration before an event occurs

Amanda3 · Posted 03-21-2024 11:03 AM

Hi, I am trying to 1) count the number of RBCs before the first unit of plasma for each id and 2) calculate the plasma ratio (plasma/(plasma+rbc)) at the time of the first unit of plasma for each id.

here is what my data looks like:

id	product	time
1	rbc	1
1	rbc	2
1	plasma	3
1	rbc	4
1	plasma	5
2	plasma	1
2	rbc	2
2	rbc	3

But this is what I want it to look like:

id	# of rbcs before plasma	ratio	time
1	2	.333	3
2	0	1	1

ballardw · Posted 03-21-2024 11:49 AM

Are there ever any other values of product besides "rbc" and "plasma"? If so, you need to provide an example and the rules of how they might affect the results you want.

This works for your example data. Please note use of the data step to create a working data set to run code against.

data have;
   input id $	product $	time;
datalines;
1	rbc	1
1	rbc	2
1	plasma	3
1	rbc	4
1	plasma	5
2	plasma	1
2	rbc	2
2	rbc	3
;

data want;
   set have;
   by id;
   retain rbccount plasmacount;
   if first.id then do ;
      rbccount=0;
      plasmacount=0;
   end;
   if product='rbc' and plasmacount=0 then rbccount+1;
   if product='plasma' and plasmacount=0 then plasmacount+1;
   if plasmacount=1 then do;
      ratio=plasmacount/(plasmacount+rbccount);
      output;
      plasmacount=.;
   end;
   keep id rbccount ratio time;
   label rbccount = '# of rbc before first plasma';
run;

The BY Id expects your ID values to be sorted. If they aren't you can add the NOTSORTED option to the By statement. That would then treat different blocks of a repeated ID as a different sequence with another output row.

The BY statement in a data step creates automatic variables FIRST (and LAST) that indicate whether the current observation is the first or last of a group. The values are 1/0 (true / false) and can be tested as such to set or reset values when occurring as shown.

The RETAIN instruction means that the values are kept across the boundary of a data step so are available for use with later observations in the set. Caveat: Retain with a variable that exists in the source data set gets reset to the value read in from the data set so not useful to retain an already existing variable.

The OUTPUT statement, when used, will only write to the output set when executed. So the above code will only write the first time 'plasma' is encountered per Id. It spends some time spinning wheels without output so can be inefficient with very large data sets with many more records after the first plasma record.

View solution in original post

ballardw · Posted 03-21-2024 11:49 AM

Are there ever any other values of product besides "rbc" and "plasma"? If so, you need to provide an example and the rules of how they might affect the results you want.

This works for your example data. Please note use of the data step to create a working data set to run code against.

data have;
   input id $	product $	time;
datalines;
1	rbc	1
1	rbc	2
1	plasma	3
1	rbc	4
1	plasma	5
2	plasma	1
2	rbc	2
2	rbc	3
;

data want;
   set have;
   by id;
   retain rbccount plasmacount;
   if first.id then do ;
      rbccount=0;
      plasmacount=0;
   end;
   if product='rbc' and plasmacount=0 then rbccount+1;
   if product='plasma' and plasmacount=0 then plasmacount+1;
   if plasmacount=1 then do;
      ratio=plasmacount/(plasmacount+rbccount);
      output;
      plasmacount=.;
   end;
   keep id rbccount ratio time;
   label rbccount = '# of rbc before first plasma';
run;

The BY Id expects your ID values to be sorted. If they aren't you can add the NOTSORTED option to the By statement. That would then treat different blocks of a repeated ID as a different sequence with another output row.

The BY statement in a data step creates automatic variables FIRST (and LAST) that indicate whether the current observation is the first or last of a group. The values are 1/0 (true / false) and can be tested as such to set or reset values when occurring as shown.

The RETAIN instruction means that the values are kept across the boundary of a data step so are available for use with later observations in the set. Caveat: Retain with a variable that exists in the source data set gets reset to the value read in from the data set so not useful to retain an already existing variable.

The OUTPUT statement, when used, will only write to the output set when executed. So the above code will only write the first time 'plasma' is encountered per Id. It spends some time spinning wheels without output so can be inefficient with very large data sets with many more records after the first plasma record.

Amanda3 · Posted 03-21-2024 01:39 PM

amazing. thanks for the help.

Ksharp · Posted 03-22-2024 01:26 AM

/*Just have some fun.*/
data have;
   input id $	product $	time;
datalines;
1	rbc	1
1	rbc	2
1	plasma	3
1	rbc	4
1	plasma	5
2	plasma	1
2	rbc	2
2	rbc	3
;

data want;
do until(last.id);
   set have;
   by id;
   if product='plasma' and not found then do;found=1;_time=time;end;
end;
rbccount=_time-1;
ratio=divide(1,_time);
   keep id rbccount ratio _time;
   label rbccount = '# of rbc before first plasma';
run;

enumeration before an event occurs

Re: enumeration before an event occurs

Re: enumeration before an event occurs

Re: enumeration before an event occurs

Re: enumeration before an event occurs

enumeration before an event occurs

Re: enumeration before an event occurs

Re: enumeration before an event occurs

Re: enumeration before an event occurs

Re: enumeration before an event occurs

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away