BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Amanda3
Fluorite | Level 6

Hi, I am trying to 1) count the number of RBCs before the first unit of plasma for each id and 2) calculate the plasma ratio (plasma/(plasma+rbc)) at the time of the first unit of plasma for each id.  

here is what my data looks like: 

idproducttime
1rbc1
1rbc2
1plasma3
1rbc4
1plasma5
2plasma1
2rbc2
2rbc3

 

But this is what I want it to look like: 

id# of rbcs before plasmaratiotime
12.3333
2011
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Are there ever any other values of product besides "rbc" and "plasma"?  If so, you need to provide an example and the rules of how they might affect the results you want.

 

This works for your example data. Please note use of the data step to create a working data set to run code against.

data have;
   input id $	product $	time;
datalines;
1	rbc	1
1	rbc	2
1	plasma	3
1	rbc	4
1	plasma	5
2	plasma	1
2	rbc	2
2	rbc	3
;

data want;
   set have;
   by id;
   retain rbccount plasmacount;
   if first.id then do ;
      rbccount=0;
      plasmacount=0;
   end;
   if product='rbc' and plasmacount=0 then rbccount+1;
   if product='plasma' and plasmacount=0 then plasmacount+1;
   if plasmacount=1 then do;
      ratio=plasmacount/(plasmacount+rbccount);
      output;
      plasmacount=.;
   end;
   keep id rbccount ratio time;
   label rbccount = '# of rbc before first plasma';
run;

The BY Id expects your ID values to be sorted. If they aren't you can add the NOTSORTED option to the By statement. That would then treat different blocks of a repeated ID as a different sequence with another output row.

The BY statement in a data step creates automatic variables FIRST (and LAST) that indicate whether the current observation is the first or last of a group. The values are 1/0 (true / false) and can be tested as such to set or reset values when occurring as shown.

The RETAIN instruction means that the values are kept across the boundary of a data step so are available for use with later observations in the set. Caveat: Retain with a variable that exists in the source data set gets reset to the value read in from the data set so not useful to retain an already existing variable.

The OUTPUT statement, when used, will only write to the output set when executed. So the above code will only write the first time 'plasma' is encountered per Id. It spends some time spinning wheels without output so can be inefficient with very large data sets with many more records after the first plasma record.

View solution in original post

3 REPLIES 3
ballardw
Super User

Are there ever any other values of product besides "rbc" and "plasma"?  If so, you need to provide an example and the rules of how they might affect the results you want.

 

This works for your example data. Please note use of the data step to create a working data set to run code against.

data have;
   input id $	product $	time;
datalines;
1	rbc	1
1	rbc	2
1	plasma	3
1	rbc	4
1	plasma	5
2	plasma	1
2	rbc	2
2	rbc	3
;

data want;
   set have;
   by id;
   retain rbccount plasmacount;
   if first.id then do ;
      rbccount=0;
      plasmacount=0;
   end;
   if product='rbc' and plasmacount=0 then rbccount+1;
   if product='plasma' and plasmacount=0 then plasmacount+1;
   if plasmacount=1 then do;
      ratio=plasmacount/(plasmacount+rbccount);
      output;
      plasmacount=.;
   end;
   keep id rbccount ratio time;
   label rbccount = '# of rbc before first plasma';
run;

The BY Id expects your ID values to be sorted. If they aren't you can add the NOTSORTED option to the By statement. That would then treat different blocks of a repeated ID as a different sequence with another output row.

The BY statement in a data step creates automatic variables FIRST (and LAST) that indicate whether the current observation is the first or last of a group. The values are 1/0 (true / false) and can be tested as such to set or reset values when occurring as shown.

The RETAIN instruction means that the values are kept across the boundary of a data step so are available for use with later observations in the set. Caveat: Retain with a variable that exists in the source data set gets reset to the value read in from the data set so not useful to retain an already existing variable.

The OUTPUT statement, when used, will only write to the output set when executed. So the above code will only write the first time 'plasma' is encountered per Id. It spends some time spinning wheels without output so can be inefficient with very large data sets with many more records after the first plasma record.

Amanda3
Fluorite | Level 6
amazing. thanks for the help.
Ksharp
Super User
/*Just have some fun.*/
data have;
   input id $	product $	time;
datalines;
1	rbc	1
1	rbc	2
1	plasma	3
1	rbc	4
1	plasma	5
2	plasma	1
2	rbc	2
2	rbc	3
;

data want;
do until(last.id);
   set have;
   by id;
   if product='plasma' and not found then do;found=1;_time=time;end;
end;
rbccount=_time-1;
ratio=divide(1,_time);
   keep id rbccount ratio _time;
   label rbccount = '# of rbc before first plasma';
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 531 views
  • 1 like
  • 3 in conversation