03-02-2016 05:16 PM
I have a challenge seemingly requiring a DO UNTIL-type operation which I am having trouble getting to work.
What I have is a data set with multiple users, each with 1 or more records. I want to sum up successive rows for each user, resetting the sum once it reaches a certain amount.
So, just as an example:
input user $ value;
My desire is to create a new variable (call it simply "count"), essentially the number of successive rows which add up to less than 30. So, the first row (with value 9) would have a count value of 1; the second row (with value 12) would have a count value of 2 (since 9 + 12 = 21 < 30); the third row wouldhave a count value of 1 (since 9 + 12 + 18 = 39 > 30, forcing it to reset to 1); the fourth row (with value 15) would again have a count value of 1 (since 18 + 15 = 33 > 30, forcing another reset to 1); the fifth row (with value 7) would have a count value of 2 (since 15 + 7 = 22 < 30); the sixth row (with value 4) would have a count value of 3 (since 15 + 7 + 4 = 26 < 30); the seventh row (with value 3) would have a count value of 4 (since 15 + 7 + 4 + 3 = 29 < 30); and the next row's count would reset to 1 unless its value were 0.
If at any point it hits a new user ID, the count variable automatically outputs and resets to 0, and the process begins again for the next user.
I am unsure how to make this work; I cannot seem to get DO UNTIL to iterate over successive rows in the data set.
Of course, if there is another way to do this without using DO UNTIL, I'm all ears.
Thanks in advance for any assistance!
03-02-2016 05:23 PM
SAS data steps have implicit loops - you don't need to code the loop, just your logic.
If you post a sample output set, it will be easier to visualize and someone can demonstrate the coding for you.
03-02-2016 05:41 PM
To carry the value of a variable across observations you will want to use Retain. The break at ID would likely involve BY processing on the ID variable and first. and last. coding. Is your data sorted by ID?
Also, you should provide an example of what you want the output data set to look like with example of the cases that can be encountered:
Id that has at least one group that totals to 30, an Id with a single value less than 30, an Id with a single value greater than 30, an Id with several rows that does not reach 30.
Do any of the rows in your actual data have missing values for the Value variable? If so, does that row iterate the count? What if all values are missing for an Id? How to treat a missing if it is the last row but the total hasn't reached 30?
03-02-2016 06:57 PM
It would be helpful if you show the output you want in data set format. If I understand correctly the tricky part of this is the thing you want to output had already gone by when you determine the reset point. This method user look ahead. I don't know if you want to output every observation or just the row before the reset. If you remove the output statement you get last row before reset 2 1 1 4 ect.
data program; input user $ value @@; datalines; 1 9 1 12 1 18 1 15 1 7 1 4 1 3 2 9 2 1 2 1 2 15 2 7 2 4 2 3 ;;;; run; proc print; run; data rows; if 0 then set program; do row=1 by 1 until(sum gt 30 or last.user); set program; by user; if row eq 1 then sum = value; if not eof then set program(firstobs=2 keep=value rename=(value=_value)) end=eof; if last.user then call missing(_value); sum + _value; output; end; run; proc print; run;
03-02-2016 08:08 PM
data program; input user $ value; datalines; 1 9 1 12 1 18 1 15 1 7 1 4 1 3 ; run; data want; set program; by user ; if first.user then do;sum=0;count=0;end; sum+value; if sum gt 30 then do;sum=value;count=0;end; count+1; run;