Hi,
I am using SAS Enterprise Guide 5.1 version 9.3.
I am trying to create a new column by looking at other columns.
I have columns: amt_reached, amt_needed, Level and each are split by ID.
Currently if the amt_reached is not greater than or equal to amt_needed, then Level = 1, otherwise it is 0. I would like a new column to count Level up to 3 and then this is the max and all values after would be at 0 unless the following line the amt_reached is greater than or equals amt_needed. Then the count would restart.
Here is what I want:
ID | Level | amt_reached | amt_needed | What I want |
123 | 0 | 231.71 | 482.73 | 1 |
123 | 1 | 251.02 | 235.1 | 2 |
123 | 1 | 235.1 | 476.33 | 3 |
123 | 1 | 241.23 | 478.95 | 0 |
123 | 1 | 237.72 | 470.33 | 0 |
123 | 0 | 465.77 | 465.77 | 0 |
123 | 1 | . | 215.89 | 1 |
123 | 1 | 215.89 | 460.7 | 2 |
123 | 1 | 244.81 | 486.22 | 3 |
123 | 1 | 241.41 | 477.02 | 0 |
123 | 1 | 235.61 | 469.03 | 0 |
123 | 1 | 233.42 | 475.1 | 0 |
234 | 1 | 139.35 | 278.7 | 1 |
234 | 1 | 139.35 | 278.7 | 2 |
234 | 1 | 139.35 | 278.7 | 3 |
234 | 1 | 139.35 | 278.7 | 0 |
234 | 1 | 139.35 | 278.7 | 0 |
234 | 1 | 139.35 | 278.7 | 0 |
234 | 1 | 139.35 | 278.7 | 0 |
234 | 1 | 139.35 | 278.7 | 0 |
234 | 1 | 139.35 | 278.7 | 0 |
234 | 1 | 139.35 | 278.7 | 0 |
234 | 1 | 139.35 | 278.7 | 0 |
234 | 1 | 139.35 | 278.7 | 0 |
156 | 1 | . | 18.99 | 1 |
156 | 1 | 18.99 | 39.07 | 2 |
156 | 1 | 20.08 | 40.26 | 3 |
156 | 1 | 20.18 | 40.46 | 0 |
156 | 1 | 20.28 | 40.63 | 0 |
156 | 1 | 20.35 | 40.8 | 0 |
156 | 1 | 20.45 | 40.97 | 0 |
156 | 0 | 61.64 | 41.12 | 0 |
156 | 1 | . | 18.02 | 1 |
156 | 0 | 37.09 | 37.09 | 0 |
156 | 1 | . | 17.68 | 1 |
156 | 1 | 17.68 | 36.4 | 2 |
SAS Code:
data WORK.COUNTWITHMAX;
infile datalines dsd truncover;
input ID:BEST12. Level:BEST12. amt_reached:BEST12. amt_needed:BEST12.;
format ID BEST12. Level BEST12. amt_reached BEST12. amt_needed BEST12.;
datalines;
123 0 231.71 482.73
123 1 251.02 235.1
123 1 235.1 476.33
123 1 241.23 478.95
123 1 237.72 470.33
123 0 465.77 465.77
123 1 . 215.89
123 1 215.89 460.7
123 1 244.81 486.22
123 1 241.41 477.02
123 1 235.61 469.03
123 1 233.42 475.1
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
156 1 . 18.99
156 1 18.99 39.07
156 1 20.08 40.26
156 1 20.18 40.46
156 1 20.28 40.63
156 1 20.35 40.8
156 1 20.45 40.97
156 0 61.64 41.12
156 1 . 18.02
156 0 37.09 37.09
156 1 . 17.68
156 1 17.68 36.4
;;;;
Here is what I have tried so far:
retain MaxCount;
if first.ID
then MaxCount = 0;
if Level = 0
then MaxCount = 0;
else do;
MaxCount = mod(MaxCount + 1,4);
if MaxCount = 0 and amt_reached >= amt_needed then MaxCount = 1;
end;
It does not currently work, but I can't quite finish it.
Any help would be greatly appreciated
Thanks!
Per your description, you would restart with 1 in observation 5, as in observation 6 amt_reached equals amt_needed. Please clarify.
Hi KurtBremser,
Thanks for your question.
As in observation 6 column amt_reached = amt_needed the new column remains at 0. It is observation 7 where the count can restart, not observation 5. If amt_reached in observation 6 had been less than amt_needed then the new column would remain at 0 until there was on occasion where amt_reached >= amt_needed.
The count always gets to 3 and remains at 0 for the next observations until amt_reached >= amt_needed.
Hope this clarifies.
So you mean to start a new series when the previous observation satisfied the condition?
But then I do no understand the last 4 observations:
156 1 . 18.02 1 156 0 37.09 37.09 0 156 1 . 17.68 1 156 1 17.68 36.4 2
Should't those values be 1 - 2 - 1 - 2?
I think I have found it. It creates your output:
data have;
input ID Level amt_reached amt_needed;
datalines;
123 0 231.71 482.73
123 1 251.02 235.1
123 1 235.1 476.33
123 1 241.23 478.95
123 1 237.72 470.33
123 0 465.77 465.77
123 1 . 215.89
123 1 215.89 460.7
123 1 244.81 486.22
123 1 241.41 477.02
123 1 235.61 469.03
123 1 233.42 475.1
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
156 1 . 18.99
156 1 18.99 39.07
156 1 20.08 40.26
156 1 20.18 40.46
156 1 20.28 40.63
156 1 20.35 40.8
156 1 20.45 40.97
156 0 61.64 41.12
156 1 . 18.02
156 0 37.09 37.09
156 1 . 17.68
156 1 17.68 36.4
;
data want;
set have;
by id notsorted;
_level = lag(level);
if first.id then new = 1;
else do;
if level = 0 then new = 0;
if new in (1,2) then new + 1;
else if new = 3 then new = 0;
else if _level = 0 then new = 1;
end;
run;
proc print data=want noobs;
run;
Result:
amt_ amt_ ID Level reached needed _level new 123 0 231.71 482.73 . 1 123 1 251.02 235.10 0 2 123 1 235.10 476.33 1 3 123 1 241.23 478.95 1 0 123 1 237.72 470.33 1 0 123 0 465.77 465.77 1 0 123 1 . 215.89 0 1 123 1 215.89 460.70 1 2 123 1 244.81 486.22 1 3 123 1 241.41 477.02 1 0 123 1 235.61 469.03 1 0 123 1 233.42 475.10 1 0 234 1 139.35 278.70 1 1 234 1 139.35 278.70 1 2 234 1 139.35 278.70 1 3 234 1 139.35 278.70 1 0 234 1 139.35 278.70 1 0 234 1 139.35 278.70 1 0 234 1 139.35 278.70 1 0 234 1 139.35 278.70 1 0 234 1 139.35 278.70 1 0 234 1 139.35 278.70 1 0 234 1 139.35 278.70 1 0 234 1 139.35 278.70 1 0 156 1 . 18.99 1 1 156 1 18.99 39.07 1 2 156 1 20.08 40.26 1 3 156 1 20.18 40.46 1 0 156 1 20.28 40.63 1 0 156 1 20.35 40.80 1 0 156 1 20.45 40.97 1 0 156 0 61.64 41.12 1 0 156 1 . 18.02 0 1 156 0 37.09 37.09 1 0 156 1 . 17.68 0 1 156 1 17.68 36.40 1 2
Thanks that does work for these cases. I have found a case where it doesn't work and I think perhaps because there is no reference to the amt_reached and amt_needed columns.
In reference to your previous question for the last 4 observations, the reason why I want it to be 1,0,1,2 is because the 3rd from last observation is as follows:
ID | Level | amt_reached | amt_needed | What I want |
156 | 0 | 37.09 | 37.09 | 0 |
Since the columns amt_reached and amt_needed fall are equal, then the new column will be at 0. If amt_reached < amt_needed, then it should be 1,2,3,0 for the last 4 observations. Whenever Level = 0 and amt_reached >= amt_needed then that row should = 0 and the count restarts at the next observation.
However if Level = 1 and amt_reached >= amt_needed, then the new column should equal 1 at that point. These two columns are the main decision points.
Hope that is a bit clearer.
Here is an example of when it doesn't work with your new code:
ID | _Level | new | Level | amt_reached | amt_needed | What I want |
111 | 0 | 1 | 1 | 125.27 | 247.92 | 1 |
111 | 1 | 2 | 1 | 122.65 | 242.71 | 2 |
111 | 1 | 3 | 1 | 120.06 | 237.33 | 3 |
111 | 1 | 0 | 1 | 117.27 | 232.1 | 0 |
111 | 1 | 0 | 1 | 114.83 | 227.21 | 0 |
111 | 1 | 0 | 1 | 112.38 | 222.43 | 0 |
111 | 1 | 0 | 1 | 110.05 | 217.77 | 0 |
111 | 1 | 0 | 0 | 213.27 | 213.27 | 0 |
111 | 0 | 1 | 0 | 97.51 | 97.51 | 0 |
111 | 0 | 1 | 0 | 95.02 | 95.02 | 0 |
111 | 0 | 1 | 0 | 92.67 | 92.67 | 0 |
111 | 0 | 1 | 0 | 90.3 | 90.3 | 0 |
Thanks again!
I think a small change in the code should do it:
data want;
set have;
by id notsorted;
_level = lag(level);
if first.id then new = 1;
else do;
if level = 0 then new = 0;
else if new in (1,2) then new + 1;
else if new = 3 then new = 0;
else if _level = 0 then new = 1;
end;
run;
Note the extra "else" in line 8 that makes it a whole if-then-else-if chain.
That did the trick.
The other thing is with the code saying if first.ID then new = 1, this is causing some cases where a new ID begins to have 1 on the first observation, even if Level doesn't equal 1. I also can see cases where new = 2 but the previous observation isn't there is at 0
Please see below:
ID | _Level | new | Level | amt_reached | amt_needed | What I want |
222 | . | 1 | 0 | 21 | 20.59 | 0 |
222 | 0 | 0 | 0 | 21 | 20.25 | 0 |
222 | 0 | 0 | 0 | 21 | 20.45 | 0 |
222 | 1 | 0 | 0 | 22.54 | 22.54 | 0 |
222 | 0 | 0 | 0 | 21 | 20.06 | 0 |
222 | 0 | 0 | 0 | 20 | 19.46 | 0 |
222 | 0 | 0 | 0 | 20 | 19.92 | 0 |
222 | 0 | 1 | 1 | 18.5 | 18.95 | 1 |
222 | 0 | 0 | 0 | 23 | 21.59 | 0 |
222 | 0 | 0 | 0 | 30 | 19.14 | 0 |
222 | 0 | 0 | 0 | 21 | 20.97 | 0 |
222 | 0 | 0 | 0 | 21 | 19.71 | 0 |
333 | 0 | 1 | 0 | 8.99 | 7.2 | 0 |
333 | 0 | 0 | 0 | 7.69 | 5.13 | 0 |
333 | 0 | 0 | 0 | 5.44 | 5 | 0 |
333 | 0 | 0 | 0 | 7.54 | 5 | 0 |
333 | 0 | 0 | 0 | 10.51 | 5.53 | 0 |
333 | 0 | 0 | 0 | 10 | 5.9 | 0 |
333 | 0 | 0 | 0 | 7.67 | 5.13 | 0 |
333 | 1 | 0 | 0 | 6.77 | 5.34 | 0 |
333 | 0 | 0 | 0 | 10 | 5 | 0 |
333 | 0 | 1 | 1 | 6 | 5 | 1 |
333 | 0 | 0 | 0 | 7.36 | 5 | 0 |
333 | 0 | 0 | 0 | 7 | 5 | 0 |
444 | 1 | 2 | 1 | . | 5 | 1 |
444 | 1 | 0 | 0 | 77.7 | 10 | 0 |
444 | 0 | 1 | 1 | 20 | 5 | 1 |
444 | 0 | 0 | 0 | 20 | 5 | 0 |
444 | 0 | 1 | 0 | 20 | 12.09 | 0 |
444 | 0 | 1 | 1 | . | 5 | 1 |
444 | 0 | 0 | 0 | 20 | 5.52 | 0 |
444 | 1 | 0 | 0 | 20 | 10.34 | 0 |
444 | 0 | 1 | 1 | 20 | 5 | 1 |
444 | 0 | 0 | 0 | . | 0 | 0 |
444 | 1 | 2 | 1 | 50 | 13.8 | 1 |
Sorry, I realised I had ordered the data incorrectly for my last comment. The issue with new = 2 is fine. The issue is only with the starting ID = 1.
Apologies.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.