Count column to a max value using other columns

1SasUser1 · Posted 08-01-2019 05:09 AM

Hi,

I am using SAS Enterprise Guide 5.1 version 9.3.

I am trying to create a new column by looking at other columns.

I have columns: amt_reached, amt_needed, Level and each are split by ID.

Currently if the amt_reached is not greater than or equal to amt_needed, then Level = 1, otherwise it is 0. I would like a new column to count Level up to 3 and then this is the max and all values after would be at 0 unless the following line the amt_reached is greater than or equals amt_needed. Then the count would restart.

Here is what I want:

ID	Level	amt_reached	amt_needed	What I want
123	0	231.71	482.73	1
123	1	251.02	235.1	2
123	1	235.1	476.33	3
123	1	241.23	478.95	0
123	1	237.72	470.33	0
123	0	465.77	465.77	0
123	1	.	215.89	1
123	1	215.89	460.7	2
123	1	244.81	486.22	3
123	1	241.41	477.02	0
123	1	235.61	469.03	0
123	1	233.42	475.1	0
234	1	139.35	278.7	1
234	1	139.35	278.7	2
234	1	139.35	278.7	3
234	1	139.35	278.7	0
234	1	139.35	278.7	0
234	1	139.35	278.7	0
234	1	139.35	278.7	0
234	1	139.35	278.7	0
234	1	139.35	278.7	0
234	1	139.35	278.7	0
234	1	139.35	278.7	0
234	1	139.35	278.7	0
156	1	.	18.99	1
156	1	18.99	39.07	2
156	1	20.08	40.26	3
156	1	20.18	40.46	0
156	1	20.28	40.63	0
156	1	20.35	40.8	0
156	1	20.45	40.97	0
156	0	61.64	41.12	0
156	1	.	18.02	1
156	0	37.09	37.09	0
156	1	.	17.68	1
156	1	17.68	36.4	2

SAS Code:

data WORK.COUNTWITHMAX;
  infile datalines dsd truncover;
  input ID:BEST12. Level:BEST12. amt_reached:BEST12. amt_needed:BEST12.;
  format ID BEST12. Level BEST12. amt_reached BEST12. amt_needed BEST12.;
datalines;
123 0 231.71 482.73
123 1 251.02 235.1
123 1 235.1 476.33
123 1 241.23 478.95
123 1 237.72 470.33
123 0 465.77 465.77
123 1 . 215.89
123 1 215.89 460.7
123 1 244.81 486.22
123 1 241.41 477.02
123 1 235.61 469.03
123 1 233.42 475.1
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
156 1 . 18.99
156 1 18.99 39.07
156 1 20.08 40.26
156 1 20.18 40.46
156 1 20.28 40.63
156 1 20.35 40.8
156 1 20.45 40.97
156 0 61.64 41.12
156 1 . 18.02
156 0 37.09 37.09
156 1 . 17.68
156 1 17.68 36.4
;;;;

Here is what I have tried so far:

retain MaxCount;
	if first.ID	
		then MaxCount = 0;
	if Level = 0
		then MaxCount = 0;
	else do;
		MaxCount = mod(MaxCount + 1,4);
if MaxCount = 0 and amt_reached >= amt_needed then MaxCount = 1;


end;

It does not currently work, but I can't quite finish it.

Any help would be greatly appreciated

Thanks!

Kurt_Bremser · Posted 08-01-2019 05:42 AM

Per your description, you would restart with 1 in observation 5, as in observation 6 amt_reached equals amt_needed. Please clarify.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

1SasUser1 · Posted 08-01-2019 05:52 AM

Hi KurtBremser,

Thanks for your question.

As in observation 6 column amt_reached = amt_needed the new column remains at 0. It is observation 7 where the count can restart, not observation 5. If amt_reached in observation 6 had been less than amt_needed then the new column would remain at 0 until there was on occasion where amt_reached >= amt_needed.

The count always gets to 3 and remains at 0 for the next observations until amt_reached >= amt_needed.

Hope this clarifies.

Kurt_Bremser · Posted 08-01-2019 07:02 AM

So you mean to start a new series when the previous observation satisfied the condition?

But then I do no understand the last 4 observations:

156	1	.	18.02	1
156	0	37.09	37.09	0
156	1	.	17.68	1
156	1	17.68	36.4	2

Should't those values be 1 - 2 - 1 - 2?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 08-01-2019 07:08 AM

I think I have found it. It creates your output:

data have;
input ID Level amt_reached amt_needed;
datalines;
123 0 231.71 482.73
123 1 251.02 235.1
123 1 235.1 476.33
123 1 241.23 478.95
123 1 237.72 470.33
123 0 465.77 465.77
123 1 . 215.89
123 1 215.89 460.7
123 1 244.81 486.22
123 1 241.41 477.02
123 1 235.61 469.03
123 1 233.42 475.1
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
234 1 139.35 278.7
156 1 . 18.99
156 1 18.99 39.07
156 1 20.08 40.26
156 1 20.18 40.46
156 1 20.28 40.63
156 1 20.35 40.8
156 1 20.45 40.97
156 0 61.64 41.12
156 1 . 18.02
156 0 37.09 37.09
156 1 . 17.68
156 1 17.68 36.4
;

data want;
set have;
by id notsorted;
_level = lag(level);
if first.id then new = 1;
else do;
  if level = 0 then new = 0;
  if new in (1,2) then new + 1;
  else if new = 3 then new = 0;
  else if _level = 0 then new = 1;
end;
run;

proc print data=want noobs;
run;

Result:

                  amt_      amt_
 ID    Level    reached    needed    _level    new

123      0       231.71    482.73       .       1 
123      1       251.02    235.10       0       2 
123      1       235.10    476.33       1       3 
123      1       241.23    478.95       1       0 
123      1       237.72    470.33       1       0 
123      0       465.77    465.77       1       0 
123      1          .      215.89       0       1 
123      1       215.89    460.70       1       2 
123      1       244.81    486.22       1       3 
123      1       241.41    477.02       1       0 
123      1       235.61    469.03       1       0 
123      1       233.42    475.10       1       0 
234      1       139.35    278.70       1       1 
234      1       139.35    278.70       1       2 
234      1       139.35    278.70       1       3 
234      1       139.35    278.70       1       0 
234      1       139.35    278.70       1       0 
234      1       139.35    278.70       1       0 
234      1       139.35    278.70       1       0 
234      1       139.35    278.70       1       0 
234      1       139.35    278.70       1       0 
234      1       139.35    278.70       1       0 
234      1       139.35    278.70       1       0 
234      1       139.35    278.70       1       0 
156      1          .       18.99       1       1 
156      1        18.99     39.07       1       2 
156      1        20.08     40.26       1       3 
156      1        20.18     40.46       1       0 
156      1        20.28     40.63       1       0 
156      1        20.35     40.80       1       0 
156      1        20.45     40.97       1       0 
156      0        61.64     41.12       1       0 
156      1          .       18.02       0       1 
156      0        37.09     37.09       1       0 
156      1          .       17.68       0       1 
156      1        17.68     36.40       1       2

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

1SasUser1 · Posted 08-01-2019 08:15 AM

Thanks that does work for these cases. I have found a case where it doesn't work and I think perhaps because there is no reference to the amt_reached and amt_needed columns.

In reference to your previous question for the last 4 observations, the reason why I want it to be 1,0,1,2 is because the 3rd from last observation is as follows:

ID	Level	amt_reached	amt_needed	What I want
156	0	37.09	37.09	0

Since the columns amt_reached and amt_needed fall are equal, then the new column will be at 0. If amt_reached < amt_needed, then it should be 1,2,3,0 for the last 4 observations. Whenever Level = 0 and amt_reached >= amt_needed then that row should = 0 and the count restarts at the next observation.

However if Level = 1 and amt_reached >= amt_needed, then the new column should equal 1 at that point. These two columns are the main decision points.

Hope that is a bit clearer.

Here is an example of when it doesn't work with your new code:

ID	_Level	new	Level	amt_reached	amt_needed	What I want
111	0	1	1	125.27	247.92	1
111	1	2	1	122.65	242.71	2
111	1	3	1	120.06	237.33	3
111	1	0	1	117.27	232.1	0
111	1	0	1	114.83	227.21	0
111	1	0	1	112.38	222.43	0
111	1	0	1	110.05	217.77	0
111	1	0	0	213.27	213.27	0
111	0	1	0	97.51	97.51	0
111	0	1	0	95.02	95.02	0
111	0	1	0	92.67	92.67	0
111	0	1	0	90.3	90.3	0

Thanks again!

Kurt_Bremser · Posted 08-01-2019 08:39 AM

I think a small change in the code should do it:

data want;
set have;
by id notsorted;
_level = lag(level);
if first.id then new = 1;
else do;
  if level = 0 then new = 0;
  else if new in (1,2) then new + 1;
  else if new = 3 then new = 0;
  else if _level = 0 then new = 1;
end;
run;

Note the extra "else" in line 8 that makes it a whole if-then-else-if chain.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

1SasUser1 · Posted 08-01-2019 09:01 AM

That did the trick.

The other thing is with the code saying if first.ID then new = 1, this is causing some cases where a new ID begins to have 1 on the first observation, even if Level doesn't equal 1. I also can see cases where new = 2 but the previous observation isn't there is at 0

Please see below:

ID	_Level	new	Level	amt_reached	amt_needed	What I want
222	.	1	0	21	20.59	0
222	0	0	0	21	20.25	0
222	0	0	0	21	20.45	0
222	1	0	0	22.54	22.54	0
222	0	0	0	21	20.06	0
222	0	0	0	20	19.46	0
222	0	0	0	20	19.92	0
222	0	1	1	18.5	18.95	1
222	0	0	0	23	21.59	0
222	0	0	0	30	19.14	0
222	0	0	0	21	20.97	0
222	0	0	0	21	19.71	0
333	0	1	0	8.99	7.2	0
333	0	0	0	7.69	5.13	0
333	0	0	0	5.44	5	0
333	0	0	0	7.54	5	0
333	0	0	0	10.51	5.53	0
333	0	0	0	10	5.9	0
333	0	0	0	7.67	5.13	0
333	1	0	0	6.77	5.34	0
333	0	0	0	10	5	0
333	0	1	1	6	5	1
333	0	0	0	7.36	5	0
333	0	0	0	7	5	0
444	1	2	1	.	5	1
444	1	0	0	77.7	10	0
444	0	1	1	20	5	1
444	0	0	0	20	5	0
444	0	1	0	20	12.09	0
444	0	1	1	.	5	1
444	0	0	0	20	5.52	0
444	1	0	0	20	10.34	0
444	0	1	1	20	5	1
444	0	0	0	.	0	0
444	1	2	1	50	13.8	1

1SasUser1 · Posted 08-01-2019 09:52 AM

Sorry, I realised I had ordered the data incorrectly for my last comment. The issue with new = 2 is fine. The issue is only with the starting ID = 1.

Apologies.

Count column to a max value using other columns

Re: Count column to a max value using other columns

Re: Count column to a max value using other columns

Re: Count column to a max value using other columns

Re: Count column to a max value using other columns

Re: Count column to a max value using other columns

Re: Count column to a max value using other columns

Re: Count column to a max value using other columns

Re: Count column to a max value using other columns