I am trying to compute an average within each ID group but with constraints.
For example, I would like to start taking average of two cells that are continuously ABOVE 25 and assign it a value.
- if the average is between 26-50 then 'Medium'
- if the average is between 51-60 then 'High'
If the average needs to start for the FIRST value after 25.
Here's the dataset:
ID | MAXVEL | seq_id |
A | 15 | 1 |
A | 27 | 2 |
A | 29 | 3 |
A | 52 | 4 |
B | 60 | 1 |
B | 26 | 2 |
B | 40 | 3 |
B | 2 | 4 |
B | 9 | 5 |
B | 45 | 6 |
C | 60 | 1 |
C | 12 | 2 |
C | 60 | 3 |
C | 50 | 4 |
C | 12 | 5 |
Output:
For example, starting point for A group will be at 27 as 15 is less than 26
ID | MAXVEL | seq_id | Average | Output |
A | 15 | 1 | ||
A | 27 | 2 | 28 | Medium |
A | 29 | 3 | ||
A | 52 | 4 | ||
B | 60 | 1 | 43 | Medium |
B | 26 | 2 | ||
B | 40 | 3 | ||
B | 2 | 4 | ||
B | 9 | 5 | ||
B | 45 | 6 | ||
C | 60 | 1 | ||
C | 12 | 2 | ||
C | 60 | 3 | 55 | High |
C | 50 | 4 | ||
C | 12 | 5 |
<
Thank you! 🙂
Any help would be greatly appreciated!
data have;
input ID$ MAXVEL seq_id;
cards;
A 15 1
A 27 2
A 29 3
A 52 4
B 60 1
B 26 2
B 40 3
B 2 4
B 9 5
B 45 6
C 60 1
C 12 2
C 60 3
C 50 4
C 12 5
;
data want;
length cnt $10;
set have;
by id;
retain cnt val;
if first.id then do;cnt='';val=.;end;
if maxvel >25 then check='1';
if check ne '' then cnt=cats(check,cnt);
if check='' then cnt='';
if cnt in ('1','11') then do;
val+maxvel;
end;
seq=lag(seq_id);
if check='' then val=.;
if cnt='11' then val=val/2;
if val in (26:50) then out='Medium';
else if val in (51:60) then out='High';
run;
proc sort data=want;
by id seq_id ;
run;
data want2(drop=_cnt check seq cnt );
merge want(in=a rename=(cnt=_cnt) drop=val out) want(in=b keep=id val out seq cnt rename=(seq=seq_id) where=(cnt='11'));
by id seq_id;
if a;
run;
data have;
input ID$ MAXVEL seq_id;
cards;
A 15 1
A 27 2
A 29 3
A 52 4
B 60 1
B 26 2
B 40 3
B 2 4
B 9 5
B 45 6
C 60 1
C 12 2
C 60 3
C 50 4
C 12 5
;
data want;
length cnt $10;
set have;
by id;
retain cnt val;
if first.id then do;cnt='';val=.;end;
if maxvel >25 then check='1';
if check ne '' then cnt=cats(check,cnt);
if check='' then cnt='';
if cnt in ('1','11') then do;
val+maxvel;
end;
seq=lag(seq_id);
if check='' then val=.;
if cnt='11' then val=val/2;
if val in (26:50) then out='Medium';
else if val in (51:60) then out='High';
run;
proc sort data=want;
by id seq_id ;
run;
data want2(drop=_cnt check seq cnt );
merge want(in=a rename=(cnt=_cnt) drop=val out) want(in=b keep=id val out seq cnt rename=(seq=seq_id) where=(cnt='11'));
by id seq_id;
if a;
run;
Hello. Thank you for the suggestion. I tested it and it works. Really appreciate it.
Another quick question - is there an easy fix if I have the following?
Link: https://communities.sas.com/t5/SAS-Data-Management/Starting-point-average/m-p/376087
It's almost similar to this one but there is one other constraint.
Really appreciate it!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.