Hello
I want to calculate percentiles (P50,P70,P80,P90) for each group.
The group is defined by the fields old and gk.
There is a problem in the result of the following code:
I get duplication of groups.
For example: For gk='(a) 2-3' I get 2 rows with old='(a) 0-10'
What is the way to solve it please?
Why did it happen?
data have;
input gk old New;
cards;
2 10 15
2 15 22
2 20 25
2 17 19
2 19 17
2 31 35
2 35 40
2 27 30
2 18 18
2 17 19
3 72 72
3 12 10
3 19 20
3 10 15
3 9 10
3 42 45
3 35 30
5 37 40
5 27 30
5 29 29
5 32 33
5 35 36
5 65 70
5 9 10
5 13 12
5 18 15
;
run;
proc format;
value gkfmt
2,3='(a) 2-3'
5='(b) 5'
;
Run;
proc format;
value old_fmt
0-10='(a) 0-10'
10-20='(b) 10-20'
20-high='(d) 20+'
;
Run;
proc sort data=have;
by gk old;
Run;
proc summary data=have;
by gk old ;
var new: ;
output out=want(drop=_type_) p50= p70= P80= P90=/autoname;
format old old_fmt. gk gkfmt.;
run;
In oldfmt, the values 10 and 20 appear in two ranges each, so the multilabel-capable PROC SUMMARY will honor that. This may cause incorrect calculations.
Regarding the duplicates:
By sorting and using BY instead of CLASS, the combination of 2 and 10 appears before 2 and 15, but much later you get 3 and 10, which falls into the same formatted range as and 10. But then the values for the group have already been calculated once, so you get a second entry.
Fix your format definition, and use CLASS and the NWAY option in the PROC SUMMARY statement.
In oldfmt, the values 10 and 20 appear in two ranges each, so the multilabel-capable PROC SUMMARY will honor that. This may cause incorrect calculations.
Regarding the duplicates:
By sorting and using BY instead of CLASS, the combination of 2 and 10 appears before 2 and 15, but much later you get 3 and 10, which falls into the same formatted range as and 10. But then the values for the group have already been calculated once, so you get a second entry.
Fix your format definition, and use CLASS and the NWAY option in the PROC SUMMARY statement.
Thanks,I write the solution by your advice
data have;
input gk old New;
cards;
2 10 15
2 15 22
2 20 25
2 17 19
2 19 17
2 31 35
2 35 40
2 27 30
2 18 18
2 17 19
3 72 72
3 12 10
3 19 20
3 10 15
3 9 10
3 42 45
3 35 30
5 37 40
5 27 30
5 29 29
5 32 33
5 35 36
5 65 70
5 9 10
5 13 12
5 18 15
;
run;
proc format;
value gkfmt
2,3='(a) 2-3'
5='(b) 5'
;
Run;
proc format;
value old_fmt
0-10='(a) 0-10'
10<-20='(b) 10-20'
20<-high='(d) 20+'
;
Run;
/**Way1**/
proc summary data=have nway;
class gk old ;
var new ;
output out=want(drop=_type_) p50= p70= P80= P90=/autoname;
format old old_fmt. gk gkfmt.;
run;
/**Way2**/
proc sort data=have;
by gk old ;
Run;
proc summary data=have nway;
class gk old ;
var new ;
output out=want(drop=_type_) p50= p70= P80= P90=/autoname;
format old old_fmt. gk gkfmt.;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.