BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

Hello

I want to calculate percentiles (P50,P70,P80,P90)  for each group.

The group is defined by the fields old and gk.

There is a problem in the result of the following code:

I get duplication of groups.

For example: For  gk='(a) 2-3' I get 2 rows with old='(a) 0-10'

What is the way to solve it please?

Why did it happen?

Ronein_0-1654497618995.png

 

 


data have;
input gk old New;
cards;
2 10 15
2 15 22
2 20 25
2 17 19
2 19 17
2 31 35
2 35 40
2 27 30
2 18 18
2 17 19
3 72 72
3 12 10
3 19 20
3 10 15
3 9 10
3 42 45
3 35 30
5 37 40
5 27 30
5 29 29
5 32 33
5 35 36
5 65 70
5 9 10
5 13 12
5 18 15
;
run;

proc format;
value gkfmt
2,3='(a) 2-3'
5='(b) 5'
;
Run;

proc format;
value old_fmt
0-10='(a) 0-10'
10-20='(b) 10-20'
20-high='(d) 20+'
;
Run;

proc sort data=have;
by gk old;
Run;

proc summary data=have;
by gk old ;
var new: ;
output out=want(drop=_type_) p50= p70= P80=  P90=/autoname;
format  old old_fmt.  gk gkfmt.;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

In oldfmt, the values 10 and 20 appear in two ranges each, so the multilabel-capable PROC SUMMARY will honor that. This may cause incorrect calculations.

 

Regarding the duplicates:

By sorting and using BY instead of CLASS, the combination of 2 and 10 appears before 2 and 15, but much later you get 3 and 10, which falls into the same formatted range as  and 10. But then the values for the group have already been calculated once, so you get a second entry.

 

Fix your format definition, and use CLASS and the NWAY option in the PROC SUMMARY statement.

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

In oldfmt, the values 10 and 20 appear in two ranges each, so the multilabel-capable PROC SUMMARY will honor that. This may cause incorrect calculations.

 

Regarding the duplicates:

By sorting and using BY instead of CLASS, the combination of 2 and 10 appears before 2 and 15, but much later you get 3 and 10, which falls into the same formatted range as  and 10. But then the values for the group have already been calculated once, so you get a second entry.

 

Fix your format definition, and use CLASS and the NWAY option in the PROC SUMMARY statement.

Ronein
Onyx | Level 15

Thanks,I write the solution by your advice


data have;
input gk old New;
cards;
2 10 15
2 15 22
2 20 25
2 17 19
2 19 17
2 31 35
2 35 40
2 27 30
2 18 18
2 17 19
3 72 72
3 12 10
3 19 20
3 10 15
3 9 10
3 42 45
3 35 30
5 37 40
5 27 30
5 29 29
5 32 33
5 35 36
5 65 70
5 9 10
5 13 12
5 18 15
;
run;
proc format;
value gkfmt
2,3='(a) 2-3'
5='(b) 5'
;
Run;
proc format;
value old_fmt
0-10='(a) 0-10'
10<-20='(b) 10-20'
20<-high='(d) 20+'
;
Run;

/**Way1**/
proc summary data=have nway;
class gk old ;
var new ;
output out=want(drop=_type_) p50= p70= P80=  P90=/autoname;
format  old old_fmt.  gk gkfmt.;
run;


/**Way2**/
proc sort data=have;
by gk old ;
Run;
proc summary data=have nway;
class gk old ;
var new ;
output out=want(drop=_type_) p50= p70= P80=  P90=/autoname;
format  old old_fmt.  gk gkfmt.;
run;
Ronein
Onyx | Level 15
As I see, if the value appear in 2 ranges (proc format) then the value will be allocated to first range.
Is it still "multilabel" case??

data ttt;
input x;
cards;
9
9.9
10
15
19
20
25
40
;
run;

proc means data=ttt;
class x;
format x old_fmt.;
Run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 893 views
  • 1 like
  • 2 in conversation