BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Onyx | Level 15

Hello

I want to calculate percentiles (P50,P70,P80,P90)  for each group.

The group is defined by the fields old and gk.

There is a problem in the result of the following code:

I get duplication of groups.

For example: For  gk='(a) 2-3' I get 2 rows with old='(a) 0-10'

What is the way to solve it please?

Why did it happen?

Ronein_0-1654497618995.png

 

 


data have;
input gk old New;
cards;
2 10 15
2 15 22
2 20 25
2 17 19
2 19 17
2 31 35
2 35 40
2 27 30
2 18 18
2 17 19
3 72 72
3 12 10
3 19 20
3 10 15
3 9 10
3 42 45
3 35 30
5 37 40
5 27 30
5 29 29
5 32 33
5 35 36
5 65 70
5 9 10
5 13 12
5 18 15
;
run;

proc format;
value gkfmt
2,3='(a) 2-3'
5='(b) 5'
;
Run;

proc format;
value old_fmt
0-10='(a) 0-10'
10-20='(b) 10-20'
20-high='(d) 20+'
;
Run;

proc sort data=have;
by gk old;
Run;

proc summary data=have;
by gk old ;
var new: ;
output out=want(drop=_type_) p50= p70= P80=  P90=/autoname;
format  old old_fmt.  gk gkfmt.;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

In oldfmt, the values 10 and 20 appear in two ranges each, so the multilabel-capable PROC SUMMARY will honor that. This may cause incorrect calculations.

 

Regarding the duplicates:

By sorting and using BY instead of CLASS, the combination of 2 and 10 appears before 2 and 15, but much later you get 3 and 10, which falls into the same formatted range as  and 10. But then the values for the group have already been calculated once, so you get a second entry.

 

Fix your format definition, and use CLASS and the NWAY option in the PROC SUMMARY statement.

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

In oldfmt, the values 10 and 20 appear in two ranges each, so the multilabel-capable PROC SUMMARY will honor that. This may cause incorrect calculations.

 

Regarding the duplicates:

By sorting and using BY instead of CLASS, the combination of 2 and 10 appears before 2 and 15, but much later you get 3 and 10, which falls into the same formatted range as  and 10. But then the values for the group have already been calculated once, so you get a second entry.

 

Fix your format definition, and use CLASS and the NWAY option in the PROC SUMMARY statement.

Ronein
Onyx | Level 15

Thanks,I write the solution by your advice


data have;
input gk old New;
cards;
2 10 15
2 15 22
2 20 25
2 17 19
2 19 17
2 31 35
2 35 40
2 27 30
2 18 18
2 17 19
3 72 72
3 12 10
3 19 20
3 10 15
3 9 10
3 42 45
3 35 30
5 37 40
5 27 30
5 29 29
5 32 33
5 35 36
5 65 70
5 9 10
5 13 12
5 18 15
;
run;
proc format;
value gkfmt
2,3='(a) 2-3'
5='(b) 5'
;
Run;
proc format;
value old_fmt
0-10='(a) 0-10'
10<-20='(b) 10-20'
20<-high='(d) 20+'
;
Run;

/**Way1**/
proc summary data=have nway;
class gk old ;
var new ;
output out=want(drop=_type_) p50= p70= P80=  P90=/autoname;
format  old old_fmt.  gk gkfmt.;
run;


/**Way2**/
proc sort data=have;
by gk old ;
Run;
proc summary data=have nway;
class gk old ;
var new ;
output out=want(drop=_type_) p50= p70= P80=  P90=/autoname;
format  old old_fmt.  gk gkfmt.;
run;
Ronein
Onyx | Level 15
As I see, if the value appear in 2 ranges (proc format) then the value will be allocated to first range.
Is it still "multilabel" case??

data ttt;
input x;
cards;
9
9.9
10
15
19
20
25
40
;
run;

proc means data=ttt;
class x;
format x old_fmt.;
Run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 974 views
  • 1 like
  • 2 in conversation