Group the data sequentially based on ranks

Contributor
Posts: 40

Hi,

I have divided the data into 10 groups based on the ranks of a numeric variable ("var" here), using proc rank.

Now I want to split the data into two groups, sequentially, based on the ranks:

The first grouping: the first rank group (0 here) as one group, all others as the other group;

The second grouping: the first two rank groups (0 and 1 here) as one group, all others as the other group;

and so on, until the first 9 ranks are one group and the last rank is the other group.

I would also like the suffix of each new variable name to be the rank value. Or I may not need the new group variables at all, as long as I can identify the new groups.

The data I have and what I want are simplified below:

==========================

data have;
input class $ var rank;
datalines;
a 1  0
b 3  2
a 5  2
a 8  3
b 2  1
a 1  0
b 1  0
a 9  3
b 10 4
a 21 6
b 45 8
a 19 5
a 28 6
b 31 7
a 56 8
b 61 9
a 57 9
a 43 7
b 18 5
b 17 4
;
run;

data want;
input class $ var rank group0 $ group1 $;
datalines;
a 1  0 <=0 <=1
b 3  2 >0 >1
a 5  2 >0 >1
a 8  3 >0 >1
b 2  1 >0 <=1
a 1  0 <=0 <=1
b 1  0 <=0 <=1
a 9  3 >0 >1
b 10 4 >0 >1
a 21 6 >0 >1
b 45 8 >0 >1
a 19 5 >0 >1
a 28 6 >0 >1
b 31 7 >0 >1
a 56 8 >0 >1
b 61 9 >0 >1
a 57 9 >0 >1
a 43 7 >0 >1
b 18 5 >0 >1
b 17 4 >0 >1
;
run;

=============================

Thanks for the help.

Z

Super User
Posts: 11,343

Re: Group the data sequentially based on ranks

You don't say what you'll do with this data, but you may not need to add any variables at all. A set of custom formats applied to your rank variable may be all that you need, unless you are explicitly going to do something with group0 and group1, in which case the formats are an easy way to create the variables.

proc format;
    /* User-defined format names cannot end in a digit, hence the trailing "f". */
    value group0f
        0     = '<=0'
        1 - 9 = '>0'
    ;
    value group1f
        0 - 1 = '<=1'
        2 - 9 = '>1'
    ;
    value group2f
        0 - 2 = '<=2'
        3 - 9 = '>2'
    ;
    /* continue in the hopefully obvious manner */
run;

data want;
    set have;
    group0 = put(rank, group0f.); /* to create variables using the formats */
    group1 = put(rank, group1f.);
    /* etc. */
run;

Or

proc freq data=have;
    tables rank;
    format rank group0f.;
run;

Most analysis procedures will honor the format and group the data by the formatted values.
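
If you do want all nine variables without writing out nine formats, an array in a data step is one alternative. This is only a minimal sketch, assuming the ranks run from 0 to 9 as in your example; the array name grp and the loop index k are illustrative names, and the small have data set just repeats a few of your rows so the step runs on its own:

```sas
/* A few rows taken from the question, so the sketch is self-contained. */
data have;
    input class $ var rank;
    datalines;
a 1  0
b 2  1
b 3  2
b 61 9
;
run;

data want;
    set have;
    /* group0-group8: one two-level split per rank cutoff */
    array grp{0:8} $ 4 group0-group8;
    /* Skip missing ranks, which would otherwise compare as <= every cutoff. */
    if not missing(rank) then do k = 0 to 8;
        if rank <= k then grp{k} = cats('<=', k);
        else grp{k} = cats('>', k);
    end;
    drop k;
run;
```

The trade-off is that this stores nine extra character variables on every row, whereas the format approach leaves the data unchanged and applies the grouping only when a procedure runs.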
