BookmarkSubscribeRSS Feed
Shonesum
Calcite | Level 5

Hello all, 

How can I add a column into dataset based on 1 column value.  How can I divide each rank into 2 splits or 3 splits. If I used Proc rank. it just splits the dataset into 2/3 equal portion. Is there a way to split the each unique ranks into 2/3 split. In the example below I have splited into 2 splits. 

 

Have:

Test1Test2Rank
Appledd1
Balldd1
Catdd1
Dogdd1
Elephantdd2
dsddd2
sdfdd2
asdfe3
asdfe3
asdfe3
adsfe3

 

want:

Test1Test2RankDesired
Appledd11
Balldd11
Catdd12
Dogdd12
Elephantdd21
dsddd22
sdfdd22
asdfe31
asdfe31
asdfe32
adsfe32

 

 

proc rank data = have out = want group =2;

var = rank;

ranks =desired;

quit;

 

This code gave me :

 

Test1Test2RankDesired
Appledd10
Balldd10
Catdd10
Dogdd10
Elephantdd21
dsddd21
sdfdd21
asdfe31
asdfe32
asdfe32
adsfe32

Thank you in advance.

 

 

7 REPLIES 7
PaigeMiller
Diamond | Level 26

@Shonesum wrote:

Hello all, 

How can I add a column into dataset based on 1 column value.  How can I divide each rank into 2 splits or 3 splits. If I used Proc rank. it just splits the dataset into 2/3 equal portion. Is there a way to split the each unique ranks into 2/3 split. In the example below I have splited into 2 splits. 

 

Have:

Test1 Test2 Rank
Apple dd 1
Ball dd 1
Cat dd 1
Dog dd 1
Elephant dd 2
dsd dd 2
sdf dd 2
asdf e 3
asdf e 3
asdf e 3
adsf e 3

 

want:

Test1 Test2 Rank Desired
Apple dd 1 1
Ball dd 1 1
Cat dd 1 2
Dog dd 1 2
Elephant dd 2 1
dsd dd 2 2
sdf dd 2 2
asdf e 3 1
asdf e 3 1
asdf e 3 2
adsf e 3 2

 


What is the rule that decides that sometimes you get 1/2 of the rows as 1 and 1/2 as 2 for some ranks, and 1/3 versus 2/3 for other ranks? What happens if there are 5 observations in a Rank? Is this done at random or non-random and sequentially?

--
Paige Miller
Shonesum
Calcite | Level 5

Hi Paige it is done sequentially. I solved it by using "by" statement in proc rank 🙂 

PaigeMiller
Diamond | Level 26

@Shonesum wrote:

Hi Paige it is done sequentially. I solved it by using "by" statement in proc rank 🙂 


That's great!

 

Could you please share your code in case someone else reads this thread and wants to know the answer?

--
Paige Miller
Shonesum
Calcite | Level 5

Since there was no numeric column, I added col variable  then by rank.

 

data work.have;
set work.test;
col=_n_;
run;

proc rank data = work.have out = work.want group =2;
by rank;
var  col;

ranks desired;

quit;
PaigeMiller
Diamond | Level 26

Nice! Well done! Smiley Happy

--
Paige Miller
Shonesum
Calcite | Level 5
Thank you 🙂
KachiM
Rhodochrosite | Level 12

You want the "desired" to start from 1. You also desire 2 or 3 groups within each RANK given in your HAVE data set. Here is a Data Step for SUB SPLITTING. If the RANK numbers is less than 6 I made a split of 2 groups otherwise 3 groups. With your example data set, two groups are made. I have added another 6 rows to show for a split of 3 groups.

 

data have;
input Test1 :$8. Test2 :$2. Rank;
datalines;
Apple    dd 1
Ball     dd 1
Cat      dd 1
Dog      dd 1
Elephant dd 2
dsd      dd 2
sdf      dd 2
asdf     e  3
asdf     e  3
asdf     e  3
adsf     e  3
;
run;

The following program splits into 2 groups:

 


data want;
   do _n_ = 1 by 1 until(last.rank);
      set have;
      by rank;
   end;
   ngroups = ceil(ifN(_n_ >= 6, _n_/3, _n_/2));
   do _n_ = 1 by 1 until(last.rank);
      set have;
      by rank;
      if _n_ <= ngroups then desired = 1;
      else desired + mod(_n_, ngroups);
      output;
   end;
drop ngroups;
run;

The statement:

   ngroups = ceil(ifN(_n_ >= 6, _n_/3, _n_/2));

computes the desired number of rows to be present in each split. The following statements

 

      if _n_ <= ngroups then desired = 1;
      else desired + mod(_n_, ngroups);

creates the values for "desired" based on the number of observations per RANK.

 

Is this what you want?

 

Here is the additional observations. The same program works.

 

data have;
input Test1 :$8. Test2 :$2. Rank;
datalines;
Apple    dd 1
Ball     dd 1
Cat      dd 1
Dog      dd 1
Elephant dd 2
dsd      dd 2
sdf      dd 2
asdf     e  3
asdf     e  3
asdf     e  3
adsf     e  3
aaa      g  4
aaa      g  4
aaa      g  4
aaa      g  4
aaa      g  4
aaa      g  4
;
run;

The output of this:

 

Obs 	Test1 	Test2 	Rank 	desired
1 	Apple 	dd 	1 	1
2 	Ball 	dd 	1 	1
3 	Cat 	dd 	1 	2
4 	Dog 	dd 	1 	2
5 	Elephant 	dd 	2 	1
6 	dsd 	dd 	2 	1
7 	sdf 	dd 	2 	2
8 	asdf 	e 	3 	1
9 	asdf 	e 	3 	1
10 	asdf 	e 	3 	2
11 	adsf 	e 	3 	2
12 	aaa 	g 	4 	1
13 	aaa 	g 	4 	1
14 	aaa 	g 	4 	2
15 	aaa 	g 	4 	2
16 	aaa 	g 	4 	3
17 	aaa 	g 	4 	3