Hello all,
How can I add a column into dataset based on 1 column value. How can I divide each rank into 2 splits or 3 splits. If I used Proc rank. it just splits the dataset into 2/3 equal portion. Is there a way to split the each unique ranks into 2/3 split. In the example below I have splited into 2 splits.
Have:
Test1 | Test2 | Rank |
Apple | dd | 1 |
Ball | dd | 1 |
Cat | dd | 1 |
Dog | dd | 1 |
Elephant | dd | 2 |
dsd | dd | 2 |
sdf | dd | 2 |
asdf | e | 3 |
asdf | e | 3 |
asdf | e | 3 |
adsf | e | 3 |
want:
Test1 | Test2 | Rank | Desired |
Apple | dd | 1 | 1 |
Ball | dd | 1 | 1 |
Cat | dd | 1 | 2 |
Dog | dd | 1 | 2 |
Elephant | dd | 2 | 1 |
dsd | dd | 2 | 2 |
sdf | dd | 2 | 2 |
asdf | e | 3 | 1 |
asdf | e | 3 | 1 |
asdf | e | 3 | 2 |
adsf | e | 3 | 2 |
proc rank data = have out = want group =2; var = rank; ranks =desired; quit;
This code gave me :
Test1 | Test2 | Rank | Desired |
Apple | dd | 1 | 0 |
Ball | dd | 1 | 0 |
Cat | dd | 1 | 0 |
Dog | dd | 1 | 0 |
Elephant | dd | 2 | 1 |
dsd | dd | 2 | 1 |
sdf | dd | 2 | 1 |
asdf | e | 3 | 1 |
asdf | e | 3 | 2 |
asdf | e | 3 | 2 |
adsf | e | 3 | 2 |
Thank you in advance.
@Shonesum wrote:
Hello all,
How can I add a column into dataset based on 1 column value. How can I divide each rank into 2 splits or 3 splits. If I used Proc rank. it just splits the dataset into 2/3 equal portion. Is there a way to split the each unique ranks into 2/3 split. In the example below I have splited into 2 splits.
Have:
Test1 Test2 Rank Apple dd 1 Ball dd 1 Cat dd 1 Dog dd 1 Elephant dd 2 dsd dd 2 sdf dd 2 asdf e 3 asdf e 3 asdf e 3 adsf e 3
want:
Test1 Test2 Rank Desired Apple dd 1 1 Ball dd 1 1 Cat dd 1 2 Dog dd 1 2 Elephant dd 2 1 dsd dd 2 2 sdf dd 2 2 asdf e 3 1 asdf e 3 1 asdf e 3 2 adsf e 3 2
What is the rule that decides that sometimes you get 1/2 of the rows as 1 and 1/2 as 2 for some ranks, and 1/3 versus 2/3 for other ranks? What happens if there are 5 observations in a Rank? Is this done at random or non-random and sequentially?
Hi Paige it is done sequentially. I solved it by using "by" statement in proc rank 🙂
@Shonesum wrote:
Hi Paige it is done sequentially. I solved it by using "by" statement in proc rank 🙂
That's great!
Could you please share your code in case someone else reads this thread and wants to know the answer?
Since there was no numeric column, I added col variable then by rank.
data work.have;
set work.test;
col=_n_;
run;
proc rank data = work.have out = work.want group =2;
by rank;
var col;
ranks desired;
quit;
Nice! Well done!
You want the "desired" to start from 1. You also desire 2 or 3 groups within each RANK given in your HAVE data set. Here is a Data Step for SUB SPLITTING. If the RANK numbers is less than 6 I made a split of 2 groups otherwise 3 groups. With your example data set, two groups are made. I have added another 6 rows to show for a split of 3 groups.
data have; input Test1 :$8. Test2 :$2. Rank; datalines; Apple dd 1 Ball dd 1 Cat dd 1 Dog dd 1 Elephant dd 2 dsd dd 2 sdf dd 2 asdf e 3 asdf e 3 asdf e 3 adsf e 3 ; run;
The following program splits into 2 groups:
data want;
do _n_ = 1 by 1 until(last.rank);
set have;
by rank;
end;
ngroups = ceil(ifN(_n_ >= 6, _n_/3, _n_/2));
do _n_ = 1 by 1 until(last.rank);
set have;
by rank;
if _n_ <= ngroups then desired = 1;
else desired + mod(_n_, ngroups);
output;
end;
drop ngroups;
run;
The statement:
ngroups = ceil(ifN(_n_ >= 6, _n_/3, _n_/2));
computes the desired number of rows to be present in each split. The following statements
if _n_ <= ngroups then desired = 1; else desired + mod(_n_, ngroups);
creates the values for "desired" based on the number of observations per RANK.
Is this what you want?
Here is the additional observations. The same program works.
data have; input Test1 :$8. Test2 :$2. Rank; datalines; Apple dd 1 Ball dd 1 Cat dd 1 Dog dd 1 Elephant dd 2 dsd dd 2 sdf dd 2 asdf e 3 asdf e 3 asdf e 3 adsf e 3 aaa g 4 aaa g 4 aaa g 4 aaa g 4 aaa g 4 aaa g 4 ; run;
The output of this:
Obs Test1 Test2 Rank desired 1 Apple dd 1 1 2 Ball dd 1 1 3 Cat dd 1 2 4 Dog dd 1 2 5 Elephant dd 2 1 6 dsd dd 2 1 7 sdf dd 2 2 8 asdf e 3 1 9 asdf e 3 1 10 asdf e 3 2 11 adsf e 3 2 12 aaa g 4 1 13 aaa g 4 1 14 aaa g 4 2 15 aaa g 4 2 16 aaa g 4 3 17 aaa g 4 3
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.