Hello all,
How can I add a column into dataset based on 1 column value. How can I divide each rank into 2 splits or 3 splits. If I used Proc rank. it just splits the dataset into 2/3 equal portion. Is there a way to split the each unique ranks into 2/3 split. In the example below I have splited into 2 splits.
Have:
| Test1 | Test2 | Rank |
| Apple | dd | 1 |
| Ball | dd | 1 |
| Cat | dd | 1 |
| Dog | dd | 1 |
| Elephant | dd | 2 |
| dsd | dd | 2 |
| sdf | dd | 2 |
| asdf | e | 3 |
| asdf | e | 3 |
| asdf | e | 3 |
| adsf | e | 3 |
want:
| Test1 | Test2 | Rank | Desired |
| Apple | dd | 1 | 1 |
| Ball | dd | 1 | 1 |
| Cat | dd | 1 | 2 |
| Dog | dd | 1 | 2 |
| Elephant | dd | 2 | 1 |
| dsd | dd | 2 | 2 |
| sdf | dd | 2 | 2 |
| asdf | e | 3 | 1 |
| asdf | e | 3 | 1 |
| asdf | e | 3 | 2 |
| adsf | e | 3 | 2 |
proc rank data = have out = want group =2; var = rank; ranks =desired; quit;
This code gave me :
| Test1 | Test2 | Rank | Desired |
| Apple | dd | 1 | 0 |
| Ball | dd | 1 | 0 |
| Cat | dd | 1 | 0 |
| Dog | dd | 1 | 0 |
| Elephant | dd | 2 | 1 |
| dsd | dd | 2 | 1 |
| sdf | dd | 2 | 1 |
| asdf | e | 3 | 1 |
| asdf | e | 3 | 2 |
| asdf | e | 3 | 2 |
| adsf | e | 3 | 2 |
Thank you in advance.
@Shonesum wrote:
Hello all,
How can I add a column into dataset based on 1 column value. How can I divide each rank into 2 splits or 3 splits. If I used Proc rank. it just splits the dataset into 2/3 equal portion. Is there a way to split the each unique ranks into 2/3 split. In the example below I have splited into 2 splits.
Have:
Test1 Test2 Rank Apple dd 1 Ball dd 1 Cat dd 1 Dog dd 1 Elephant dd 2 dsd dd 2 sdf dd 2 asdf e 3 asdf e 3 asdf e 3 adsf e 3
want:
Test1 Test2 Rank Desired Apple dd 1 1 Ball dd 1 1 Cat dd 1 2 Dog dd 1 2 Elephant dd 2 1 dsd dd 2 2 sdf dd 2 2 asdf e 3 1 asdf e 3 1 asdf e 3 2 adsf e 3 2
What is the rule that decides that sometimes you get 1/2 of the rows as 1 and 1/2 as 2 for some ranks, and 1/3 versus 2/3 for other ranks? What happens if there are 5 observations in a Rank? Is this done at random or non-random and sequentially?
Hi Paige it is done sequentially. I solved it by using "by" statement in proc rank 🙂
@Shonesum wrote:
Hi Paige it is done sequentially. I solved it by using "by" statement in proc rank 🙂
That's great!
Could you please share your code in case someone else reads this thread and wants to know the answer?
Since there was no numeric column, I added col variable then by rank.
data work.have;
set work.test;
col=_n_;
run;
proc rank data = work.have out = work.want group =2;
by rank;
var col;
ranks desired;
quit;
Nice! Well done! ![]()
You want the "desired" to start from 1. You also desire 2 or 3 groups within each RANK given in your HAVE data set. Here is a Data Step for SUB SPLITTING. If the RANK numbers is less than 6 I made a split of 2 groups otherwise 3 groups. With your example data set, two groups are made. I have added another 6 rows to show for a split of 3 groups.
data have; input Test1 :$8. Test2 :$2. Rank; datalines; Apple dd 1 Ball dd 1 Cat dd 1 Dog dd 1 Elephant dd 2 dsd dd 2 sdf dd 2 asdf e 3 asdf e 3 asdf e 3 adsf e 3 ; run;
The following program splits into 2 groups:
data want;
do _n_ = 1 by 1 until(last.rank);
set have;
by rank;
end;
ngroups = ceil(ifN(_n_ >= 6, _n_/3, _n_/2));
do _n_ = 1 by 1 until(last.rank);
set have;
by rank;
if _n_ <= ngroups then desired = 1;
else desired + mod(_n_, ngroups);
output;
end;
drop ngroups;
run;
The statement:
ngroups = ceil(ifN(_n_ >= 6, _n_/3, _n_/2));
computes the desired number of rows to be present in each split. The following statements
if _n_ <= ngroups then desired = 1;
else desired + mod(_n_, ngroups);
creates the values for "desired" based on the number of observations per RANK.
Is this what you want?
Here is the additional observations. The same program works.
data have; input Test1 :$8. Test2 :$2. Rank; datalines; Apple dd 1 Ball dd 1 Cat dd 1 Dog dd 1 Elephant dd 2 dsd dd 2 sdf dd 2 asdf e 3 asdf e 3 asdf e 3 adsf e 3 aaa g 4 aaa g 4 aaa g 4 aaa g 4 aaa g 4 aaa g 4 ; run;
The output of this:
Obs Test1 Test2 Rank desired 1 Apple dd 1 1 2 Ball dd 1 1 3 Cat dd 1 2 4 Dog dd 1 2 5 Elephant dd 2 1 6 dsd dd 2 1 7 sdf dd 2 2 8 asdf e 3 1 9 asdf e 3 1 10 asdf e 3 2 11 adsf e 3 2 12 aaa g 4 1 13 aaa g 4 1 14 aaa g 4 2 15 aaa g 4 2 16 aaa g 4 3 17 aaa g 4 3
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.