BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shasank
Quartz | Level 8

Hi SAS Users,

 

I would like your help with the following. 

 

Raw data:

data have;
input ID ;
cards;
11
11
11
11
11
12
12
12
12
12
;
run;

 

I would like to assign ranks 1 - 5 to the above rows. Each ID should have all the ranks but randomly assigned. 

 

 

Any help is appreciated. 

Thank you for your time. 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21
data temp;
set have;
rnd = rand("uniform");
run;

proc rank data=temp out=want;
by id;
var rnd;
run;
PG

View solution in original post

12 REPLIES 12
Reeza
Super User
Please show an example of what you expect?
Should 1 to 5 be assigned randomly to each row? or by ID so that each ID has the same random value?
shasank
Quartz | Level 8
Each ID has 5 rows so 1 - 5 ranks must be randomly assigned and not repeated. in the same ID
shasank
Quartz | Level 8
Example:
11 2
11 1
11 4
11 3
11 5
12 5
12 1
12 4
12 3
12 2
Reeza
Super User
data temp;
set have;
by ID;
if first.ID then order=1; else order+1;

x=rand('normal', 50, 4);
run;

proc sort data=temp;
by ID x;
run;

data want;
set have;
by id;
if first.id then rank=1; else rank+1;
run;

proc sort data=want;
by id order;
run;

Or essentially use PROC SURVEYSELECT with a 100% samprate.

 

 

shasank
Quartz | Level 8

Thank you for your time. This worked too. 

PGStats
Opal | Level 21
data temp;
set have;
rnd = rand("uniform");
run;

proc rank data=temp out=want;
by id;
var rnd;
run;
PG
novinosrin
Tourmaline | Level 20

Nice one Sir @PGStats  . I am just wondering what are chances of duplication caused by rand('uniform')? I did the read the docs, but something is not registering in my head.

@Reeza, Please chime in too. I need this education plz

Reeza
Super User

An exact duplicate that would cause RANK to consider them the same value?

Very, very small IMO to the point of negligible. 

 

This wouldn't be an issue with my solution, which is more code than @PGStats , his is definitley more elegant. I'm pretty sure that there should be a way with PROC SURVEYSELECT.

 

 

 

 

PGStats
Opal | Level 21

The likelyhood of getting exactly the same random value within 5 consecutive calls is very very small and depends on the chosen RNG. Read

 

https://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=p026ygl6toz3tgn14lt4iu6cl5bb.htm... 

PG
novinosrin
Tourmaline | Level 20

Marvelous documentation link. Jeez! how I wish I had read that earlier. Hmm, There's a time for everything. @PGStats  Needless to mention my usual i.e "Prodigygeniusstats" as you have cycled faster to grasp the cycle lengths. Kudos! and Thank you!!!

FreelanceReinh
Jade | Level 19

@novinosrin wrote:

I am just wondering what are chances of duplication caused by rand('uniform')? I did the read the docs, but something is not registering in my head.


Hi @novinosrin: Good question. According to section Duplicate Values in the RAND function documentation (which I think is even more relevant to this than the information about cycle lengths) the default 32-bit Mersenne Twister algorithm used here would result in at least one duplicate rank for the OP's sample data with an approximate probability of 1-(1-25/2**33)**2≈5.82E-9 (assuming independence between BY groups), which is indeed "very, very small."

 

However, if the real HAVE dataset was much larger, say, 10,000 IDs with 50 observations each, the corresponding probability would be 1-(1-2500/2**33)**10000≈0.002906 and merely ceil(-0.0001/log2(1-2500/2**33))=239 independent experiments of this size would suffice to observe a duplicate rank with a probability >=50%.

 

Let's try it:

data temp;
call streaminit(27182818);
do r=1 to 239;
  do id=1 to 10000;
    do _n_=1 to 50;
      rnd = rand("uniform");
      output;
    end;
  end;
end;
run;

proc rank data=temp out=want;
by r id;
var rnd;
run;

proc freq data=want;
tables rnd;
run;

Partial PROC FREQ output shows that it really happened (and, no, I did not tinker with the random seed):

                                Cumulative    Cumulative
rnd    Frequency     Percent     Frequency      Percent
--------------------------------------------------------
  1     2390000        2.00       2390000         2.00
  2     2390000        2.00       4780000         4.00
  3     2389999        2.00       7169999         6.00
3.5           2        0.00       7170001         6.00
  4     2389999        2.00       9560000         8.00
  5     2390000        2.00      11950000        10.00
  6     2390000        2.00      14340000        12.00

Find the culprit:

proc print data=want;
where rnd=3.5;
run;

proc print data=temp;
where r=147 & id=8807;
format rnd best16.;
run;

Voilà:

Obs     r      id                  rnd

 73440301    147    8807    0.91091977874748
 73440302    147    8807    0.04118569614366
 73440303    147    8807    0.14874891191721
 73440304    147    8807    0.93303100718185
 73440305    147    8807    0.04243680997752
 73440306    147    8807    0.23993166047148
 73440307    147    8807    0.46066979528405
 73440308    147    8807    0.32410094887018
 73440309    147    8807    0.09882239229046
 73440310    147    8807    0.44069946813397
 73440311    147    8807    0.56899999175221
 73440312    147    8807    0.04243680997752
 73440313    147    8807    0.79902740451507
 ...

The HEX16. representations of the two highlighted numbers are 3FA5BA470DFFFFFF and in fact the representations of all 119.5 million RND values end in "FFFFF" showing their 32-bit nature. So, one of the available 64-bit RNGs would be a better choice if the risk of duplicates is an issue, e.g.,

 call streaminit('PCG', 27182818);

 

novinosrin
Tourmaline | Level 20

Good morning/afternoon Sir @FreelanceReinh  Thank you and I really feel obliged for that beautiful description. I should pay for it : ) Understood well and clear. Cheers! 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2461 views
  • 8 likes
  • 5 in conversation