turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Help - Using Proc Rank With Two Datasets

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

09-18-2012 11:41 AM

Hello,

I am using proc rank to divide a dataset into 20 buckets. The code I'm using is similar to:

proc rank data=dataset_a out=dataset_a_ranked groups=20 ties=low;

by &segment_var;

var &score_var;

ranks score_bucket;

run;

I would like to take the ranks (or buckets) created on dataset_a and apply them to dataset_b. Is there any way to:

1. Use proc rank to output not only the rank (or bucket) of each observation, but also the minimum and maximum values of that rank (or bucket)? For instance,

rank 0 has a minimum value of 1 and a maximum value of 10

rank 1 has a minimum value of 11 and a maximum value of 19

...

rank n has a minimum value of x and a maximum value of y

2. Use the output from proc rank and apply the same ranks from dataset_a and to a second dataset_b? If I use proc rank on dataset_a and then on dataset_b, I will end up with different ranks (or buckets).

I'm thinking there must be a simple trick to doing this. Thanks for any help you could give.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to josh_wander

09-18-2012 12:23 PM

hi ... one approach, make a format using the"rules" from the first set of ranks ... here's an example ...

*** use ONLY MALES ... find ranks for age in sashelp.class**

**proc rank data=sashelp.class (where=(sex='M')) out=mranks groups=3;**

**var age;**

**ranks rage;**

**run;**

*** create a format using min and max values of age in each rank for MALES;**

**proc sql;**

**create view fmt as**

**select "ranks" as fmtname, min(age) as start, max(age) as end, catt(rage) as label**

**from mranks**

**group by rage;**

**quit;**

**proc format cntlin=fmt;**

**run;**

*** assign ranks to ONLY FEMALES based on rank rules found for MALES;**

**data franks;**

**set sashelp.class;**

**where sex eq 'F';**

**rage = input(put(age,ranks.),1.);**

**run;**

*** take a look;**

**proc print data=franks noobs;**

**var name age rage;**

**run;**

**Name Age rage**

**Joyce 11 0**

**Jane 12 0**

**Louise 12 0**

**Alice 13 1**

**Barbara 13 1**

**Carol 14 1**

**Judy 14 1**

**Janet 15 2**

**Mary 15 2**

in case there are values in the second data set that are not present in the first data set and cannot be assigned to a rank with the format, an "OTHER" condition can be added to the format that will assign a missing value for the rank in the second data set ...

**proc sql;**

**create view fmt as**

**select "ranks" as fmtname, min(age) as start, max(age) as end, catt(rage) as label**

**from mranks**

**group by rage**

**outer union corr**

**select distinct "ranks" as fmtname, 99 as start, 99 as end, ' ' as label, "o" as hlo from mranks**

**quit;**

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MikeZdeb

09-18-2012 01:43 PM

Mike,

Thank you very much. Your answer is very helpful and with some quick testing, it appears to be what I need. To accommodate 20 buckets, instead of 10 buckets, I modified this line:

From:

**rage = input(put(age,ranks.),1.);**

To:

**rage = input(put(age,ranks.),10.);**

Since I am relatively weak in writing sql code, I have an additional question. I need to use a "by" variable in my proc rank. My dataset_a contains 4 different segments and I use a macro variable called &segment_var to delineate them. I then use proc rank and the "by" variable to create 20 ranks (or buckets) for each segment. Is there any way for me to modify your sql code to account for the different ranks within each segment?

Thanks, again.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to josh_wander

09-18-2012 12:59 PM

1. Use proc rank to output not only the rank (or bucket) of each observation, but also the minimum and maximum values of that rank (or bucket)? For instance,

rank 0 has a minimum value of 1 and a maximum value of 10

rank 1 has a minimum value of 11 and a maximum value of 19

...

rank n has a minimum value of x and a maximum value of y

What happens when the item in dataset 2 has a value of 10.5? More generally, the ranks/buckets created from data set one won't come out so nice and clean as your example, it might be more along the lines of

rank 0 has a minimum value of 1 and a maximum value of 7

rank 1 has a minimum value of 14 and a maximum value of 23

Now what do you do when you get a value of 10?

I think your entire concept is flawed, as empirical buckets don't have hard and fast upper and lower limits, they have empirical limits, as I have described, and will result in problems if you try to apply this to a 2nd data set. The idea of creating buckets from continuous variables is also a flawed idea, in my opinion, and you might want to re-think what you are doing with this data. If you explain the long-term analysis goal, as well as what the data is, we might be able to propose more meaningful analysis techniques.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to josh_wander

09-18-2012 02:07 PM

hi ... here's another approach to accommodate PROC RANK that uses a by-groups

you can make one format per by-group in one data set, then use PUTN in a data step to apply the correct format based on the by-variable value in the second data set

*** some fake data ... 2 genders, age ranges from 1 to 99, 10 segments;**

**data fake;**

**do _n_ = 1 to 10000;**

** gender = rantbl(999,0.5);**

** age = ceil(99*ranuni(999));**

** segment = ceil(10*ranuni(999));**

** output;**

**end;**

**run;**

*** compute ranks for males (gender = 1) in groups by segment (the by variable);**

**proc sort data=fake (where=(gender=1) out=males (drop=gender);**

**by segment;**

**run;**

**proc rank data=males out=mranks groups=20;**

**var age;**

**ranks rage;**

**by segment;**

**run;**

*** create 10 formats ... one for each value of segment (ranks1_, ranks2_, ... , ranks10_);**

**proc sql;**

**create view fmt as**

**select catt("ranks",segment,"_") as fmtname, min(age) as start, max(age) as end, catt(rage) as label**

**from mranks**

**group by segment, rage;**

**quit;**

**proc format cntlin=fmt fmtlib;**

**run;**

*** apply the 10 formats to the data for females;**

**data franks (drop=gender);**

**set fake;**

**where gender eq 2;**

*** use PUTN to vary the format that is being used based on the value of the variable segment;**

**rage = input(putn(age,catt('ranks',segment,'_')),2.);**

**run;**

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MikeZdeb

10-01-2012 01:40 PM

This works.

Thank you very much for you help, Mike.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to josh_wander

10-01-2012 02:46 PM

hi ... you're welcome ... so was that a correct answer ???

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MikeZdeb

10-01-2012 02:55 PM

Yes, it was. I tested it earlier this morning.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to josh_wander

10-01-2012 03:17 PM

Please mark you quetion as answered. Thank you!