Rank, 95th rank and 95th percentile datastep needed

Reply
Contributor
Posts: 59

Rank, 95th rank and 95th percentile datastep needed

Hey all, Please help me with a Data step to rank, compute 95th rank.

I have a table

Customer id                            Sales

e235                                       6767            

e235                                       4343

e235                                       5767

y234                                      3565

a123                                      7979

e345                                       8978

I want to rank the sales column for each customerid and also calculate the 95th rank.

95thrank= count(sales_ranks)*0.95;

My SAS code:

Proc sort data=have;

                By customerid;

Run;

Proc means data=have;

                By customerid;

                Var sales;

                Output out=counts  N=nbr_records;

Run;

Proc rank data=have;

                By customerid;

                Var sales;

                Ranks sales_ranks;

Run;

Data sales_95;

                Merge have (in=a) counts(in=b);

                By customerid;

                Where sales_ranks= int(.95*nbr_records);

Run;

I am sure there are other better efficient ways to do it. Can you guys share anything better than mine?

Many thanks,

Andy

Super User
Posts: 5,498

Re: Rank, 95th rank and 95th percentile datastep needed

Andygray,

Here's one way ...

proc sort data=have;

   by customerid sales;

run;

data want;

   nbr_records=0;

   do until (last.customerid);
        set have (keep=customerid);

        by customerid;

        nbr_records + 1;

   end;

   nbr_found=0;

   record_wanted = int(.95 * nbr_records);

   do until (last.customerid);

       set have;

       by customerid;

       nbr_found + 1;

       if nbr_found = record_wanted then output;

   end;

run;

I replicated your formula here, but note that you may want to adjust it (possibly using CEIL instead of INT).  Customers with only one purchase will not have any records selected.

Good luck.

Respected Advisor
Posts: 4,919

Re: Rank, 95th rank and 95th percentile datastep needed

Posted in reply to Astounding

Or, simplified and adjusted for the n=1 case :

proc sort data=have;

   by customerid sales;

run;

data want;

   do nbr_records=1 by 1 until (last.customerid);

        set have; by customerid;

     end;

   do nbr_found = 1 by 1 until (last.customerid);

       set have; by customerid;

       if nbr_found = round(0.95*nbr_records) then output;

       end;

run;

PG

PG
Contributor
Posts: 59

Re: Rank, 95th rank and 95th percentile datastep needed

HI Pgstats,

Thanks for your input. I really appreciate it. Can you give me a solution to compute the average last month sales per customer for all customers and the average first month sales per customer for all customers from the original dataset above.

My difficulty is identifying the correct date/month function to filter out which is the first month for customer and which is the last month for a customer. Or is there any better way to do it?

Hope you can help. Thanks.

Andy

Contributor
Posts: 59

Re: Rank, 95th rank and 95th percentile datastep needed

Posted in reply to Astounding

Hi Pgstats,

Thanks for your input. I would like to compute the average sales of last month sales per customer for all customers and first month sales per customer and for all customers. Can you give me a solution?  

My difficulty is indentifying the correct Date/month function to filter out which is the first month for customer and which is the last month for customer and then take the average. Or is there any other better way to do it?

Thanks,

Andy

Contributor
Posts: 59

Re: Rank, 95th rank and 95th percentile datastep needed

Posted in reply to Astounding

Hi Astounding,

Thanks for your input. Really appreciate it. Can you give me a solution to compute the average last month sales of per customer and for all customers and the average first month sales per customer (and for all customers) from the original dataset above.

My difficulty is identifying the correct date/month function to filter out which is the first month for customer and which is the last month for a customer. Or is there any better way to do it?

Hope you can help, Thanks.

Andy

Super User
Posts: 5,498

Re: Rank, 95th rank and 95th percentile datastep needed

Your sample data doesn't include any variable to indicate the date.  You'll need to tell us about that.

Do you really need average sales in first/last month, or do you just need to know which is the first month and which is the last month?

Contributor
Posts: 59

Re: Rank, 95th rank and 95th percentile datastep needed

Posted in reply to Astounding

Hi Astounding, Sorry that I missed out to mention the date variable. Thats very silly that i forgot to mention it. Well yeah there is a record of sales date in ddmmyy in the database. I wanted to see customer pattern and retention studies. Lately in retail research it is widely helpful to study pattern to retain customers. Anyway, So yeah there is a data variable with date values in ddmmyyyy.

Here is my code that i successfully ran, however not happy that i am using more SQL as opposed to my wishes of doing it in the powerful datastep:

Proc sql;

create table want as

select t.customerid, avg(sales) as avg

from have t

inner join (select customerid, Max(substr (chardate,1,2) as mx_date /*Date var converted to char using put function to jump the error argument*/

from have

group by customerid

) b

on (substr (char, 1,2)= b.mx_date and t.customerid=b.customerid)

group by t.customerid

order by 1;

quit;

The above is what i wrote for last month average sales and changing it to MIN( ) should give me the result for first month.

Sure it ran fine, However i am not 100% confident and rather i am seeking the solution in datastep.  can???

Thanks,

Andy

Respected Advisor
Posts: 4,919

Re: Rank, 95th rank and 95th percentile datastep needed

SUBSTR(chardate,1,2) would give you the day, not the month of DDMMYYYY... So, assuming that chardate is formatted as  "MMDDYYYY" instead, you could get the monthly 95th percentile sales from the first and last month of each customer like this :


data have0;
set have;
month = put(input(chardate,mmddyy10.),yymmn.); /* Change mmddyy10. to suit your data */
run;

proc sort data=have0;
   by customerid month sales;
run;

data mm(keep=customerId month firstMonth lastMonth);
set have0;
by customerId;
firstMonth = first.customerId;
lastMonth = last.customerId;
if first.customerId or last.customerId;
run;

data have1;
merge have0 mm(in=ok);
by customerId month;
if ok;
run;

data want;
   do nbr_records=1 by 1 until (last.month);
        set have1; by customerid month;
     end;
   do nbr_found = 1 by 1 until (last.month);
       set have1; by customerid month;
       if nbr_found = round(0.95*nbr_records) then output;
       end;
run;

PG

PG
Ask a Question
Discussion stats
  • 8 replies
  • 972 views
  • 6 likes
  • 3 in conversation