Tough Question: How to compare data to prior year's data

Frequent Contributor
Posts: 101

Tough Question: How to compare data to prior year's data


I have another question.  Again, thank you for all of your help, I am learning so much!!

I have sasdata that takes the following form:

ID                           YEAR                           SALES

001                         2001                              100

002                         2001                              50

003                         2001                              75

004                         2001                              150

...                              .....                                ....

001                         2002                              15

002                         2002                              20

003                         2002                              75

004                         2002                              100

....                              .....                              .....

001                         2003                              150

002                         2003                              200

003                         2003                              75

004                         2003                              100

.....                              .....                              .....

So basically I have a list of how each ID performed every year.  What I want to do is compare the performance of any given firm in any given year against the performance of ALL FIRMS IN THE PRIOR YEAR.  So if there is a firm that sold 150 in 2002, I want to know where that would have ranked among how all firms did in 2001.

It has been suggested that I do this using PROC UNIVARIATE?  But I can't figure out how to do that?

Thanks so much!

Super User
Posts: 5,876

Re: Tough Question: How to compare data to prior year's data

Posted in reply to mahler_ji

First, what is the requirement for the comparison? Mean, median...?

You could use PROC UNIVARIATE, MEANS/SUMMARY to get year totals. Then use SQL to match it back to the firms sales data (year = year - 1).

Data never sleeps
Super Contributor
Posts: 644

Re: Tough Question: How to compare data to prior year's data

I think you can meet this somewhat odd requirement by first ranking all ids within each year (proc rank with year as a by variable).  Select the rank option to give the top rank (=1) to the best performer.  Then use SQL to join back the ranked data with the original dataset, where rank.year = (data.year - 1) and select max(rank.rank) where rank.sales >= data.sales (for each id in the chosen year, select the maximum rank [of all ids] in the previous year where the maximum value of last year's sales is greater or equal to the sale for that id in the chosen year). 

Someone with access to SAS this weekend should be able to code it.


Frequent Contributor
Posts: 101

Re: Tough Question: How to compare data to prior year's data


Thank you for your reply.  The requirement for the comparison is where the observation for the current year would fall on the distribution from the prior year, on a decile or quartile basis.  In essence, I am trying to find which decile/quartile the current year observation would have been if it would have happened in the prior year.  And I need to do this comparison for all years.

I was told that I need to use PROC UNIVARIATE, but when reading the instructions on the SAS website, I don't understand how to use the UNIVARIATE and what the arguments are, etc.

Any help is greatly appreciated.

Super Contributor
Posts: 644

Re: Tough Question: How to compare data to prior year's data

Posted in reply to mahler_ji


It would have been helpful to have used the term 'quartile' in your original post rather than stating it in terms of ranking.  Proc univariate will provide quartile information, but so will proc summary / proc means, probably easier to use in this context.  However,

Proc Univariate data = have ;  /* have is your existing dataset     */

     By year ;                         /* you can use Class instead of By if your data is not sorted and not too big */

     var     sales ;

     Output out = quartiles     q1     =     salesq1

                                   median     =     salesq2

                                        q3       =     salesq3


Run ;

Now since the quartliles are in the same row the easiest way to proceed is with datasteps, assuming your have data is at least sorted by year.  First adjust the year value in the quartiles to allow matching for the previous year

Data quartiles_adj ;

     By year ;

     Year + 1 ;

Run ;

Data want ;

     merge     have



     By      year ;

     Retain firstyear ;

     if     _N_     =     1     then firstyear = year ;

     Select ;

          When (year = firstyear)    Prev_quartile = . ;

          When (sales >= salesq1) Prev_quartile = 1 ;

          When (sales >= salesq2) Prev_quartile = 2 ;

          When (sales >= salesq3) Prev_quartile = 3 ;

          Otherwise                       Prev_quartile = 4 ;

     End ;

     Drop salesq1 - salesq3  firstyear;

Run ;

[this code is untested]

Using SQL would avoid the first datastep so if you are more familiar with SQL it would look like this

Proc SQL ;

     Create table want as

          Select h.*

               ,     q.sales:

               ,     case

                         when sales >= salesq1 then 1

                         when sales >= salesq2 then 2

                         when sales >= salesq3 then 3

                         else     4

                    end     as     Prev_quartile

          From     have          h

               left join

               ,       quartiles     q

               on     h.year     =     (q.year - 1)

          Order     by


                  ,     year


Quit ;

[also untested]


Ask a Question
Discussion stats
  • 4 replies
  • 3 in conversation