01-24-2014 10:58 AM
I have another question. Again, thank you for all of your help, I am learning so much!!
I have sasdata that takes the following form:
ID YEAR SALES
001 2001 100
002 2001 50
003 2001 75
004 2001 150
... ..... ....
001 2002 15
002 2002 20
003 2002 75
004 2002 100
.... ..... .....
001 2003 150
002 2003 200
003 2003 75
004 2003 100
..... ..... .....
So basically I have a list of how each ID performed every year. What I want to do is compare the performance of any given firm in any given year against the performance of ALL FIRMS IN THE PRIOR YEAR. So if there is a firm that sold 150 in 2002, I want to know where that would have ranked among how all firms did in 2001.
It has been suggested that I do this using PROC UNIVARIATE? But I can't figure out how to do that?
Thanks so much!
01-24-2014 11:10 AM
First, what is the requirement for the comparison? Mean, median...?
You could use PROC UNIVARIATE, MEANS/SUMMARY to get year totals. Then use SQL to match it back to the firms sales data (year = year - 1).
01-24-2014 09:17 PM
I think you can meet this somewhat odd requirement by first ranking all ids within each year (proc rank with year as a by variable). Select the rank option to give the top rank (=1) to the best performer. Then use SQL to join back the ranked data with the original dataset, where rank.year = (data.year - 1) and select max(rank.rank) where rank.sales >= data.sales (for each id in the chosen year, select the maximum rank [of all ids] in the previous year where the maximum value of last year's sales is greater or equal to the sale for that id in the chosen year).
Someone with access to SAS this weekend should be able to code it.
01-25-2014 10:54 AM
Thank you for your reply. The requirement for the comparison is where the observation for the current year would fall on the distribution from the prior year, on a decile or quartile basis. In essence, I am trying to find which decile/quartile the current year observation would have been if it would have happened in the prior year. And I need to do this comparison for all years.
I was told that I need to use PROC UNIVARIATE, but when reading the instructions on the SAS website, I don't understand how to use the UNIVARIATE and what the arguments are, etc.
Any help is greatly appreciated.
01-25-2014 08:09 PM
It would have been helpful to have used the term 'quartile' in your original post rather than stating it in terms of ranking. Proc univariate will provide quartile information, but so will proc summary / proc means, probably easier to use in this context. However,
Proc Univariate data = have ; /* have is your existing dataset */
By year ; /* you can use Class instead of By if your data is not sorted and not too big */
var sales ;
Output out = quartiles q1 = salesq1
median = salesq2
q3 = salesq3
Now since the quartliles are in the same row the easiest way to proceed is with datasteps, assuming your have data is at least sorted by year. First adjust the year value in the quartiles to allow matching for the previous year
Data quartiles_adj ;
By year ;
Year + 1 ;
Data want ;
By year ;
Retain firstyear ;
if _N_ = 1 then firstyear = year ;
When (year = firstyear) Prev_quartile = . ;
When (sales >= salesq1) Prev_quartile = 1 ;
When (sales >= salesq2) Prev_quartile = 2 ;
When (sales >= salesq3) Prev_quartile = 3 ;
Otherwise Prev_quartile = 4 ;
Drop salesq1 - salesq3 firstyear;
[this code is untested]
Using SQL would avoid the first datastep so if you are more familiar with SQL it would look like this
Proc SQL ;
Create table want as
when sales >= salesq1 then 1
when sales >= salesq2 then 2
when sales >= salesq3 then 3
end as Prev_quartile
From have h
, quartiles q
on h.year = (q.year - 1)