turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Tough Question: How to compare data to prior year'...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

01-24-2014 10:58 AM

Hello,

I have another question. Again, thank you for all of your help, I am learning so much!!

I have sasdata that takes the following form:

ID YEAR SALES

001 2001 100

002 2001 50

003 2001 75

004 2001 150

... ..... ....

001 2002 15

002 2002 20

003 2002 75

004 2002 100

.... ..... .....

001 2003 150

002 2003 200

003 2003 75

004 2003 100

..... ..... .....

So basically I have a list of how each ID performed every year. What I want to do is compare the performance of any given firm in any given year against the performance of ALL FIRMS IN THE PRIOR YEAR. So if there is a firm that sold 150 in 2002, I want to know where that would have ranked among how all firms did in 2001.

It has been suggested that I do this using PROC UNIVARIATE? But I can't figure out how to do that?

Thanks so much!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mahler_ji

01-24-2014 11:10 AM

First, what is the requirement for the comparison? Mean, median...?

You could use PROC UNIVARIATE, MEANS/SUMMARY to get year totals. Then use SQL to match it back to the firms sales data (year = year - 1).

Data never sleeps

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to LinusH

01-24-2014 09:17 PM

I think you can meet this somewhat odd requirement by first ranking all ids within each year (proc rank with year as a by variable). Select the rank option to give the top rank (=1) to the best performer. Then use SQL to join back the ranked data with the original dataset, where rank.year = (data.year - 1) and select max(rank.rank) where rank.sales >= data.sales (for each id in the chosen year, select the maximum rank [of all ids] in the previous year where the maximum value of last year's sales is greater or equal to the sale for that id in the chosen year).

Someone with access to SAS this weekend should be able to code it.

Richard

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to LinusH

01-25-2014 10:54 AM

Linus,

Thank you for your reply. The requirement for the comparison is where the observation for the current year would fall on the distribution from the prior year, on a decile or quartile basis. In essence, I am trying to find which decile/quartile the current year observation would have been if it would have happened in the prior year. And I need to do this comparison for all years.

I was told that I need to use PROC UNIVARIATE, but when reading the instructions on the SAS website, I don't understand how to use the UNIVARIATE and what the arguments are, etc.

Any help is greatly appreciated.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mahler_ji

01-25-2014 08:09 PM

@mahler.ji

It would have been helpful to have used the term 'quartile' in your original post rather than stating it in terms of ranking. Proc univariate will provide quartile information, but so will proc summary / proc means, probably easier to use in this context. However,

Proc Univariate data = have ; /* have is your existing dataset */

By year ; /* you can use Class instead of By if your data is not sorted and not too big */

var sales ;

Output out = quartiles q1 = salesq1

median = salesq2

q3 = salesq3

;

Run ;

Now since the quartliles are in the same row the easiest way to proceed is with datasteps, assuming your have data is at least sorted by year. First adjust the year value in the quartiles to allow matching for the previous year

Data quartiles_adj ;

By year ;

Year + 1 ;

Run ;

Data want ;

merge have

quartiles_adj

;

By year ;

Retain firstyear ;

if _N_ = 1 then firstyear = year ;

Select ;

When (year = firstyear) Prev_quartile = . ;

When (sales >= salesq1) Prev_quartile = 1 ;

When (sales >= salesq2) Prev_quartile = 2 ;

When (sales >= salesq3) Prev_quartile = 3 ;

Otherwise Prev_quartile = 4 ;

End ;

Drop salesq1 - salesq3 firstyear;

Run ;

[this code is untested]

Using SQL would avoid the first datastep so if you are more familiar with SQL it would look like this

Proc SQL ;

Create table want as

Select h.*

, q.sales:

, case

when sales >= salesq1 then 1

when sales >= salesq2 then 2

when sales >= salesq3 then 3

else 4

end as Prev_quartile

From have h

left join

, quartiles q

on h.year = (q.year - 1)

Order by

id

, year

;

Quit ;

[also untested]

Richard