BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aminkarimid
Lapis Lazuli | Level 10

Hello everybody;

I decide to run a regression based on decile of data. suppose that I have firm, date and price variables. So I want to make decile by average price of each firm. For instance, the 10th decile has firms with the highest average price and first decile has firms with the lowest average price and so on.

 

How can I make this decile?

How can I run this regression?

 

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I sounds like you need these steps:

 

  1. Run PROC SUMMARY, with CLASS of firm_id, to get the mean price for each firm, outputting a dataset with 1 obs per firm.
  2. Rank the output of step one to using the mean_price to assign 10 groups, outputting a second data set.
  3. join/Merge, by firm_id, the original dataset and the output of step 2, yielding a dataset with the same number of records as the original, but with the new rank/decile variable (call it mydecile), and sort by mydecile (or maybe mydecile/firm_id/date).

 

I imagine then you could run your regressions, etcs, with a BY mydecile statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
Reeza
Super User

PROC RANK with GROUPS=10.

It supports BY group processing.

aminkarimid
Lapis Lazuli | Level 10

Thanks, @Reeza;
But it does not work.
There are two problems;
1- A variable, which I want to rank data based on it, has missing values;
2- This code rank data by firms and price variable, but each firm can be repeated in groups. For example, firms A can be in 5th and second groups.
Please consider I want to make a decile based on average of price for each firm. So, firms are unique in each group and don't repeat in other groups.

aminkarimid
Lapis Lazuli | Level 10

Here's an example:

Table one is a sample of my data:

namedate price
A16
A1.
A27
A47
A45
A57
A65
B2.
B3.
B52
B51
B7.
C13
C14
C15
C33
C32
C31
C5.

 

Table two, which is shown below, is what I want:

namedatepricemeanrank
B2.0.61
B3.0.61
B520.61
B510.61
B7.0.61
C132.7142862
C142.7142862
C152.7142862
C332.7142862
C322.7142862
C312.7142862
C5.2.7142862
A165.2857143
A1.5.2857143
A275.2857143
A475.2857143
A455.2857143
A575.2857143
A655.2857143
Reeza
Super User
Those are not deciles. Why did you say deciles?
aminkarimid
Lapis Lazuli | Level 10
It is just an example to shed light on this question. I am working with a high-frequency data sample which is included more than 60 million observations and I need decile for a final result.
mkeintz
PROC Star

I sounds like you need these steps:

 

  1. Run PROC SUMMARY, with CLASS of firm_id, to get the mean price for each firm, outputting a dataset with 1 obs per firm.
  2. Rank the output of step one to using the mean_price to assign 10 groups, outputting a second data set.
  3. join/Merge, by firm_id, the original dataset and the output of step 2, yielding a dataset with the same number of records as the original, but with the new rank/decile variable (call it mydecile), and sort by mydecile (or maybe mydecile/firm_id/date).

 

I imagine then you could run your regressions, etcs, with a BY mydecile statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

You need one more step, because I don't think PROC SUMMARY/MEANS works here as @aminkarimid is treating missing as 0's, not missing. I guess you could do it 'manually' using SUMMARY/MEANS to calculate the sum and count automatically. The remaining steps are the same, merge in results, use PROC RANK etc. 

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2200 views
  • 0 likes
  • 3 in conversation