How to create groups by deciles to run a regression?

Accepted Solution Solved
Reply
Regular Contributor
Posts: 166
Accepted Solution

How to create groups by deciles to run a regression?

Hello everybody;

I decide to run a regression based on decile of data. suppose that I have firm, date and price variables. So I want to make decile by average price of each firm. For instance, the 10th decile has firms with the highest average price and first decile has firms with the lowest average price and so on.

 

How can I make this decile?

How can I run this regression?

 

Thanks in advance.


Accepted Solutions
Solution
‎11-12-2017 02:44 PM
Trusted Advisor
Posts: 1,312

Re: How to create groups by deciles to run a regression?

Posted in reply to aminkarimid

I sounds like you need these steps:

 

  1. Run PROC SUMMARY, with CLASS of firm_id, to get the mean price for each firm, outputting a dataset with 1 obs per firm.
  2. Rank the output of step one to using the mean_price to assign 10 groups, outputting a second data set.
  3. join/Merge, by firm_id, the original dataset and the output of step 2, yielding a dataset with the same number of records as the original, but with the new rank/decile variable (call it mydecile), and sort by mydecile (or maybe mydecile/firm_id/date).

 

I imagine then you could run your regressions, etcs, with a BY mydecile statement.

View solution in original post


All Replies
Super User
Posts: 23,323

Re: How to create groups by deciles to run a regression?

Posted in reply to aminkarimid

PROC RANK with GROUPS=10.

It supports BY group processing.

Regular Contributor
Posts: 166

Re: How to create groups by deciles to run a regression?

[ Edited ]

Thanks, @Reeza;
But it does not work.
There are two problems;
1- A variable, which I want to rank data based on it, has missing values;
2- This code rank data by firms and price variable, but each firm can be repeated in groups. For example, firms A can be in 5th and second groups.
Please consider I want to make a decile based on average of price for each firm. So, firms are unique in each group and don't repeat in other groups.

Regular Contributor
Posts: 166

Re: How to create groups by deciles to run a regression?

Posted in reply to aminkarimid

Here's an example:

Table one is a sample of my data:

namedate price
A16
A1.
A27
A47
A45
A57
A65
B2.
B3.
B52
B51
B7.
C13
C14
C15
C33
C32
C31
C5.

 

Table two, which is shown below, is what I want:

namedatepricemeanrank
B2.0.61
B3.0.61
B520.61
B510.61
B7.0.61
C132.7142862
C142.7142862
C152.7142862
C332.7142862
C322.7142862
C312.7142862
C5.2.7142862
A165.2857143
A1.5.2857143
A275.2857143
A475.2857143
A455.2857143
A575.2857143
A655.2857143
Super User
Posts: 23,323

Re: How to create groups by deciles to run a regression?

Posted in reply to aminkarimid
Those are not deciles. Why did you say deciles?
Regular Contributor
Posts: 166

Re: How to create groups by deciles to run a regression?

It is just an example to shed light on this question. I am working with a high-frequency data sample which is included more than 60 million observations and I need decile for a final result.
Solution
‎11-12-2017 02:44 PM
Trusted Advisor
Posts: 1,312

Re: How to create groups by deciles to run a regression?

Posted in reply to aminkarimid

I sounds like you need these steps:

 

  1. Run PROC SUMMARY, with CLASS of firm_id, to get the mean price for each firm, outputting a dataset with 1 obs per firm.
  2. Rank the output of step one to using the mean_price to assign 10 groups, outputting a second data set.
  3. join/Merge, by firm_id, the original dataset and the output of step 2, yielding a dataset with the same number of records as the original, but with the new rank/decile variable (call it mydecile), and sort by mydecile (or maybe mydecile/firm_id/date).

 

I imagine then you could run your regressions, etcs, with a BY mydecile statement.

Super User
Posts: 23,323

Re: How to create groups by deciles to run a regression?

You need one more step, because I don't think PROC SUMMARY/MEANS works here as @aminkarimid is treating missing as 0's, not missing. I guess you could do it 'manually' using SUMMARY/MEANS to calculate the sum and count automatically. The remaining steps are the same, merge in results, use PROC RANK etc. 

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 201 views
  • 0 likes
  • 3 in conversation