## How to create groups by deciles to run a regression?

Solved
Regular Contributor
Posts: 166

# How to create groups by deciles to run a regression?

Hello everybody;

I decide to run a regression based on decile of data. suppose that I have firm, date and price variables. So I want to make decile by average price of each firm. For instance, the 10th decile has firms with the highest average price and first decile has firms with the lowest average price and so on.

How can I make this decile?

How can I run this regression?

Accepted Solutions
Solution
‎11-12-2017 02:44 PM
Posts: 1,312

## Re: How to create groups by deciles to run a regression?

I sounds like you need these steps:

1. Run PROC SUMMARY, with CLASS of firm_id, to get the mean price for each firm, outputting a dataset with 1 obs per firm.
2. Rank the output of step one to using the mean_price to assign 10 groups, outputting a second data set.
3. join/Merge, by firm_id, the original dataset and the output of step 2, yielding a dataset with the same number of records as the original, but with the new rank/decile variable (call it mydecile), and sort by mydecile (or maybe mydecile/firm_id/date).

I imagine then you could run your regressions, etcs, with a BY mydecile statement.

All Replies
Super User
Posts: 23,323

## Re: How to create groups by deciles to run a regression?

PROC RANK with GROUPS=10.

It supports BY group processing.

Regular Contributor
Posts: 166

## Re: How to create groups by deciles to run a regression?

[ Edited ]

Thanks, @Reeza;
But it does not work.
There are two problems;
1- A variable, which I want to rank data based on it, has missing values;
2- This code rank data by firms and price variable, but each firm can be repeated in groups. For example, firms A can be in 5th and second groups.
Please consider I want to make a decile based on average of price for each firm. So, firms are unique in each group and don't repeat in other groups.

Regular Contributor
Posts: 166

## Re: How to create groups by deciles to run a regression?

Here's an example:

Table one is a sample of my data:

 name date price A 1 6 A 1 . A 2 7 A 4 7 A 4 5 A 5 7 A 6 5 B 2 . B 3 . B 5 2 B 5 1 B 7 . C 1 3 C 1 4 C 1 5 C 3 3 C 3 2 C 3 1 C 5 .

Table two, which is shown below, is what I want:

 name date price mean rank B 2 . 0.6 1 B 3 . 0.6 1 B 5 2 0.6 1 B 5 1 0.6 1 B 7 . 0.6 1 C 1 3 2.714286 2 C 1 4 2.714286 2 C 1 5 2.714286 2 C 3 3 2.714286 2 C 3 2 2.714286 2 C 3 1 2.714286 2 C 5 . 2.714286 2 A 1 6 5.285714 3 A 1 . 5.285714 3 A 2 7 5.285714 3 A 4 7 5.285714 3 A 4 5 5.285714 3 A 5 7 5.285714 3 A 6 5 5.285714 3
Super User
Posts: 23,323

## Re: How to create groups by deciles to run a regression?

Those are not deciles. Why did you say deciles?
Regular Contributor
Posts: 166

## Re: How to create groups by deciles to run a regression?

It is just an example to shed light on this question. I am working with a high-frequency data sample which is included more than 60 million observations and I need decile for a final result.
Solution
‎11-12-2017 02:44 PM
Posts: 1,312

## Re: How to create groups by deciles to run a regression?

I sounds like you need these steps:

1. Run PROC SUMMARY, with CLASS of firm_id, to get the mean price for each firm, outputting a dataset with 1 obs per firm.
2. Rank the output of step one to using the mean_price to assign 10 groups, outputting a second data set.
3. join/Merge, by firm_id, the original dataset and the output of step 2, yielding a dataset with the same number of records as the original, but with the new rank/decile variable (call it mydecile), and sort by mydecile (or maybe mydecile/firm_id/date).

I imagine then you could run your regressions, etcs, with a BY mydecile statement.

Super User
Posts: 23,323

## Re: How to create groups by deciles to run a regression?

You need one more step, because I don't think PROC SUMMARY/MEANS works here as @aminkarimid is treating missing as 0's, not missing. I guess you could do it 'manually' using SUMMARY/MEANS to calculate the sum and count automatically. The remaining steps are the same, merge in results, use PROC RANK etc.

☑ This topic is solved.