dear all,
i have my data in the following format
company_name | year | NIC | Y | X1 | X2 |
20 Microns Ltd. | 2011 | 8 | 0.332 | 1.484 | 0.367 |
20 Microns Ltd. | 2012 | 8 | 0.306 | 1.324 | 0.139 |
20 Microns Ltd. | 2013 | 8 | 0.289 | 1.081 | 0.050 |
20 Microns Ltd. | 2014 | 8 | 0.303 | 1.030 | 0.048 |
20 Microns Ltd. | 2015 | 8 | 0.283 | 0.972 | 0.071 |
20 Microns Ltd. | 2016 | 8 | 0.280 | 1.015 | 0.067 |
20 Microns Ltd. | 2017 | 8 | 0.287 | 1.068 | 0.069 |
20 Microns Ltd. | 2018 | 8 | 0.249 | 1.050 | 0.075 |
20 Microns Ltd. | 2019 | 8 | 0.251 | 1.145 | 0.135 |
20 Microns Ltd. | 2020 | 8 | 0.251 | 1.233 | 0.142 |
20 Microns Ltd. | 2021 | 8 | 0.223 | 1.057 | -0.137 |
3I Infotech Ltd. | 2011 | 62 | 0.110 | 0.186 | 0.010 |
3I Infotech Ltd. | 2012 | 62 | 0.106 | 0.155 | -0.011 |
3I Infotech Ltd. | 2013 | 62 | 0.075 | 0.103 | -0.037 |
3I Infotech Ltd. | 2014 | 62 | 0.054 | 0.072 | -0.012 |
3I Infotech Ltd. | 2015 | 62 | 0.052 | 0.089 | 0.017 |
3I Infotech Ltd. | 2016 | 62 | 0.034 | 0.092 | -0.015 |
3I Infotech Ltd. | 2017 | 62 | 0.028 | 0.099 | -0.024 |
3I Infotech Ltd. | 2018 | 62 | 0.028 | 0.085 | -0.014 |
3I Infotech Ltd. | 2019 | 62 | 0.037 | 0.108 | 0.018 |
3I Infotech Ltd. | 2020 | 62 | 0.018 | 0.073 | -0.032 |
3I Infotech Ltd. | 2021 | 62 | 0.023 | 0.073 | 0.003 |
3M India Ltd. | 2011 | 22 | 0.332 | 1.919 | 0.158 |
3M India Ltd. | 2012 | 22 | 0.331 | 1.842 | 0.281 |
3M India Ltd. | 2013 | 22 | 0.303 | 1.681 | 0.182 |
3M India Ltd. | 2014 | 22 | 0.268 | 1.272 | 0.120 |
3M India Ltd. | 2015 | 22 | 0.263 | 1.279 | 0.069 |
3M India Ltd. | 2016 | 22 | 0.283 | 1.393 | 0.195 |
3M India Ltd. | 2017 | 22 | 0.348 | 1.570 | 0.145 |
3M India Ltd. | 2018 | 22 | 0.241 | 1.289 | 0.104 |
I have to regress Y on X1 and X2. the regression should be run for each year and in each year, each NIC code separately. I use the following code for running the regression and for saving the residual and predicted values.
PROC REG DATA=have;
MODEL Y=X1 X2;
by YEAR NIC;
output out=estimated
p=predicted
r=residual;
run;
in some cases, the observations under any particular NIC code will be less than 10 also. i would like to exclude observations under such NIC codes in running the regression.
please let me how should i modify the above mentioned code to achieve this purpose.
thanking you in advance
This is about this part of ypour quwation.
"the observations under any particular NIC code will be less than 10 also. i would like to exclude observations under such NIC codes in running the regression. "
Identify the Year+NIC combination and remove such records, in the data preparation step and before you start performing the regression.
Make sure that the data is sorted using the by variables before the regression is performed.
Once done your code should give the desired results.
Did you mean to use Company_Name instead of Year on the BY statement?
Also, the NIC is constant for each company, so I suspect you should simply use
BY Company_Name;
for this analysis.
To exclude observations for which NIC < 10, use a WHERE statement, as follows:
PROC REG DATA=have;
where NIC >= 10;
by Company_Name;
MODEL Y=X1 X2;
output out=estimated p=predicted r=residual;
run;
This is about this part of ypour quwation.
"the observations under any particular NIC code will be less than 10 also. i would like to exclude observations under such NIC codes in running the regression. "
Identify the Year+NIC combination and remove such records, in the data preparation step and before you start performing the regression.
Make sure that the data is sorted using the by variables before the regression is performed.
Once done your code should give the desired results.
@srikanthyadav44 wrote:
NIC code is the classification of the companies. for example, NIC code 01
has only 8 companies in a particular year, want to exclude all the
companies with NIC code 01 for that particular year , in running the
regression.
this is confusing, as this is not what is seen in your example data. Could you explain more and provide example data that illustrates the issue?
company_name | year | NIC | Y | X1 | X2 |
A D F Foods Ltd. | 2011 | 10 | 0.18 | 0.88 | 0.08 |
Adani Wilmar Ltd. | 2011 | 10 | 0.29 | 4.08 | 1.16 |
Agro Tech Foods Ltd. | 2011 | 10 | 0.44 | 2.50 | 0.24 |
Ajanta Soya Ltd. | 2011 | 10 | 0.17 | 6.23 | 1.44 |
Ambar Protein Inds. Ltd. | 2011 | 10 | 0.17 | 7.12 | 1.53 |
Andrew Yule & Co. Ltd. | 2011 | 10 | 0.14 | 0.54 | 0.12 |
Arcuttipore Tea Co. Ltd. | 2011 | 10 | 0.05 | 0.22 | 0.00 |
Avanti Feeds Ltd. | 2011 | 10 | 0.15 | 1.76 | 0.95 |
Associated Alcohols & Breweries Ltd. | 2011 | 11 | 0.13 | 0.95 | 0.11 |
Devyani International Ltd. | 2011 | 11 | 0.37 | 1.44 | -0.04 |
Eco Friendly Food Processing Park Ltd. | 2011 | 11 | 0.01 | 0.11 | 0.02 |
Elegant Floriculture & Agrotech (India) Ltd. | 2011 | 11 | 0.02 | 0.04 | 0.00 |
Flex Foods Ltd. | 2011 | 11 | 0.15 | 0.61 | 0.11 |
G M Breweries Ltd. | 2011 | 11 | 3.89 | 2.01 | 0.10 |
G R M Overseas Ltd. | 2011 | 11 | 0.13 | 1.61 | -1.00 |
Naturite Agro Products Ltd. | 2011 | 11 | 0.19 | 1.26 | 1.13 |
Orient Beverages Ltd. | 2011 | 11 | 0.14 | 0.39 | 0.12 |
S I P Industries Ltd. | 2011 | 11 | 0.11 | 0.21 | -0.05 |
Sinnar Bidi Udyog Ltd. | 2011 | 11 | 0.07 | 0.92 | 0.13 |
Tarai Foods Ltd. | 2011 | 11 | 0.10 | 0.22 | -0.22 |
A D F Foods Ltd. | 2012 | 10 | 0.17 | 0.71 | 0.01 |
Adani Wilmar Ltd. | 2012 | 10 | 0.19 | 3.70 | 1.08 |
Agro Tech Foods Ltd. | 2012 | 10 | 0.40 | 2.47 | -0.05 |
Ajanta Soya Ltd. | 2012 | 10 | 0.17 | 6.87 | 1.26 |
Andrew Yule & Co. Ltd. | 2012 | 10 | 0.16 | 0.62 | 0.08 |
Arcuttipore Tea Co. Ltd. | 2012 | 10 | 0.07 | 0.20 | -0.03 |
Avanti Feeds Ltd. | 2012 | 10 | 0.23 | 2.72 | 1.28 |
Associated Alcohols & Breweries Ltd. | 2012 | 11 | 0.13 | 1.24 | 0.43 |
Devyani International Ltd. | 2012 | 11 | 0.42 | 1.50 | 0.36 |
Eco Friendly Food Processing Park Ltd. | 2012 | 11 | 0.01 | 0.43 | 0.34 |
Elegant Floriculture & Agrotech (India) Ltd. | 2012 | 11 | 0.01 | 0.03 | 0.00 |
Flex Foods Ltd. | 2012 | 11 | 0.20 | 0.67 | 0.06 |
G M Breweries Ltd. | 2012 | 11 | 4.26 | 1.69 | 0.13 |
Naturite Agro Products Ltd. | 2012 | 11 | 0.09 | 0.88 | 0.14 |
S D F Industries Ltd. | 2012 | 11 | 0.32 | 0.69 | -0.07 |
S I P Industries Ltd. | 2012 | 11 | 0.17 | 0.14 | -0.21 |
Saptarishi Agro Inds. Ltd. | 2012 | 11 | 0.03 | 0.00 | -0.41 |
Sinnar Bidi Udyog Ltd. | 2012 | 11 | 0.08 | 0.35 | -0.55 |
Tarai Foods Ltd. | 2012 | 11 | 0.10 | 0.22 | -0.05 |
i have updated the my data illustration to get more clarity on my query
in the above table, in the year 2011, the companies under "NIC 10" are only 8 . so, i would like to exclude in running the regression.
in the same year, companies "under NIC 11" are 12,, so i would like to include these companies in running the regression and so on .
i have to run the regression for every year and for each NIC code separately(yearwise-NIC code wise)
i am using this code presently, as requesting to suggest the change to this code to achieve my above discussed objective in running the regression
PROC REG DATA=have;
MODEL Y=X1 X2 X3;
by YEAR NIC;
output out=estimated
p=predicted
r=residual;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.