BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

dear all, 

i have my data in the following format 

company_nameyearNICYX1X2
20 Microns Ltd.201180.3321.4840.367
20 Microns Ltd.201280.3061.3240.139
20 Microns Ltd.201380.2891.0810.050
20 Microns Ltd.201480.3031.0300.048
20 Microns Ltd.201580.2830.9720.071
20 Microns Ltd.201680.2801.0150.067
20 Microns Ltd.201780.2871.0680.069
20 Microns Ltd.201880.2491.0500.075
20 Microns Ltd.201980.2511.1450.135
20 Microns Ltd.202080.2511.2330.142
20 Microns Ltd.202180.2231.057-0.137
3I Infotech Ltd.2011620.1100.1860.010
3I Infotech Ltd.2012620.1060.155-0.011
3I Infotech Ltd.2013620.0750.103-0.037
3I Infotech Ltd.2014620.0540.072-0.012
3I Infotech Ltd.2015620.0520.0890.017
3I Infotech Ltd.2016620.0340.092-0.015
3I Infotech Ltd.2017620.0280.099-0.024
3I Infotech Ltd.2018620.0280.085-0.014
3I Infotech Ltd.2019620.0370.1080.018
3I Infotech Ltd.2020620.0180.073-0.032
3I Infotech Ltd.2021620.0230.0730.003
3M India Ltd.2011220.3321.9190.158
3M India Ltd.2012220.3311.8420.281
3M India Ltd.2013220.3031.6810.182
3M India Ltd.2014220.2681.2720.120
3M India Ltd.2015220.2631.2790.069
3M India Ltd.2016220.2831.3930.195
3M India Ltd.2017220.3481.5700.145
3M India Ltd.2018220.2411.2890.104

I have to regress Y on X1 and X2. the regression should be run for each year  and in each year, each NIC code separately. I use the following code for running the regression and for saving the residual and predicted values. 

 

PROC REG DATA=have;
MODEL Y=X1 X2;
by YEAR NIC;
output out=estimated
p=predicted
r=residual;
run; 

 

in some cases, the observations under any particular NIC code will be less than 10 also. i would like to exclude observations under such NIC codes in running the regression. 

 

please let me how should i modify the above mentioned code to achieve this purpose. 

 

thanking you in advance  

1 ACCEPTED SOLUTION

Accepted Solutions
Sajid01
Meteorite | Level 14

This is about this part of ypour quwation.
"the observations under any particular NIC code will be less than 10 also. i would like to exclude observations under such NIC codes in running the regression. "

Identify the Year+NIC combination and remove such records, in the data preparation step and before you start performing the regression. 
Make sure that the data is sorted using the by variables before the regression is performed.
Once done your code should give the desired results.

View solution in original post

6 REPLIES 6
Rick_SAS
SAS Super FREQ

Did you mean to use Company_Name instead of Year on the BY statement? 

Also, the NIC is constant for each company, so I suspect you should simply use
BY Company_Name;
for this analysis.

 

To exclude observations for which NIC < 10, use a WHERE statement, as follows:

PROC REG DATA=have;
   where NIC >= 10;
   by Company_Name; 
   MODEL Y=X1 X2;  
   output out=estimated p=predicted r=residual;
run; 
Sajid01
Meteorite | Level 14

This is about this part of ypour quwation.
"the observations under any particular NIC code will be less than 10 also. i would like to exclude observations under such NIC codes in running the regression. "

Identify the Year+NIC combination and remove such records, in the data preparation step and before you start performing the regression. 
Make sure that the data is sorted using the by variables before the regression is performed.
Once done your code should give the desired results.

srikanthyadav44
Quartz | Level 8
Hi Rick_SAS
the code is not working.
i want to include the observations where frequency with the NIC code is
more than 10.
NIC code is the classification of the companies. for example, NIC code 01
has only 8 companies in a particular year, want to exclude all the
companies with NIC code 01 for that particular year , in running the
regression.
please suggest me the SAS CODE with this condition

thanking you in advance
PaigeMiller
Diamond | Level 26

@srikanthyadav44 wrote:

NIC code is the classification of the companies. for example, NIC code 01
has only 8 companies in a particular year, want to exclude all the
companies with NIC code 01 for that particular year , in running the
regression.

this is confusing, as this is not what is seen in your example data. Could you explain more and provide example data that illustrates the issue?

--
Paige Miller
srikanthyadav44
Quartz | Level 8
Dear Mr. Paige Miller
i have updated my query with more clear illustration.
please have a look and suggest a SAS code
thank you in advance
srikanthyadav44
Quartz | Level 8
company_nameyearNICYX1X2
A D F Foods Ltd.2011100.180.880.08
Adani Wilmar Ltd.2011100.294.081.16
Agro Tech Foods Ltd.2011100.442.500.24
Ajanta Soya Ltd.2011100.176.231.44
Ambar Protein Inds. Ltd.2011100.177.121.53
Andrew Yule & Co. Ltd.2011100.140.540.12
Arcuttipore Tea Co. Ltd.2011100.050.220.00
Avanti Feeds Ltd.2011100.151.760.95
Associated Alcohols & Breweries Ltd.2011110.130.950.11
Devyani International Ltd.2011110.371.44-0.04
Eco Friendly Food Processing Park Ltd.2011110.010.110.02
Elegant Floriculture & Agrotech (India) Ltd.2011110.020.040.00
Flex Foods Ltd.2011110.150.610.11
G M Breweries Ltd.2011113.892.010.10
G R M Overseas Ltd.2011110.131.61-1.00
Naturite Agro Products Ltd.2011110.191.261.13
Orient Beverages Ltd.2011110.140.390.12
S I P Industries Ltd.2011110.110.21-0.05
Sinnar Bidi Udyog Ltd.2011110.070.920.13
Tarai Foods Ltd.2011110.100.22-0.22
A D F Foods Ltd.2012100.170.710.01
Adani Wilmar Ltd.2012100.193.701.08
Agro Tech Foods Ltd.2012100.402.47-0.05
Ajanta Soya Ltd.2012100.176.871.26
Andrew Yule & Co. Ltd.2012100.160.620.08
Arcuttipore Tea Co. Ltd.2012100.070.20-0.03
Avanti Feeds Ltd.2012100.232.721.28
Associated Alcohols & Breweries Ltd.2012110.131.240.43
Devyani International Ltd.2012110.421.500.36
Eco Friendly Food Processing Park Ltd.2012110.010.430.34
Elegant Floriculture & Agrotech (India) Ltd.2012110.010.030.00
Flex Foods Ltd.2012110.200.670.06
G M Breweries Ltd.2012114.261.690.13
Naturite Agro Products Ltd.2012110.090.880.14
S D F Industries Ltd.2012110.320.69-0.07
S I P Industries Ltd.2012110.170.14-0.21
Saptarishi Agro Inds. Ltd.2012110.030.00-0.41
Sinnar Bidi Udyog Ltd.2012110.080.35-0.55
Tarai Foods Ltd.2012110.100.22-0.05

i have updated the my data illustration to get more clarity on my query

in the above table, in the year 2011, the companies under "NIC 10" are only 8 . so, i would like to exclude in running the regression. 

in the same year, companies "under NIC 11"  are 12,, so i would like to include these companies in running the regression and so on .

 

i have to run the regression for every year and for each NIC code separately(yearwise-NIC code wise)

 

i am using this code presently, as requesting to suggest the change to this code to achieve my above discussed objective in running the regression 

PROC REG DATA=have;
MODEL Y=X1 X2 X3;
by YEAR NIC;
output out=estimated
p=predicted
r=residual;
run; 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 786 views
  • 0 likes
  • 4 in conversation