Dear Madam/Sir,
With the help of this community, I created industry dummies using the following program and obtained the following variable.
data want;
set have;
length dummy_sic2 $ 10;
dummy_sic2=cats('ind#',put(sic2,z2.));
run;
Frequency | Percent | Cumulative Frequency | Cumulative Percent |
211 | 0.26 | 211 | 0.26 |
49 | 0.06 | 260 | 0.33 |
61 | 0.08 | 321 | 0.40 |
30 | 0.04 | 351 | 0.44 |
3 | 0.00 | 354 | 0.44 |
971 | 1.22 | 1325 | 1.66 |
However, I get the following error message when I run the regressions.
ERROR: Variable dummy_sic2 should be either numeric or specified in the CLASS statement.
I would greatly appreciate it if you could advise how to convert 'dummy_sic2' into numeric data, or into some other form that can be included as industry dummies in the regressions.
Thank you
Joon1
Well, I am stumped, and there's something wrong here. This code (using a smaller data set) runs almost instantly, doesn't cause a freeze. Even with your data set of 40,000 records, similar code ought to run quickly (although not instantly) and I can't understand why your SAS freezes.
ods output DesignPoints = DesignMatrix;
proc glmmod data=sashelp.cars;
class make;
model msrp = make invoice enginesize cylinders horsepower mpg_city mpg_highway weight wheelbase length;
run;
Why do you need the ind# in your dummy_sic2 values, anyway? Although that shouldn't make a difference, what happens if you use
dummy_sic2=put(sic2,z2.);
Anyway, here is another way to generate dummy variables: https://blogs.sas.com/content/iml/2020/08/31/best-generate-dummy-variables-sas.html
You don't show what regression you ran. Likely the approach would be to add:
Class dummy_sic2;
to the regression code.
Thanks for your quick reply, ballardw.
The original error message is as follows:
proc surveyreg data=m20;
cluster gvkey;
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016 dummy_sic2; run;
ERROR: Variable dummy_sic2 should be either numeric or specified in the CLASS statement.
NOTE: The previous statement has been deleted.
ERROR: No MODEL statement.
NOTE: The SAS System stopped processing this step because of errors.
Using "class" in the regression statement produces the following error message:
proc surveyreg data=m20;
cluster gvkey;
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016 Class dummy_sic2; run;
ERROR: Variable Class not found.
NOTE: The previous statement has been deleted.
ERROR: No MODEL statement.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 0.00 seconds
cpu time 0.03 seconds
Any help will be highly appreciated.
Joon1
Look carefully at @ballardw's suggestion. It is a CLASS STATEMENT (note the trailing semicolon), not a suggestion to insert the word "class" into the MODEL statement. Use the original MODEL statement, and add a separate
class dummy_sic2;
statement.
Thanks, mkeinz, for your reply.
Unfortunately, the SAS output does not provide a t-table. Is there any way to generate dummy variables that can be included in the regressions like numeric variables (ind#1-ind#99)? Thank you.
The SAS System
Number of Observations | 48299 |
Mean of ln_audit | 13.29416 |
Sum of ln_audit | 642094.6 |
Number of Clusters | 7158 |
R-Square | 0.8724 |
Root MSE | 0.5177 |
Denominator DF | 7157 |
Class levels of dummy_sic2: 65 | ind#01 ind#02 ind#07 ind#08 ind#09 ind#10 ind#12 ind#13 ind#14 ind#15 ind#16 ind#17 ind#20 ind#21 ind#22 ind#23 ind#24 ind#25 ind#26 ind#27 ind#28 ind#29 ind#30 ind#31 ind#32 ind#33 ind#34 ind#35 ind#36 ind#37 ind#38 ind#39 ind#40 ind#41 ind#42 ind#44 ind#45 ind#46 ind#47 ind#48 ind#49 ind#50 ind#51 ind#52 ind#53 ind#54 ind#55 ind#56 ind#57 ind#58 ind#59 ind#70 ind#72 ind#73 ind#75 ind#76 ind#78 ind#79 ind#80 ind#81 ind#82 ind#83 ind#86 ind#87 ind#99 |
Num DF | F Value | Pr > F |
102 | 1033.40 | <.0001 |
1 | 60235.1 | <.0001 |
1 | 15.52 | <.0001 |
1 | 0.85 | 0.3564 |
1 | 0.90 | 0.3416 |
1 | 258.63 | <.0001 |
1 | 114.00 | <.0001 |
1 | 73.67 | <.0001 |
1 | 236.22 | <.0001 |
1 | 0.18 | 0.6740 |
1 | 1.48 | 0.2241 |
1 | 48.07 | <.0001 |
1 | 35.20 | <.0001 |
1 | 14.74 | 0.0001 |
1 | 48.24 | <.0001 |
1 | 9807.22 | <.0001 |
1 | 15.13 | 0.0001 |
1 | 628.61 | <.0001 |
1 | 189.43 | <.0001 |
1 | 93.19 | <.0001 |
1 | 248.66 | <.0001 |
1 | 63.28 | <.0001 |
1 | 82.64 | <.0001 |
1 | 206.21 | <.0001 |
1 | 5.64 | 0.0175 |
1 | 3986.35 | <.0001 |
1 | 4753.36 | <.0001 |
1 | 3347.22 | <.0001 |
1 | 2281.84 | <.0001 |
1 | 437.59 | <.0001 |
1 | 66.90 | <.0001 |
1 | 13.05 | 0.0003 |
1 | 7.40 | 0.0065 |
1 | 22.20 | <.0001 |
1 | 69.08 | <.0001 |
1 | 282.74 | <.0001 |
1 | 391.22 | <.0001 |
1 | 394.23 | <.0001 |
1 | 290.01 | <.0001 |
1 | 241.96 | <.0001 |
1 | 183.78 | <.0001 |
1 | 99.10 | <.0001 |
64 | 35.94 | <.0001 |
Note: The denominator degrees of freedom for the F tests is 7157.
"Unfortunately, the SAS output does not provide a t-table."
I think you want to add the /SOLUTION option to your MODEL statement.
(If that doesn't help, show us your code.)
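As a minimal sketch of that suggestion, the step below combines a separate CLASS statement with the SOLUTION option on the MODEL statement. It uses the SASHELP.CARS sample data as a stand-in, so the variables (make as the cluster, origin as the class effect) are illustrative assumptions and not the poster's m20 variables.

```sas
/* Sketch only: sashelp.cars stands in for the poster's m20 data set. */
proc surveyreg data=sashelp.cars;
   cluster make;      /* plays the role of gvkey */
   class origin;      /* character variable, declared in its own CLASS statement */
   /* SOLUTION asks for the parameter estimates table, with t values,
      when the model contains CLASS effects */
   model msrp = enginesize horsepower origin / solution;
run;
```

With a CLASS effect in the model, the parameter estimates table is what SOLUTION adds to the output; the Tests of Model Effects table is printed either way.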
Thank you so much, PaigeMiller. Your code worked well. However, this approach does not work in OLS (proc reg). Your help will be highly appreciated.
proc reg data=m20;
class dummy_sic2;
-----
180
NOTE: The previous statement has been deleted.
ERROR 180-322: Statement is not valid or it is used out of proper order.
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend /*abaccrual*/ ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016 dummy_sic2 /vif; run;
ERROR: Variable dummy_sic2 in list does not match type prescribed for this list.
NOTE: The previous statement has been deleted.
WARNING: RUN statement ignored due to previous errors. Submit QUIT; to terminate the procedure.
NOTE: PROCEDURE REG used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
NOTE: The SAS System stopped processing this step because of errors.
Joon1
Yes, it doesn't work in PROC REG because PROC REG has no CLASS statement. You need to use PROC SURVEYREG (as you did in your earlier message), or you can use PROC GLM instead of PROC REG. For regression problems, REG and GLM are both applicable, but GLM allows a CLASS statement while REG does not.
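A minimal sketch of the PROC GLM alternative, again on the SASHELP.CARS sample data (the variable choices are assumptions for illustration only):

```sas
/* Sketch only: an OLS fit with a categorical predictor via PROC GLM. */
proc glm data=sashelp.cars;
   class origin;                                          /* accepted here, not in PROC REG */
   model msrp = enginesize horsepower origin / solution;  /* SOLUTION prints the estimates */
run;
quit;
```

PROC GLM is interactive, so the QUIT statement ends the procedure.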
When you change PROCs in the middle of a thread, the previous advice may not apply.
Thank you so much for your help, PaigeMiller. I have one more question. Is there a way to show the adjusted R-square and the variance inflation factor (the /vif option in proc reg) in the SURVEYREG or GLM procedure?
Thank you!
Joon1
PROC SURVEYREG has an ADJRSQ option on the MODEL statement.
You can run the continuous variables with fake Y values through PROC REG if you really want the VIF values.
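Both suggestions can be sketched as follows, using SASHELP.CARS as stand-in data (variable choices are illustrative assumptions). The VIF of a regressor is computed from the regressors alone, which is why the response handed to PROC REG does not affect it.

```sas
/* Sketch only: adjusted R-square from PROC SURVEYREG. */
proc surveyreg data=sashelp.cars;
   cluster make;
   class origin;
   model msrp = enginesize horsepower weight origin / solution adjrsq;
run;

/* VIFs for the continuous regressors only. Any numeric response works
   here because VIF is a property of the X variables. */
proc reg data=sashelp.cars;
   model msrp = enginesize horsepower weight / vif;
run;
quit;
```

This second step leaves the class variable out, so the VIFs it reports ignore any collinearity between the continuous variables and the industry dummies.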
Thank you so much for your quick reply, PaigeMiller. Is there any way to include dummy_sic2 in the proc reg procedure to have VIF?
proc reg data=m20;
NOTE: Writing HTML Body file: sashtml1.htm
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure y2000-y2016 dummy_sic2 /vif; run;
ERROR: Variable dummy_sic2 in list does not match type prescribed for this list.
NOTE: The previous statement has been deleted.
WARNING: No variables specified for an SSCP matrix. Execution terminating.
NOTE: PROCEDURE REG used (Total process time):
real time 0.52 seconds
cpu time 0.17 seconds
Thank you
Joon1
There's no way to get VIF on categorical variables in PROC REG, unless you create the dummy variables somehow and run that through PROC REG.
If you absolutely have to create the dummy variables, the easiest way is to use PROC GLMMOD with a CLASS statement. It writes the dummy variables for the class variable, together with all the continuous variables, to a SAS data set, which can then be run through PROC REG to get the VIFs. Example: https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_glmmod_examples02.htm&docsetVersi...
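A minimal sketch of that workflow on the SASHELP.CARS sample data follows. The data set names design and parm are hypothetical, and the column numbers in the PROC REG step assume the three levels of origin in this sample; with other data, check the OUTPARM= data set for the actual mapping.

```sas
/* Sketch only: build the design matrix, then get VIFs from PROC REG. */
proc glmmod data=sashelp.cars outdesign=design outparm=parm noprint;
   class origin;
   model msrp = origin enginesize horsepower weight;
run;

/* parm maps each design column (Col1, Col2, ...) to its effect and level */
proc print data=parm;
run;

/* Here Col1 is the intercept, Col2-Col4 are the origin dummies, and
   Col5-Col7 are the continuous variables. Col1 is omitted because PROC REG
   adds its own intercept, and Col4 is omitted as the reference level so the
   dummies are not linearly dependent. */
proc reg data=design;
   model msrp = col2 col3 col5-col7 / vif;
run;
quit;
```

The OUTDESIGN= data set is what takes the place of an OUTPUT statement in this procedure; it also carries the response variable, so it can be passed straight to PROC REG.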
Thanks for your information, PaigeMiller.
I ran the GLM procedure and created the SAS data set "m21" using the OUTPUT statement below (output out=m21), but the dummy variables were not created in "m21", and I get an error message in the PROC REG step. I would be grateful if you could advise how to save the dummy variables in a SAS data set and run proc reg.
proc glm data=m20;
class dummy_sic2;
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016 dummy_sic2/solution;
output out=m21;
run;
NOTE: The data set WORK.M21 has 46854 observations and 240 variables.
NOTE: PROCEDURE GLM used (Total process time):
real time 17.47 seconds
cpu time 2.50 seconds
proc reg data=m21;
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016 dummy_sic2 /vif; run;
ERROR: Variable dummy_sic2 in list does not match type prescribed for this list.
NOTE: The previous statement has been deleted.
WARNING: No variables specified for an SSCP matrix. Execution terminating.
NOTE: PROCEDURE REG used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
Thanks
Joon1
Please read carefully. I said PROC GLMMOD.
Thanks for your kind reply, PaigeMiller.
How can I create the sas file that contains industry dummies?
"output out=m21" does not work in GLMMOD procedure.
Thanks
Joon1