joon1
Quartz | Level 8

Dear Madam/Sir,

With the help of this community, I created industry dummies using the following program and obtained the following values.

data want;
set have;
length dummy_sic2 $ 10;
dummy_sic2=cats('ind#',put(sic2,z2.));
run;

dummy_sic2   Frequency   Percent   Cumulative Frequency   Cumulative Percent
ind#01             211      0.26                    211                 0.26
ind#02              49      0.06                    260                 0.33
ind#07              61      0.08                    321                 0.40
ind#08              30      0.04                    351                 0.44
ind#09               3      0.00                    354                 0.44
ind#10             971      1.22                   1325                 1.66

 

However, I have the following error message in the regressions.

ERROR: Variable dummy_sic2 should be either numeric or specified in the CLASS statement.

 

I would greatly appreciate it if you could advise how to convert 'dummy_sic2' into numeric data, or another form that can be included as industry dummies in the regressions.

 

Thank you 

Joon1

 

 

1 ACCEPTED SOLUTION

PaigeMiller
Diamond | Level 26

Well, I am stumped, and there's something wrong here. This code (using a smaller data set) runs almost instantly and doesn't cause a freeze. Even with your data set of 40,000 records, similar code ought to run quickly (although not instantly), and I can't understand why your SAS freezes.

 

ods output DesignPoints = DesignMatrix;
proc glmmod data=sashelp.cars;
    class make;
    model msrp = make invoice enginesize cylinders horsepower mpg_city mpg_highway weight wheelbase length;
run;

Why do you need the ind# in your dummy_sic2 values, anyway? Although that shouldn't make a difference, what happens if you use 

dummy_sic2=put(sic2,z2.);

Anyway, here is another way to generate dummy variables: https://blogs.sas.com/content/iml/2020/08/31/best-generate-dummy-variables-sas.html

 

--
Paige Miller


26 REPLIES
ballardw
Super User

You don't show what regression you ran. Likely the approach would be to add:

 

Class dummy_sic2;

to the regression code.

 

 

joon1
Quartz | Level 8

Thanks for your quick reply, ballardw.

 

The original error message is as follows:

proc surveyreg data=m20;
cluster gvkey;
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
 merger financing yearend  ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016  dummy_sic2; run;
ERROR: Variable dummy_sic2 should be either numeric or specified in the CLASS statement.
NOTE: The previous statement has been deleted.

ERROR: No MODEL statement.
NOTE: The SAS System stopped processing this step because of errors.

 

Adding "class" to the model statement produces the following error message:

 

 proc surveyreg data=m20;
 cluster gvkey;
 model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
 merger financing yearend ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016 Class dummy_sic2; run;
ERROR: Variable Class not found.
NOTE: The previous statement has been deleted.

ERROR: No MODEL statement.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 0.00 seconds
cpu time 0.03 seconds

 

Any help will be highly appreciated.

Joon1

mkeintz
PROC Star

Look carefully at @ballardw's suggestion. It is a CLASS STATEMENT (note the trailing semicolon), not a suggestion to insert the word "class" into the MODEL statement. Use the original MODEL statement, and add a separate

    class dummy_sic2;

statement.
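
To illustrate where the separate statement goes, here is a minimal sketch. It assumes sashelp.cars as a stand-in for the poster's m20 data set, with make playing the role of gvkey and type playing the role of dummy_sic2; those substitutions are illustrative only.

```sas
/* Stand-in data: sashelp.cars replaces the poster's m20 data set. */
proc surveyreg data=sashelp.cars;
    cluster make;                         /* stand-in for: cluster gvkey;        */
    class type;                           /* its own statement, own semicolon    */
    model msrp = horsepower weight type;  /* the class variable is listed like   */
                                          /* any other regressor                 */
run;
```

The character variable appears twice: once in CLASS, which tells the procedure to treat it as categorical, and once in MODEL, which puts it in the regression.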

joon1
Quartz | Level 8

Thanks, mkeintz, for your reply.

 

Unfortunately, the SAS output does not provide a t-table. Is there any way to generate dummy variables that can be included in the regressions like numeric variables (ind#1-ind#99)? Thank you.

 
The SAS System

The SURVEYREG Procedure

Regression Analysis for Dependent Variable ln_audit

Data Summary
Number of Observations    48299
Mean of ln_audit          13.29416
Sum of ln_audit           642094.6

Design Summary
Number of Clusters        7158

Fit Statistics
R-Square                  0.8724
Root MSE                  0.5177
Denominator DF            7157

Class Level Information
CLASS Variable   Levels   Values
dummy_sic2       65       ind#01 ind#02 ind#07 ind#08 ind#09 ind#10 ind#12 ind#13 ind#14 ind#15
                          ind#16 ind#17 ind#20 ind#21 ind#22 ind#23 ind#24 ind#25 ind#26 ind#27
                          ind#28 ind#29 ind#30 ind#31 ind#32 ind#33 ind#34 ind#35 ind#36 ind#37
                          ind#38 ind#39 ind#40 ind#41 ind#42 ind#44 ind#45 ind#46 ind#47 ind#48
                          ind#49 ind#50 ind#51 ind#52 ind#53 ind#54 ind#55 ind#56 ind#57 ind#58
                          ind#59 ind#70 ind#72 ind#73 ind#75 ind#76 ind#78 ind#79 ind#80 ind#81
                          ind#82 ind#83 ind#86 ind#87 ind#99

Tests of Model Effects
Effect          Num DF    F Value    Pr > F
Model              102    1033.40    <.0001
Intercept            1    60235.1    <.0001
sdd1                 1      15.52    <.0001
sstmat               1       0.85    0.3564
leverage             1       0.90    0.3416
sodebt               1     258.63    <.0001
cspec                1     114.00    <.0001
ln_nonaudit          1      73.67    <.0001
icw                  1     236.22    <.0001
restatement          1       0.18    0.6740
gc                   1       1.48    0.2241
auchange             1      48.07    <.0001
merger               1      35.20    <.0001
financing            1      14.74    0.0001
yearend              1      48.24    <.0001
ln_at                1    9807.22    <.0001
mb                   1      15.13    0.0001
big4                 1     628.61    <.0001
roa                  1     189.43    <.0001
loss                 1      93.19    <.0001
fsalepro             1     248.66    <.0001
SQ_SEGS              1      63.28    <.0001
ar_in                1      82.64    <.0001
special_item         1     206.21    <.0001
ln_tenure            1       5.64    0.0175
y2000                1    3986.35    <.0001
y2001                1    4753.36    <.0001
y2002                1    3347.22    <.0001
y2003                1    2281.84    <.0001
y2004                1     437.59    <.0001
y2005                1      66.90    <.0001
y2006                1      13.05    0.0003
y2007                1       7.40    0.0065
y2008                1      22.20    <.0001
y2009                1      69.08    <.0001
y2010                1     282.74    <.0001
y2011                1     391.22    <.0001
y2012                1     394.23    <.0001
y2013                1     290.01    <.0001
y2014                1     241.96    <.0001
y2015                1     183.78    <.0001
y2016                1      99.10    <.0001
dummy_sic2          64      35.94    <.0001

Note: The denominator degrees of freedom for the F tests is 7157.

 

PaigeMiller
Diamond | Level 26

"Unfortunately, the SAS output does not provide a t-table."

 

I think you want to add the /SOLUTION option to your MODEL statement.

 

(If that doesn't help, show us your code.)
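
A minimal sketch of the /SOLUTION option, again assuming sashelp.cars as a stand-in for the poster's data (make in place of gvkey, type in place of dummy_sic2). Options on the MODEL statement follow a slash after the last regressor.

```sas
/* Stand-in data: sashelp.cars replaces the poster's m20 data set. */
proc surveyreg data=sashelp.cars;
    cluster make;
    class type;
    /* SOLUTION requests the parameter estimates table (estimate,    */
    /* standard error, t value, p-value) when a CLASS statement is   */
    /* present.                                                      */
    model msrp = horsepower weight type / solution;
run;
```

With a CLASS variable in the model, one level serves as the reference and its estimate is reported as zero; the remaining levels are estimated relative to it.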

--
Paige Miller
joon1
Quartz | Level 8

Thank you so much, PaigeMiller. Your suggestion worked well. However, this approach does not work in OLS (proc reg). Your help will be highly appreciated.

 

proc reg data=m20;
class dummy_sic2;
-----
180
NOTE: The previous statement has been deleted.
ERROR 180-322: Statement is not valid or it is used out of proper order.
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend /*abaccrual*/ ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016 dummy_sic2 /vif; run;

ERROR: Variable dummy_sic2 in list does not match type prescribed for this list.
NOTE: The previous statement has been deleted.

WARNING: RUN statement ignored due to previous errors. Submit QUIT; to terminate the procedure.
NOTE: PROCEDURE REG used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

NOTE: The SAS System stopped processing this step because of errors.

 

Joon1

 

PaigeMiller
Diamond | Level 26

Yes, it doesn't work in PROC REG because PROC REG has no CLASS statement. You need to use PROC SURVEYREG (as you were in your earlier message), or you can use PROC GLM instead of PROC REG (for regression problems, REG and GLM are both applicable, but GLM allows a CLASS statement while REG does not).
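
As a sketch of the PROC GLM alternative, assuming sashelp.cars as stand-in data (type in place of dummy_sic2):

```sas
/* Stand-in data: sashelp.cars replaces the poster's m20 data set. */
proc glm data=sashelp.cars;
    class type;                                       /* allowed in GLM, not in REG */
    model msrp = horsepower weight type / solution;   /* SOLUTION prints estimates  */
run;
quit;                                                 /* GLM is interactive; QUIT ends it */
```

Note that PROC GLM has no CLUSTER statement, so its standard errors will not match the clustered ones from PROC SURVEYREG even when the coefficient estimates agree.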

 

When you change PROCs in the middle of a thread, the previous advice may not apply.

--
Paige Miller
joon1
Quartz | Level 8

Thank you so much for your help, PaigeMiller. I have one more question. Is there a way to show the adjusted R-square and the variance inflation factors (/vif option in proc reg) in the SURVEYREG procedure or the GLM procedure?

 

Thank you!

Joon1

PaigeMiller
Diamond | Level 26

PROC SURVEYREG has an ADJRSQ option on the MODEL statement.

 

You can run the continuous variables with fake Y values through PROC REG if you really want the VIF values.
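
A minimal sketch of both suggestions, assuming sashelp.cars as stand-in data. The data set name cars_fake and the variable fake_y are hypothetical names introduced only for this illustration. The second step relies on the fact that a VIF is computed from the regressors alone, so the response values do not affect it.

```sas
/* Adjusted R-square: ADJRSQ goes after the slash on the MODEL statement. */
proc surveyreg data=sashelp.cars;
    cluster make;
    class type;
    model msrp = horsepower weight type / solution adjrsq;
run;

/* VIFs for the continuous regressors, using a placeholder response. */
data cars_fake;                       /* hypothetical data set name  */
    set sashelp.cars;
    if _n_ = 1 then call streaminit(2024);
    fake_y = rand('normal');          /* hypothetical fake response  */
run;

proc reg data=cars_fake;
    model fake_y = horsepower weight / vif;
run;
quit;
```

These VIFs describe collinearity among the continuous regressors only; they do not account for the class variable, which is left out of the PROC REG model here.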

--
Paige Miller
joon1
Quartz | Level 8

Thank you so much for your quick reply, PaigeMiller. Is there any way to include dummy_sic2 in PROC REG to get the VIFs?

 

 proc reg data=m20;
NOTE: Writing HTML Body file: sashtml1.htm
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure  y2000-y2016 dummy_sic2 /vif; run;
ERROR: Variable dummy_sic2 in list does not match type prescribed for this list.
NOTE: The previous statement has been deleted.

WARNING: No variables specified for an SSCP matrix. Execution terminating.
NOTE: PROCEDURE REG used (Total process time):
real time 0.52 seconds
cpu time 0.17 seconds

 

Thank you

Joon1

PaigeMiller
Diamond | Level 26

There's no way to get VIF on categorical variables in PROC REG, unless you create the dummy variables somehow and run that through PROC REG.

 

If you absolutely have to do the work to create dummy variables, the easiest way is to use PROC GLMMOD with a CLASS statement to write the dummy variables for the class variable, along with all the continuous variables, to a SAS data set, which can then be run through PROC REG to get the VIFs. Example: https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_glmmod_examples02.htm&docsetVersi...
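
The GLMMOD-then-REG route might look like the sketch below. It assumes sashelp.class as a small stand-in data set (sex in place of dummy_sic2); the data set names parm and design are arbitrary. The column numbers in the PROC REG step are an assumption about how the design columns are ordered for this particular model, so check the OUTPARM= data set before relying on them.

```sas
/* Write the design matrix (intercept, dummies, continuous variables) */
/* to a data set. OUTPARM= records which effect each column holds.    */
proc glmmod data=sashelp.class outparm=parm outdesign=design noprint;
    class sex;
    model weight = sex height age;
run;

/* Confirm the column-to-effect mapping before using Col names. */
proc print data=parm;
run;

/* Assumed mapping for this model: Col1 = intercept, Col2 = sex F,  */
/* Col3 = sex M, Col4 = height, Col5 = age. Col3 is omitted so that */
/* one level acts as the reference and the model is full rank.      */
proc reg data=design;
    model weight = col2 col4 col5 / vif;
run;
quit;
```

GLMMOD creates one column per level of the class variable, so including every column together with the intercept would be linearly dependent; dropping one level's column avoids that.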

--
Paige Miller
joon1
Quartz | Level 8

Thanks for your information, PaigeMiller.

 

I ran PROC GLM and created the SAS data set "m21" using the OUTPUT statement below (output out=m21), but the dummy variables are not in "m21", and PROC REG gives an error message. I would be grateful if you could advise how to save the dummy variables in a SAS data set and run PROC REG.

 

 proc glm data=m20;
class dummy_sic2;
model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
merger financing yearend ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
y2000-y2016 dummy_sic2/solution;
output out=m21;
run;

NOTE: The data set WORK.M21 has 46854 observations and 240 variables.
NOTE: PROCEDURE GLM used (Total process time):
real time 17.47 seconds
cpu time 2.50 seconds


 proc reg data=m21;
 model ln_audit = sdd1 sstmat leverage sodebt cspec ln_nonaudit icw restatement gc auchange
 merger financing yearend  ln_at mb big4 roa loss fsalepro sq_segs ar_in special_item ln_tenure
 y2000-y2016 dummy_sic2 /vif; run; 
ERROR: Variable dummy_sic2 in list does not match type prescribed for this list.
NOTE: The previous statement has been deleted.

WARNING: No variables specified for an SSCP matrix. Execution terminating.
NOTE: PROCEDURE REG used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

 

Thanks

Joon1

PaigeMiller
Diamond | Level 26

Please read carefully. I said PROC GLMMOD.

--
Paige Miller
joon1
Quartz | Level 8

Thanks for your kind reply, PaigeMiller.

 

How can I create the sas file that contains industry dummies? 

 

"output out=m21" does not work in GLMMOD procedure.

 

Thanks

Joon1

