edasdfasdfasdfa
Quartz | Level 8

Hello,

 

So, is there any real benefit to creating dummy variables from your character variables if procedures like PROC LOGISTIC have a CLASS statement with options (e.g., PARAM=REF) that create the dummy variables automatically?

PaigeMiller
Diamond | Level 26

There are a few procedures that don't generate their own dummy variables (for example, PROC OPTMODEL), but most modeling procedures generate dummy variables internally, so there's no value in doing the work to create them yourself for those procedures. I suppose the other reason to generate your own dummy variables is if you want a specific parameterization of the model that is not provided by the PROC, but I would think that is very rare.
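
As a rough illustration of letting the procedure do the coding, here is a minimal sketch with PROC LOGISTIC. The data set SASHELP.HEART and the variables Status, Sex, and AgeAtStart are assumptions chosen for the example, not something from the original question.

```sas
/* Sketch: let PROC LOGISTIC build the dummy (design) columns itself.   */
/* SASHELP.HEART and the variables used here are illustrative choices. */
proc logistic data=sashelp.heart;
    /* PARAM=REF requests reference-cell coding; REF= sets the baseline level */
    class sex (ref='Female') / param=ref;
    /* Model the probability that Status is 'Dead' */
    model status(event='Dead') = sex ageatstart;
run;
```

With this coding, the estimate reported for Sex compares Male against the Female reference level, and no dummy variable has to be added to the data set.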

--
Paige Miller
edasdfasdfasdfa
Quartz | Level 8

Thanks!

 

How about if you want to reduce the number of levels in a character variable? Is there a way to do that through a procedure, or would that require creating fewer levels manually?

PaigeMiller
Diamond | Level 26

There may be automated ways to categorize or group levels together, but I can't think of any right now. (If the data were continuous, there are binning methods and clustering methods.)

 

Most likely you would have to combine levels in the code somehow yourself. PROC FORMAT works to combine levels with many procedures, for example using the SASHELP.CARS data set:

 

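/* Map 'Front' and 'Rear' to a single level, 'Other'; keep 'All' as is */
proc format;
	value $vf 'All'='All' 'Front','Rear'='Other';
run;

/* CLASS levels are formed from the formatted values: 'All' and 'Other' */
proc glm data=sashelp.cars;
	format drivetrain $vf.;
	class drivetrain;
	model invoice=drivetrain;
run;
quit;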
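
Before fitting a model, one way to confirm that a format collapses the levels as intended is to run PROC FREQ with the same format. This is only a sketch of that check; it repeats the $vf format definition so that it runs on its own.

```sas
/* Same format as above, repeated so this step is self-contained */
proc format;
    value $vf 'All'='All' 'Front','Rear'='Other';
run;

/* The frequency table is built on the formatted values, */
/* so DriveTrain should show two levels: All and Other   */
proc freq data=sashelp.cars;
    format drivetrain $vf.;
    tables drivetrain;
run;
```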

 

--
Paige Miller
ballardw
Super User

@edasdfasdfasdfa wrote:

Thanks!

 

How about if you want to reduce the number of levels in a character variable? Is there a way to do that through a procedure, or would that require creating fewer levels manually?


How much work is involved depends on your data. Consider the following code:

data example;
   input x $;
datalines;
FullSize
FullGas
Fun
Strength
String
Super
;
run;

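/* No format: all six distinct values appear */
proc freq data=example;
run;
/* First 3 characters: Ful, Fun, Str, Sup */
proc freq data=example;
   format x $3.;
run;
/* First 2 characters: Fu, St, Su */
proc freq data=example;
   format x $2.;
run;
/* First character only: F, S */
proc freq data=example;
   format x $1.;
run;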

Because procedures form the levels of a variable from its formatted values, the format applied to a variable for the run of a procedure controls the number of dummy variables created.

Some data may be easily grouped this way; otherwise you may need multiple formats. And formats are probably better in general than adding different variables.
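
When truncating the values does not give the grouping you want, a user-defined format can list the groups explicitly. The following is a sketch under assumptions: the format name $grp and the group labels are made up for illustration, and the data step repeats the example data so the code runs on its own.

```sas
data example;
   input x $;
datalines;
FullSize
FullGas
Fun
Strength
String
Super
;
run;

/* Hypothetical grouping: the name $grp and its labels are illustrative only */
proc format;
   value $grp
      'FullSize', 'FullGas' = 'Full'
      'Strength', 'String'  = 'Str'
      other                 = 'Other';
run;

/* The table is built on the formatted values: Full, Str, and Other */
proc freq data=example;
   tables x;
   format x $grp.;
run;
```

The same FORMAT statement can be placed in a modeling procedure together with a CLASS statement, so the underlying data set stays unchanged.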
