Hello,
So, is there any real benefit to creating dummy variables from your character variables if procedures like PROC LOGISTIC have a CLASS statement and options (e.g., param=ref) that create the dummy variables automatically?
There are a few procedures that don't generate their own dummy variables (example: PROC OPTMODEL), but most modeling procedures do generate dummy variables internally, so there's no value in doing the work to generate them yourself for those procedures. I suppose the other reason to generate your own dummy variables is if you want a specific parameterization of the model that is not provided by the PROC, but I would think that is very rare.
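As a rough illustration of the point above, here is a minimal sketch comparing the two approaches. It assumes the SASHELP.HEART data set and picks its Status, Sex, and AgeAtStart variables purely for illustration; the data set name heart_dummy and the variable male are hypothetical. Under reference parameterization with Female as the reference level, the two fits should agree.

```sas
/* Let PROC LOGISTIC build the dummy variable from the CLASS statement */
proc logistic data=sashelp.heart;
  class sex (param=ref ref='Female');
  model status(event='Dead') = sex ageatstart;
run;

/* Manual equivalent: code the dummy variable yourself in a DATA step */
data heart_dummy;                 /* hypothetical data set name */
  set sashelp.heart;
  male = (sex = 'Male');          /* 1 for Male, 0 otherwise */
run;

proc logistic data=heart_dummy;
  model status(event='Dead') = male ageatstart;
run;
```

The CLASS version is shorter and avoids an extra copy of the data, which is the usual argument for letting the procedure do the work.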
Paige Miller
Thanks!
How about if you want to reduce the number of levels in a character variable? Is there a way to do that through a procedure, or would that require creating fewer levels manually?
There may be automated ways to categorize or group levels together, but I'm not able to think of any right now. (If the data were continuous, there are binning methods and clustering methods.)
Most likely you would have to combine levels in the code somehow yourself. PROC FORMAT works to combine levels with many procedures, for example using the SASHELP.CARS data set:
proc format;
value $vf 'All'='All' "Front",'Rear'='Other';
run;
proc glm data=sashelp.cars;
format drivetrain $vf.;
class drivetrain;
model invoice=drivetrain;
run;
quit;
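A variation on the same idea, offered only as a sketch: the OTHER keyword in a VALUE statement collects every unlisted level into one group, and PROC FREQ can be used to check the grouping before modeling. The format name $drvgrp is made up for this example.

```sas
/* Keep 'All' as its own level; everything else becomes 'Other' */
proc format;
  value $drvgrp 'All' = 'All'
                other = 'Other';
run;

/* Check the collapsed levels before using them in a model */
proc freq data=sashelp.cars;
  format drivetrain $drvgrp.;
  tables drivetrain;
run;
```

Because the format is applied only inside the procedure, the stored values in SASHELP.CARS are unchanged.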
Paige Miller
@edasdfasdfasdfa wrote:
Thanks!
How about if you want to reduce the number of levels in a character variable? Is there a way to do that through a procedure or would that require creating less levels manually?
Depends on your data how much work might be involved. Consider the following code:
data example;
input x $;
datalines;
FullSize
FullGas
Fun
Strength
String
Super
;
run;
proc freq data=example;
run;
proc freq data=example;
format x $3.;
run;
proc freq data=example;
format x $2.;
run;
proc freq data=example;
format x $1.;
run;
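The format applied to a variable for the run of a procedure controls the number of levels the procedure sees, and so the number of dummy variables created: values that share a formatted value are treated as one level. In the example above, the six raw values become four levels with $3. (Ful, Fun, Str, Sup), three with $2. (Fu, St, Su), and two with $1. (F, S).
Some data can be grouped easily this way; otherwise you may need multiple formats. Formats are probably better in general than adding different variables.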
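When the levels don't share convenient prefixes, a user-defined format can do the grouping instead of a width. The following is only a sketch: the format name $grp and the group labels are invented here, and the data step repeats the small example data so the block stands on its own.

```sas
data example;
  input x $;
  datalines;
FullSize
FullGas
Fun
Strength
String
Super
;
run;

/* Explicit grouping; unlisted values fall into 'Misc' */
proc format;
  value $grp 'FullSize', 'FullGas' = 'Full'
             'Strength', 'String'  = 'Str'
             other                 = 'Misc';
run;

proc freq data=example;
  format x $grp.;
  tables x;
run;
```

With this format, PROC FREQ should report three levels (Full, Misc, Str) rather than the six raw values.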