02-13-2016 05:00 AM - edited 02-13-2016 05:04 AM
I am confused about why I have to convert 3 of my 4 categorical variables with missing values, which were stored as $Char4., into numeric variables in order for PROC MI with the FCS statement to work. I thought the whole point of PROC MI > FCS > LOGISTIC was to impute categorical data. In fact, I've used it in the past to impute a "Job" field with missing values on an insurance dataset, so I know it works, though that field was cast as Text13. Does that matter?
I don't get why PROC MI is failing for the Memory_Technology field, which is a $char18. categorical field.
Also, if I use the LOGISTIC method, which I understood to be for categorical variables, it returns non-categorical values (e.g., 1022.7814, when the valid categories nearby are 1000 and 1024).
Please see the attached data file and the code I am using.
I got the right results using this for 3 of the 4 fields (categories were appropriately generated but not for Memory_Technology):
PROC MI DATA=mydata1.comp2 seed=123 nimpute=15 OUT=impRSLTS;
    CLASS Memory_Technology Max_Horizontal_Resolution Installed_Memory
          Processor_Speed Processor Manufacturer Operating_System;
    FCS NBITER=5 DISCRIM(Memory_Technology Max_Horizontal_Resolution Installed_Memory Processor_Speed/details); * Use 5 burn-in iterations;
    VAR Memory_Technology Max_Horizontal_Resolution Installed_Memory Processor_Speed
        Processor Manufacturer Warranty_days
        dv_Infrared dv_Bluetooth dv_DockStnPrtRep_yy dv_DockStnPrtRep_yn
        dv_DockStnPrtRep_ny dv_DockStnPrtRep_nn dv_Fingerprint dv_Subwoofer
        SQRTprice dv_CDMA Operating_System dv_Ext_Battery;
RUN;
The missing data patterns table shows Memory_Technology as 'X' (observed) in every pattern, even though it is missing in 63 cases.
02-15-2016 10:12 AM
The problem lies with the data. Run the following syntax:
proc freq data = laptops_dataset_raw; run;
It shows that MEMORY_TECHNOLOGY has no missing values, but it does have values that are coded as '?'. A question mark is not recognized as a missing value in SAS. Recode the question marks to SAS missing values. Note that this should also be done for other variables that use question marks (e.g., MAX_HORIZONTAL_RESOLUTION). After the data are cleaned up, try running PROC MI again.
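A minimal sketch of that recoding, assuming the '?' placeholders appear only in character variables. The output dataset name work.laptops_clean and the array name cvars are hypothetical; adjust them to your own setup.

```sas
/* Sketch only: work.laptops_clean is an assumed output name. */
data work.laptops_clean;
    set laptops_dataset_raw;
    /* Array over every character variable read from the input dataset */
    array cvars {*} _character_;
    do i = 1 to dim(cvars);
        /* Treat a lone '?' as missing; a blank is the SAS character missing value */
        if strip(cvars{i}) = '?' then call missing(cvars{i});
    end;
    drop i;
run;

/* The MISSING option lists missing values as their own row in the table */
proc freq data = work.laptops_clean;
    tables Memory_Technology / missing;
run;
```

If the recoding worked, the frequency table should no longer list '?' as a level of Memory_Technology, and the missing row should account for those observations instead.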
02-15-2016 03:46 PM - last edited on 02-17-2016 07:39 PM by ChrisHemedinger
Yes, that was done prior to running the PROC. That's how I got values generated for the three numerical variables. It is the categorical variable that is failing, even with a missing value replacing the question mark.
02-16-2016 03:03 AM