06-17-2016 01:55 PM
I have missing values in a cost variable. I would like to impute the missing values, grouped by two variables. My data structure looks like this:
I tried using Proc MI like this:
proc mi data=Have out=have_impute;
by DrugName Rx_yr;
run;
But it returns an error message of "Fewer than two analysis variables".
Anyone know what I am doing wrong, or a better way to do this?
Thanks in advance!
06-17-2016 02:46 PM
I'm not sure that's a good way to do it.
I'd consider some rules that are probably true 90% of the time.
For example for drug1 in the same year I would assume the same price.
If I don't have data for that year, then I would consider an interpolation method, probably something as simple as the average of the years before and after. You probably have some more complex scenarios, such as two missing years in a row or different prices in the same year. Regardless, I don't think a straightforward imputation method would be the best way to go in your case. This is assuming you're actually working with drug data and not some other data.
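A rough sketch of that interpolation rule, in case it helps. It assumes the variable names from the original post (DrugName, Rx_yr, cost), that Rx_yr is numeric, and that a drug/year with no usable price still has rows in the data. The table names yr_mean and yr_filled and the column names mean_cost and fill_cost are made up for illustration.

```sas
proc sql;
  /* One row per drug/year; mean_cost is missing when every cost in the group is missing */
  create table yr_mean as
  select DrugName, Rx_yr, mean(cost) as mean_cost
  from have
  group by DrugName, Rx_yr;

  /* Keep the year's own mean when it exists; otherwise average the
     adjacent years. With two arguments, MEAN is the row-wise SAS
     function and ignores missing values, so a single available
     neighbor is used as-is, and the result stays missing when both
     neighbors are missing. */
  create table yr_filled as
  select a.DrugName, a.Rx_yr,
         coalesce(a.mean_cost, mean(p.mean_cost, n.mean_cost)) as fill_cost
  from yr_mean as a
  left join yr_mean as p
    on a.DrugName = p.DrugName and p.Rx_yr = a.Rx_yr - 1
  left join yr_mean as n
    on a.DrugName = n.DrugName and n.Rx_yr = a.Rx_yr + 1;
quit;
```

This does not handle two missing years in a row; those rows keep a missing fill_cost and would need a separate rule.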
06-17-2016 03:00 PM
06-17-2016 03:05 PM
If you have data for those records, make a 'master table' that has the values for each drug/year and then merge the tables on drug/year.
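Something along these lines might work as a starting point. The variable names come from the original post; the table names master and have_filled and the columns master_cost and cost_filled are placeholders. Using the mean as the master value is an assumption for the case where a drug has more than one price in the same year.

```sas
proc sql;
  /* Master table: one value per drug/year, built from the non-missing records */
  create table master as
  select DrugName, Rx_yr, mean(cost) as master_cost
  from have
  where cost is not missing
  group by DrugName, Rx_yr;

  /* Keep the original cost where present, otherwise take the master value */
  create table have_filled as
  select h.*, coalesce(h.cost, m.master_cost) as cost_filled
  from have as h
  left join master as m
    on h.DrugName = m.DrugName and h.Rx_yr = m.Rx_yr;
quit;
```

Rows whose drug/year has no non-missing cost at all will still have a missing cost_filled after the join.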
You can probably use PROC STANDARD with the REPLACE option and a BY statement to replace the missing values, as long as you're okay with using the mean value of each drug per year. If not, you'll need another method.
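A minimal sketch of the PROC STANDARD approach, assuming the data set and variable names from the original post (have_sorted is a made-up intermediate name). The REPLACE option fills missing values with the variable mean, and the BY statement makes that the mean within each drug/year group.

```sas
/* BY-group processing needs the data sorted by the BY variables */
proc sort data=have out=have_sorted;
  by DrugName Rx_yr;
run;

/* REPLACE substitutes the BY-group mean of cost for missing values;
   non-missing values are left unchanged because MEAN= and STD= are not specified */
proc standard data=have_sorted out=have_impute replace;
  by DrugName Rx_yr;
  var cost;
run;
```

If every cost in a drug/year group is missing, there is no mean to substitute and those values stay missing.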