cbatzi01
Calcite | Level 5

Hi, 

 

I have missing values in a cost variable. I would like to impute the missing values within groups defined by two variables (drug name and prescription year). My data structure looks like this:

 

DrugName  RX_Yr  Cost
Drug 1    2010   $50
Drug 1    2010   .
Drug 1    2011   $60
Drug 2    2010   $30
Drug 2    2010   .
Drug 2    2011   $20

 

I tried using Proc MI like this:

proc mi data=Have out=have_impute;
by DrugName Rx_yr;
var Cost;
run;

 

But it returns the error message "Fewer than two analysis variables".

 

Anyone know what I am doing wrong, or a better way to do this? 

 

Thanks in advance!

Chris

 

4 REPLIES
learner
Calcite | Level 5

Maybe you should avoid 'drugname' as the analysis variable, since it is a character variable.

Reeza
Super User

I'm not sure that's a good way to do it.

 

I'd consider some rules that are probably true 90% of the time.

 

For example, for Drug 1 in the same year, I would assume the same price.

If I don't have data for that year, then I would consider an interpolation method, probably something as simple as the average of the years before and after. You probably have some more complex scenarios, such as missing two years in a row or different prices in the same year. Regardless, I don't think a straightforward imputation method would be the best way to go in your case. This is assuming you're actually working with drug data and not some other data.

cbatzi01
Calcite | Level 5
I am working with insurance claims data. I have over 900K records, and 5% are missing the cost data. The costs vary a little depending on the patient's insurance type, but not much. I definitely have data for a given year; I'm just not sure how to make the updates systematically, other than hardcoding a value like

if DrugName='Drug 1' and Rx_yr=2010 and Cost=. then Cost=X;

Thanks!
Chris
Reeza
Super User

If you have data for those records, make a 'master table' that holds one cost value per drug/year combination, and then merge it back onto your data by drug/year to fill in the missing costs.
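
Here is a rough sketch of the master table idea, not a definitive solution. It assumes the data set is called have (as in the original post), that Cost is a numeric variable (possibly displayed with a dollar format), and that the mean of the non-missing costs is an acceptable value for each drug/year. The names master and MeanCost are made up for illustration.

```sas
/* Build the master table: one row per drug/year with the mean of the
   non-missing costs. NWAY keeps only the DrugName*Rx_yr combinations. */
proc means data=have noprint nway;
    class DrugName Rx_yr;
    var Cost;
    output out=master(drop=_type_ _freq_) mean=MeanCost;
run;

/* MERGE with BY requires both tables sorted by the BY variables.
   The PROC MEANS output is already ordered by the CLASS variables. */
proc sort data=have;
    by DrugName Rx_yr;
run;

/* Attach the master value to every record and fill only missing costs. */
data have_impute;
    merge have(in=inHave) master;
    by DrugName Rx_yr;
    if inHave;
    if missing(Cost) then Cost = MeanCost;
    drop MeanCost;
run;
```

If a drug/year has no non-missing cost at all, MeanCost is missing too, so those records stay missing and would need another rule. To use a different value (for example the median), change the statistic keyword in the OUTPUT statement.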

 

You can probably use PROC STANDARD with the REPLACE option to fill in the missing values, as long as you're okay with using the mean cost of each drug within each year. If not, you'll need another method.
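
A minimal sketch of the PROC STANDARD approach, assuming the data set and variable names from the original post (have, DrugName, Rx_yr, Cost) and that Cost is numeric. With REPLACE and no MEAN= or STD= options, the non-missing values are left as they are and each missing value is replaced by the mean of its BY group.

```sas
/* BY-group processing requires the data to be sorted by the BY variables. */
proc sort data=have;
    by DrugName Rx_yr;
run;

/* REPLACE fills each missing Cost with the mean Cost of its
   DrugName/Rx_yr group; non-missing costs are not standardized
   because MEAN= and STD= are not specified. */
proc standard data=have out=have_impute replace;
    by DrugName Rx_yr;
    var Cost;
run;
```

A group in which every Cost is missing has no mean to use, so those records remain missing after this step.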

 

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473725.htm
