khollid
Fluorite | Level 6

I am trying to complete a multiple imputation for a dataset that has participants at 3 visits (data in short format).  After setting up and running the basic code in SAS 9.4, I receive the warning "An effect for variable X is a linear combination of other effects. The coefficient of the effect will be set to zero in the imputation." for almost all of my variables (but they aren't all linear combinations).  Any ideas why this could be happening? 

 

We have a lot of variables we are trying to impute in a single dataset, although the eventual models will only focus on a subset of the variables.  The 1, 2, and 3 at the very end of the variable names represent visit # for variables that change over time. For pollutant and weather variables, nomenclature is name of pollutant/weather characteristic + MN (for mean) + 2, 7, 28 or 365 for the averaging period of days over which the mean is calculated.  For our actual analyses after the imputation, we would only be using one set of averaging times and one pollutant exposure in one model (i.e. we might have a 2-day mean CO with the 2-day mean temperature, pressure, and dewpoint) along with the other variables that don't end in the 2, 7, 28, or 365.  Could the number of variables be causing this? I'm using a book called "Multiple Imputation of Missing Data Using SAS" that suggested this approach of putting the data in short form would suffice for the longitudinal setting, but maybe this dataset is too complex? 

 

proc mi data=data_short nimpute=10 seed=270 out=data_impute;

class hrtarm dmarm cadarm hseduc ethnic smk_statusn1 smk_statusn2 smk_statusn3 alc_statusn1 alc_statusn2 alc_statusn3 center3dn1;

fcs logistic (hseduc smk_statusn1 smk_statusn2 smk_statusn3 alc_statusn1 alc_statusn2 alc_statusn3)

regression (texpwkn1 texpwkn2 texpwkn3 bmin1 bmin2 bmin3

tempmn2n1 tempmn2n2 tempmn2n3 tempmn7n1 tempmn7n2 tempmn7n3 tempmn28n1 tempmn28n2 tempmn28n3 tempmn365n1 tempmn365n2 tempmn365n3

dewpmn2n1 dewpmn2n2 dewpmn2n3 dewpmn7n1 dewpmn7n2 dewpmn7n3 dewpmn28n1 dewpmn28n2 dewpmn28n3 dewpmn365n1 dewpmn365n2 dewpmn365n3

premn2n1 premn2n2 premn2n3 premn7n1 premn7n2 premn7n3 premn28n1 premn28n2 premn28n3 premn365n1 premn365n2 premn365n3

z_score_sumn1 z_score_sumn2 z_score_sumn3

PM10MNMOn1 PM10MNMOn2 PM10MNYRn1 PM10MNYRn2

PM25MNMOn1 PM25MNMOn2 PM25MNYRn1 PM25MNYRn2

PMcMNMOn1 PMcMNMOn2 PMcMNYRn1 PMcMNYRn2

COMN2n1 COMN2n2 COMN2n3 COMN7n1 COMN7n2 COMN7n3 COMN28n1 COMN28n2 COMN28n3 COMN365n1 COMN365n2 COMN365n3

NO2MN2n1 NO2MN2n2 NO2MN2n3 NO2MN7n1 NO2MN7n2 NO2MN7n3 NO2MN28n1 NO2MN28n2 NO2MN28n3 NO2MN365n1 NO2MN365n2 NO2MN365n3

NOXMN2n1 NOXMN2n2 NOXMN2n3 NOXMN7n1 NOXMN7n2 NOXMN7n3 NOXMN28n1 NOXMN28n2 NOXMN28n3 NOXMN365n1 NOXMN365n2 NOXMN365n3

O3MN2n1 O3MN2n2 O3MN2n3 O3MN7n1 O3MN7n2 O3MN7n3 O3MN28n1 O3MN28n2 O3MN28n3 O3MN365n1 O3MN365n2 O3MN365n3

PM10MN2n1 PM10MN2n2 PM10MN2n3 PM10MN7n1 PM10MN7n2 PM10MN7n3 PM10MN28n1 PM10MN28n2 PM10MN28n3 PM10MN365n1 PM10MN365n2 PM10MN365n3

PM25MN2n2 PM25MN2n3 PM25MN7n2 PM25MN7n3 PM25MN28n2 PM25MN28n3 PM25MN365n2 PM25MN365n3

PMcMN2n2 PMcMN2n3 PMcMN7n2 PMcMN7n3 PMcMN28n2 PMcMN28n3 PMcMN365n2 PMcMN365n3

SO2MN2n1 SO2MN2n2 SO2MN2n3 SO2MN7n1 SO2MN7n2 SO2MN7n3 SO2MN28n1 SO2MN28n2 SO2MN28n3 SO2MN365n1 SO2MN365n2 SO2MN365n3) ;

var hrtarm cadarm dmarm ageDSRn1 ethnic CENTER3Dn1 CENTER3Dn2 CENTER3Dn3 q2n1 q3n1 q4n1 bmin1 hseduc alc_statusn1

smk_statusn1 SO2MN28n1 SO2MN7n1 SO2MN2n1 PM10MN28n1 PM10MN7n1 PM10MN2n1 O3MN28n1 O3MN7n1 O3MN2n1 NOXMN28n1 NOXMN7n1

NOXMN2n1 NO2MN28n1 NO2MN7n1 NO2MN2n1 COMN28n1 COMN7n1 COMN2n1 PMcMNYRn1 PMcMNMOn1 PM25MNYRn1 PM25MNMOn1 PM10MNYRn1

PM10MNMOn1 dewpmn28n1 dewpmn2n1 tempmn28n1 tempmn7n1 tempmn2n1 dewpmn7n1 SO2MN365n1 PM10MN365n1 O3MN365n1

NOXMN365n1 NO2MN365n1 COMN365n1 tempmn365n1 dewpmn365n1 SO2MN365n2 SO2MN28n2 SO2MN7n2 SO2MN2n2 PM10MN365n2

PM10MN28n2 PM10MN7n2 PM10MN2n2 O3MN365n2 O3MN28n2 O3MN7n2 O3MN2n2 NOXMN365n2 NOXMN28n2 NOXMN7n2 NOXMN2n2

NO2MN365n2 NO2MN28n2 NO2MN7n2 NO2MN2n2 COMN365n2 COMN28n2 COMN7n2 COMN2n2 q4n2 q3n2 q2n2 tempmn365n2 tempmn28n2

tempmn7n2 tempmn2n2 ageDSRn2 PMcMNYRn2 PMcMNMOn2 PM25MNYRn2 PM25MNMOn2 PM10MNYRn2 PM10MNMOn2 dewpmn365n2 dewpmn28n2

dewpmn7n2 z_score_sumn2 dewpmn2n2 z_score_sumn1 premn28n2 premn28n1 premn7n1 premn7n2 premn2n2 premn2n1 premn365n2

texpwkn2 premn365n1 alc_statusn2 texpwkn1 smk_statusn2 bmin2 PMcMN2n2 PM25MN2n2 PMcMN7n2 PM25MN7n2 PMcMN28n2

PM25MN28n2 PMcMN365n2 PM25MN365n2 q4n3 q3n3 q2n3 ageDSRn3 z_score_sumn3 tempmn28n3 tempmn7n3 tempmn2n3

dewpmn28n3 dewpmn7n3 dewpmn2n3 tempmn365n3 dewpmn365n3 SO2MN365n3 PMcMN365n3 PM25MN365n3 PM10MN365n3 O3MN365n3

NOXMN365n3 NO2MN365n3 COMN365n3 premn28n3 premn7n3 premn2n3 premn365n3 SO2MN28n3 PMcMN28n3 PM25MN28n3 PM10MN28n3

O3MN28n3 NOXMN28n3 NO2MN28n3 COMN28n3 SO2MN7n3 SO2MN2n3 PMcMN7n3 PMcMN2n3 PM25MN7n3 PM25MN2n3 PM10MN7n3 PM10MN2n3

O3MN7n3 O3MN2n3 NOXMN7n3 NOXMN2n3 NO2MN7n3 NO2MN2n3 COMN7n3 COMN2n3 alc_statusn3 texpwkn3 smk_statusn3 bmin3;

run;

 

 

Thanks for any ideas!

13 REPLIES
ckolaja
Calcite | Level 5

Hi @khollid

 

I am having the same thing happen when I run PROC MI.  Did you ever find out why this happens or if there is a solution?

 

khollid
Fluorite | Level 6

Yes.  For me only one variable was the issue, but the warning is printed for every variable.  When I took out the one problematic variable, the warnings all went away.  In my case, one variable was perfectly predicted by two others, because var1=var2-var3.  When I took var1 out of the list, the warnings cleared up.  I would try to figure out whether any of your variables can be perfectly predicted by some combination of other variables, keeping in mind that the warnings don't actually mean that every variable is the issue.  Then you can take out the problematic variable and run the imputation.  After the imputation finished, I calculated an imputed var1 from the imputed var2 and var3 for any values of var1 that were missing initially.  Hope this helps!
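The two steps described above could look roughly like the sketch below. The names var1, var2, var3, other_x, and data_final are placeholders rather than variables from the original dataset, and the sketch assumes that var1 is defined exactly as var2 - var3.

```sas
/* Sketch only: var1, var2, var3, other_x and data_final are placeholder names. */

/* Step 1: impute without the derived variable var1. */
proc mi data=data_short nimpute=10 seed=270 out=data_impute;
   fcs regression(var2 var3);
   var var2 var3 other_x;
run;

/* Step 2: rebuild var1 only where it was missing in the original data. */
data data_final;
   set data_impute;
   if missing(var1) then var1 = var2 - var3;
run;
```

Because var1 is left out of the VAR statement, its observed values pass through to the output dataset unchanged, and only the gaps are filled from the imputed var2 and var3.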

PCG
Fluorite | Level 6

I am having the same issue. Is there a quick way of identifying the variable that produced the problem? I am including over a hundred variables in my imputation, so I need an efficient way of identifying the variable(s) that are causing MI to crash.

 

Cristian

PaigeMiller
Diamond | Level 26

There are two ways to detect linear combinations when there are no categorical variables:

 

  1. Run PROC CORR on all of your variables; any pair with a correlation of +1 or –1 is the problem.
  2. Run PROC PRINCOMP on all of your variables; any linear combination of variables with an eigenvalue of zero is the problem.

 

If you have categorical variables, then the dummy variables (depending on how you created them) will sum to 1 across each row, so you need to remove one dummy variable for each original categorical variable.
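As a minimal sketch of the two checks, assuming a dataset named data_short and a hypothetical list of numeric variables x1-x10:

```sas
/* Sketch only: x1-x10 is a placeholder for your own numeric variable list. */

/* Check 1: pairwise correlations; look for values of +1 or -1. */
proc corr data=data_short noprob nosimple;
   var x1-x10;
run;

/* Check 2: eigenvalues of the correlation matrix; a zero eigenvalue   */
/* flags an exact dependency, and its eigenvector shows which          */
/* variables take part in it.                                          */
proc princomp data=data_short;
   var x1-x10;
run;
```

Two caveats apply. A pairwise correlation only reveals a dependency between two variables, so a relation among three or more (such as var1=var2-var3) may show up only in the PROC PRINCOMP eigenvalues. PROC PRINCOMP also drops any observation that has a missing value in the VAR list, so with heavily incomplete data the check runs on a reduced set of rows.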

 

But saying you have the "same" issue really obscures many issues; and if we could see your code, we could be more definitive in our answer.

--
Paige Miller
PCG
Fluorite | Level 6

I use a lot of macro language to automate my code, so the full code is a bit too long to post. Here are the relevant parts. Basically, I use MCMC only when all of my data are continuous; if I have continuous, ordinal, and nominal data, I use FCS, which is what I have for my current study. I have been able to get the program to run by commenting out the offending variables. However, I do need to impute these variables. Should I use a multiple-pass system where I impute everything that does not crash in round 1, and then in subsequent rounds add the commented-out variables until everything is imputed?

 

Cristian

 

Part 1: Macro program

 

Filename Imp "h:\OSU\Teaching\Factor & Cluster Analysis\SAS macros\Impute macro.sas" ;

%Let Continuous = age /*FamInc*/ house /*SOL amount soc:*/ ;
%Let Ordinal = CG1Empl /*No_Jobs*/ CG1Edu /*InsHealth: aid:*/ children adults savings  ;
%Let Nominal = CG1Gen CG1Rel CG1Mar CG1Race ;
%Let No_impute = StrEv_1--StrEv_30  Ins_: P_:;
%Let Transform  = BoxCox(age/lambda=-.62) /*BoxCox(FamInc/lambda=.38)*/ BoxCox(house/lambda=-.32) /*BoxCox(SOL/lambda=-.47)*/ ;
%Let Filein   = VAM;
%Let Fileout  = demo_imp;
%Let uniq_ID  = ID;
%Let Imp_no  = 5 ;
%Let Seed  = 123456 ; * Use 0 to create a random seed--but results cannot be replicated afterwards. ;

%Include Imp ;  /* Run external SAS code */

 

 

Part 2: Snippet of back-end program

 

%Macro ImputeData;
 proc means data=&Filein noprint ;
  var &Continuous &Ordinal &Nominal;
  output out=_min_ min=&Continuous ;
  output out=_max_ max=&Continuous ;
 run;

 %if "&Continuous"^="" %then %do;
 /* Produce macro variables with minimum and maximum values for variable list. */
  Proc IML;
   use _min_;
   read all var {&Continuous} into min;
   close _min_;

   use _max_;
   read all var {&Continuous} into max;
   close _max_;

   min = compbl(rowcat( char(min) ));
   max = compbl(rowcat( char(max) ));
   call symputx('min',min);
   call symputx('max',max);
  run;quit;
 %end;

 *ods select missPattern;
 proc mi data = &Filein
  seed=&Seed
  nimpute = &Imp_no
  out=&Fileout
  %if "&Continuous"^="" %then %do;
      minimum =   &min
      maximum =  &max
   round   =   1
  %end;
     MINMAXITER = 3000;

  %if "&Ordinal &Nominal"=" " %then
   mcmc IMPUTE=FULL;

  %else %do;
   class &Ordinal &Nominal ;
   fcs  nbiter=75
    %if "&continuous"^="" %then
     REGPMM(&continuous) ;
    %if "&ordinal"^="" %then
     logistic(&Ordinal) ;
    %if "&nominal"^="" %then
     discrim(&Nominal / CLASSEFFECTS=INCLUDE) ; ;
  %end ; ;
  %if "&Transform"^="" %then
   Transform &Transform ; ;
    var &Continuous &Ordinal &Nominal &No_impute;
 run;
%mend;
%ImputeData;

PaigeMiller
Diamond | Level 26

It would sure help if you defined what FCS is ... But although I have never used FCS (as far as I know), I nevertheless think your attempt to impute values even in the case of a linear combination of variables is misguided. If the variables are a linear combination of one another in the data that isn't missing, of what value would it be to impute values that destroy the linear combination? I see no value in doing this. If variables are linear combinations of one another in the data that is present, I say eliminate one (or more) of the variables, because there's no value in including them in the analyses.

--
Paige Miller
PCG
Fluorite | Level 6

The FCS (fully conditional specification) method is a relatively new statistical procedure for imputing missing data. Originally, the MCMC approach was used, but that method is only appropriate for continuous, normal data. I do not understand the mathematical statistics behind FCS, but according to SAS Help, this is the method recommended when you have data with an arbitrary missing pattern. The method allows you to organize your variables according to whether they are continuous, ordinal, or nominal. It uses a regression model to impute continuous variables, logistic regression to impute ordinal variables, and the discriminant function to impute nominal variables. Best of all, it can do all this in one step. The downside is that the model is prone to crashing.

 

I agree that one should not impute variables that are linear combinations of other effects. My issue is identifying them and then dropping them. PROC CORR only works for continuous data. What SAS does not say is whether the linear combination is due to the continuous variables, ordinal variables, nominal variables, or some combination thereof. Since FCS does everything in one step, my bet is on a combination of all the variables. Hence, in order to confirm that a variable is a linear combination of other effects, I would need to run a single procedure that can use continuous, ordinal, and discrete predictors. I am not sure which procedure can do this. It also feels like a lot of work...but that is a side point.

 

I am currently working on dropping the variables that SAS indicates are linear effects. However, I am missing something in my macro code. I am getting an error for the following code. SAS is treating i in my %Let statement as a character rather than reading the numeric value. I am sure there is a simple function that will fix this but I cannot think of it. Any suggestions?

 

Cristian

 

%Macro ExcludeVars;
data _NULL_;
 array Var{*} &Continuous ;
 array Ex{*} &Exclude;
 %Let _drop_= ;
 do i=1 to dim(Ex);
  if Ex(i) in Var then 
   %Let _drop_=&_drop_ %sysfunc(scan(&Exclude,i));
 end;
 %put &=_drop_;
run;
%Mend ExcludeVars;
%ExcludeVars;

 

PCG
Fluorite | Level 6

I forgot to add, the reason I am writing a macro to drop variables, rather than commenting them out as I did before, is because I am working with lots of variables. It is too much work to comment out specific variables when using shorthand references (colon or hyphen) to list your variables. Hence, I am trying to read in the variables and drop select variables from a larger list. Hence, my previous code.

 

C.

PaigeMiller
Diamond | Level 26

As I said earlier, PROC PRINCOMP will find the linear combinations that are constant, these are the linear combinations that have a zero eigenvalue. In the case of nominal or ordinal variables, you need to provide PROC PRINCOMP with appropriate dummy variables.

--
Paige Miller
PCG
Fluorite | Level 6

Unfortunately, this solution would be too inefficient for my data. I have about 175 variables, over 90% of which are ordinal or nominal. Creating dummy codes for all these variables is not worth the cost in programming time. Any suggestions for how to get SAS to read the value of i rather than to treat it as a character variable?

 

C.

PaigeMiller
Diamond | Level 26

@PCG wrote:

Unfortunately, this solution would be too inefficient for my data. I have about 175 variables, over 90% of which are ordinal or nominal. Creating dummy codes for all these variables is not worth the cost in programming time.


PROC GLMMOD makes creation of dummy variables easy.
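A rough sketch of that step, reusing the dataset and variable names from the macro posted earlier in the thread. It assumes ID is numeric; ID appears on the left of the MODEL statement only because PROC GLMMOD requires a response variable.

```sas
/* Sketch only: VAM, ID and the variable names come from the macro posted   */
/* earlier; ID is assumed numeric and serves only as a dummy response.      */
proc glmmod data=VAM outdesign=design outparm=parms noprint;
   class CG1Gen CG1Rel CG1Mar CG1Race;
   model ID = CG1Gen CG1Rel CG1Mar CG1Race age house;
run;

/* parms maps each design column (Col1, Col2, ...) back to its effect and level. */
proc print data=parms;
run;
```

The design dataset can then be passed to PROC PRINCOMP. PROC GLMMOD creates one column per level of each CLASS variable plus an intercept column, so one column per CLASS variable (and the intercept) would need to be left out of the VAR list first; otherwise the dummy coding itself produces zero eigenvalues.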

 

Any suggestions for how to get SAS to read the value of i rather than to treat it as a character variable?

 

 I don't know which part of your code you are referring to. Please be specific.

 

--
Paige Miller
devrant
Obsidian | Level 7
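%Macro ExcludeVars;
 %local i word;
 %let _drop_= ;
 /* Loop in the macro language: %LET is resolved before the DATA step runs, */
 /* so a DATA step DO loop cannot drive it. Assumes &Exclude and &Continuous */
 /* hold explicit, blank-separated names (no colon or hyphen shorthand). */
 %do i=1 %to %sysfunc(countw(&Exclude,%str( )));
  %let word=%scan(&Exclude,&i,%str( ));
  %if %sysfunc(indexw(%upcase(&Continuous),%upcase(&word))) > 0 %then
   %let _drop_=&_drop_ &word;
 %end;
 %put &=_drop_;
%Mend ExcludeVars;
%ExcludeVars;

Notice that the loop counter has to be a macro variable (declared here with %local and driven by %do) so that &i resolves to a numeric value in %scan. To the macro processor, a DATA step variable i is just the string "i". The %let statements are also resolved before any DATA step executes, which is why the loop itself is written in the macro language rather than as a DATA step DO loop.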

K331
Calcite | Level 5

What exactly does it mean when an effect for a variable is a linear combination of other effects? I understand that it's crucial to have an omitted reference dummy, but I don't understand why in a math sense. In a regression framework, an omitted reference is necessary so that comparisons can be made on the other dummies to that reference. But why would failing to have a reference group result in a "linear combination of other effects?" I guess I need some example or plain language here if possible. 
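A small worked example may help. It assumes a categorical variable with three levels, coded with dummies d_1, d_2, d_3, in a model that also has an intercept; the symbols are illustrative and do not come from the thread.

```latex
% Every observation falls in exactly one category, so the dummies
% add up to the intercept column:
d_1 + d_2 + d_3 = 1

% Hence, for any constant c, the fitted values are unchanged:
\beta_0 + \beta_1 d_1 + \beta_2 d_2 + \beta_3 d_3
  = (\beta_0 + c) + (\beta_1 - c)\, d_1 + (\beta_2 - c)\, d_2 + (\beta_3 - c)\, d_3
```

Since infinitely many sets of coefficients give the same fit, the software cannot pick one and sets a coefficient to zero. Omitting d_3 as the reference level removes the redundancy, and the remaining coefficients become comparisons to that level. The same thing happens with any exact relation among predictors, such as var1 = var2 - var3 earlier in this thread.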
