393310
Obsidian | Level 7

Hi, 

I am having a very weird issue with proc glimmix. I have data with multiple dependent variables and multiple predictors. I am trying to run separate longitudinal analyses for each dependent variable against the categorical variables independently. Earlier, I created an array and do loop to cycle through all of these variables. I called the data set "perm.temp" and set it based on my original data "perm.mentalhealth." When I run proc glimmix with the perm.temp using any of my categorical variables I get the warning "WARNING: The initial estimates did not yield a valid objective function." I have even tried not using the array for the variables and testing one categorical variable at a time ex: model PHQ9_SCORE= age. I still get the same warning. When I run the same model using the data set perm.mentalhealth, the program runs. I have opened the perm.temp data table to see if there were issues with the data, but it looks fine to me. I also tried to make a new data set (perm.temp2) and it also gave me the same warning. Any ideas why it's not working? I appreciate any help!!

data perm.temp;
    set perm.mentalhealth;
    array var_list[9] trouble_sleeping hurting_yourself interest depressed little_energy appetite feeling_bad concentrating
        moving_slowly;
    array categorical[22] gender age scalp_lesions postauricular erythema eyelid_involvement cheilitis flexural_erythema xerosis neck_folds nipple_eczema
        keratosis palmar hand_eczema ichthyosis foot_eczema race education_final insurance alopecia pityriasis pain_severeB;
    array npredictors[9] SCORAD EASI BSA ADSI POEM_SCORE dlqi_score FIVED_SCORE RL_SCORE flare;
    do i=1 to dim(var_list);
        VarName=vname(var_list(i));
        Outcome=var_list[i];
        do j=1 to dim(categorical);
            categorical_=vvalue(categorical(j));
            do k=1 to dim(npredictors);
                npredictors_=vname(npredictors(k));
                format Depression Depressionn. Anxiety Anxietyy. interest interestt. depressed _depressedd. trouble_sleeping trouble_sleepingg. little_energy little_energyy. appetite appetitee.
                    feeling_bad feeling_badd. concentrating concentratingg. moving_slowly moving_slowlyy. hurting_yourself hurting_yourselff. PHQ9_SCORE PHQ9_SCORE_. PHQ2_SCORE PHQ2_SCORE.
                    gender gender. race race. education_final education. insurance insurance. scalp_lesions scalp_lesionss. postauricular postauricularr. erythema erythemaa. eyelid_involvement eyelid_involvementt.
                    cheilitis cheilitiss. flexural_erythema flexural_erythemaa. xerosis xerosiss. neck_folds neck_foldss. nipple_eczema nipple_eczemaa. keratosis keratosiss. palmar palmarr. hand_eczema hand_eczemaa. ichthyosis ichthyosiss.
                    foot_eczema foot_eczemaa. age age_bin_. alopecia alopeciaa. pityriasis pityriasiss. pain_severeB painn. ;
                output;
            end;
        end;
    end;
run;

proc glimmix data=perm.mentalhealth method=laplace order=internal;
    class record_id_final PHQ9_SCORE gender;
    model PHQ9_SCORE = gender / link=cumlogit dist=multinomial;
    random visit / subject=record_id_final;
run;

proc glimmix data=perm.temp method=laplace order=internal;
    class record_id_final PHQ9_SCORE gender;
    model PHQ9_SCORE = gender / link=cumlogit dist=multinomial;
    random visit / subject=record_id_final;
run;

LOG:

schatr2_0-1635350422962.png

schatr2_1-1635350469824.png

RESULTS:

I noticed that for some reason the output from perm.temp has significantly more observations than perm.mentalhealth for the exact same variable. Did something go wrong in my do loop that's causing the issue?

perm.temp

schatr2_0-1635369284306.png

perm.mentalhealth:

schatr2_1-1635369338651.png

 

 

perm.temp:

schatr2_2-1635350505418.png

perm.mentalhealth results: 

schatr2_3-1635350561796.png

 

sbxkoenk
SAS Super FREQ

Hello,

 

Do you need to use METHOD=LAPLACE?

Maybe you can try

method = quad (fastquad qpoints = 6)

and "play" with the number of QPOINTS.

 

[[ Compared to METHOD=LAPLACE, the models for which parameters can be estimated by quadrature are further restricted. In addition to the conditional independence assumption and the absence of R-side covariance parameters, it is required that models suitable for METHOD=QUAD can be processed by subjects. (See the section Processing by Subjects about how the GLIMMIX procedure determines whether the data can be processed by subjects.) This in turn requires that all RANDOM statements have SUBJECT= effects and in the case of multiple SUBJECT= effects that these form a containment hierarchy. ]]

 

Good luck,

Koen

393310
Obsidian | Level 7
Thanks, I tried that but did not get any results. I think there is an issue with the data set: perm.temp has thousands of observations, whereas perm.mentalhealth has 950. I have no idea what is causing the discrepancy.
jiltao
SAS Super FREQ

You might try using the PARMS statement to specify a starting value for the covariance parameter to see if that helps. This might take some trial and error.
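
A minimal sketch of what that could look like, reusing the model from the original post. The starting value 0.5 is purely illustrative, and the sketch assumes the model has a single covariance parameter (the variance of the random visit effect); PARMS expects one value per covariance parameter.

```sas
/* Sketch only: 0.5 is an arbitrary starting value, not a recommendation */
proc glimmix data=perm.mentalhealth method=laplace order=internal;
    class record_id_final PHQ9_SCORE gender;
    model PHQ9_SCORE = gender / link=cumlogit dist=multinomial;
    random visit / subject=record_id_final;
    parms (0.5);   /* starting value for the single G-side covariance parameter */
run;
```

If the warning persists, trying a few other values (for example 0.1 or 1) is the "trial and error" part.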

Convergence issues are often data and model dependent. You might want to open a SAS Technical Support track (support@sas.com) and send in your data so it can be investigated further.

Thanks,

Jill

393310
Obsidian | Level 7

Hi, I just updated the post. I noticed that my new data set has significantly more observations than the old one, even though I am fitting the exact same model. Any reason why this might be happening?

 

Check the original post to see screenshots of what I am referring to!

StatsMan
SAS Super FREQ

Your code has an output statement inside of 3 nested do-loops. The loops run over arrays of dimension 9, 22, and 9, so you are outputting each observation from the original data set 9 × 22 × 9 = 1,782 times. If you look at the response profiles from the two GLIMMIX runs, each level of the response in the new data set has 1,782 times as many observations as the corresponding level of the response in the original data set.

 

That should explain why the new data set has significantly more observations. Fix that and then we can address the convergence issue, if it still exists.
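
The multiplication is easy to check on a small sample data set. The sketch below uses SASHELP.CLASS (19 observations) with loop bounds that mirror the three array dimensions; the data set name work.expanded is just an illustrative choice.

```sas
/* Each input row is written once per pass through the innermost loop */
data work.expanded(drop=i j k);
    set sashelp.class;              /* 19 observations */
    do i = 1 to 9;
        do j = 1 to 22;
            do k = 1 to 9;
                output;             /* 9 * 22 * 9 = 1,782 copies of each row */
            end;
        end;
    end;
run;

/* Count the rows: 19 * 1,782 = 33,858 */
proc sql;
    select count(*) as n_obs from work.expanded;
quit;
```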

393310
Obsidian | Level 7

Hi @StatsMan

 

Thanks so much for your input. I'm new to SAS, but it was my understanding that without the output statement, the loop only produces results for the very last variable in the array. Are you saying I should write the loop without an output statement at all?

sbxkoenk
SAS Super FREQ

Hello @393310 ,

 

I had not noticed the issue with the output; statement that @StatsMan discovered.

 

The output; statement outputs observations, not variables.

Looping over array elements just saves you from writing statements like
VarName=vname(var_list(i));
several times. Using arrays makes your code shorter, more generic / dynamic (less hard-coded), and clearer.

The code below makes the data set 3 times bigger (in number of observations) if you un-comment the output statement, because 3 is the dimension of the array.

data work.a(drop=i);
 set sashelp.class;
 array numvars_to_double{3} age weight height;
 do i = 1 to dim(numvars_to_double);
  numvars_to_double(i) = numvars_to_double(i) * 2;
 *output;
 end;
run;

You can submit the above data step yourself, as it uses a sample data set from the SASHELP library, which every SAS installation has available.

 

To make your code even clearer, I would take the format ...; statement out of the loop and put it right before the run; statement. There's no need to repeat the same format statement hundreds of times; once is enough.

If you still have convergence problems after solving this issue, I can certainly assist with that.
I have tons of info (or let's say dozens of recommendations 🤓) to tackle convergence issues in PROC GLIMMIX.

Good luck,
Koen

393310
Obsidian | Level 7

@StatsMan @sbxkoenk 

 

So I took out the output statement from my code, but now when I run glimmix it only gives me a model for the dependent variable anxiety and the predictor pain. I essentially want to achieve this:

model dep var 1 = ind var 1, dep var 1 = ind var 2, etc., until I cycle through all the combinations.

If I keep the output statement in the loop, I get multiple models but have the issue of the extra observations. I tried to run the model as outcome=categorical_ (without the other set of continuous predictors) and I do get convergence, but the p-values are different from those I get when I write out each model manually. For example, PHQ9 = gender gives a p-value of .69 manually vs. .91 in the looped version. So I guess my question now is: does having the extra observations make the model inaccurate? If so, how do I get it to loop through all my variables without the output statement?

 

I don't remember if I mentioned it in my original post, but I've been designing this based off of: An easy way to run thousands of regressions in SAS - The DO Loop

 

Thank you guys so so much for helping me!

data perm.temp;
    set perm.mentalhealth;

    array var_list[12] trouble_sleeping hurting_yourself interest depressed little_energy appetite
        feeling_bad concentrating moving_slowly PHQ9_SCORE Depression Anxiety;
    do i=1 to dim(var_list);
        VarName=vname(var_list(i));
        put VarName=;
        Outcome=vvalue(var_list[i]);
        array categorical[22] gender age scalp_lesions postauricular erythema eyelid_involvement
            cheilitis flexural_erythema xerosis neck_folds nipple_eczema
            keratosis palmar hand_eczema ichthyosis foot_eczema race
            education_final insurance alopecia pityriasis pain_severeB;
        do j=1 to dim(categorical);
            categorical_=vname(categorical(j));
            put categorical_=;
            CValue=vvalue(categorical[j]);
            array npredictors[9] SCORAD EASI BSA ADSI POEM_SCORE dlqi_score FIVED_SCORE RL_SCORE flare;
            do k=1 to dim(npredictors);
                npredictors_=vname(npredictors(k));
            end;
        end;
    end;
    format Depression Depressionn. Anxiety Anxietyy. interest interestt. depressed _depressedd. trouble_sleeping
        trouble_sleepingg. little_energy little_energyy. appetite appetitee. feeling_bad feeling_badd.
        concentrating concentratingg. moving_slowly moving_slowlyy. hurting_yourself hurting_yourselff.
        PHQ9_SCORE PHQ9_SCORE_. PHQ2_SCORE PHQ2_SCORE. gender gender. race race. education_final education.
        insurance insurance. scalp_lesions scalp_lesionss. postauricular postauricularr. erythema erythemaa. eyelid_involvement eyelid_involvementt. cheilitis cheilitiss.
        flexural_erythema flexural_erythemaa. xerosis xerosiss. neck_folds neck_foldss. nipple_eczema
        nipple_eczemaa. keratosis keratosiss. palmar palmarr. hand_eczema hand_eczemaa. ichthyosis
        ichthyosiss. foot_eczema foot_eczemaa. age age_bin_. alopecia alopeciaa. pityriasis pityriasiss.
        pain_severeB painn. ;
run;

proc sort data=perm.temp;
    by VarName categorical_;
run;

proc glimmix data=perm.temp method=laplace;
    by VarName categorical_;
    class record_id_final CValue outcome;
    model outcome = CValue / link=cumlogit dist=multinomial solution;
    random visit / subject=record_id_final;
run;

LOG:

schatr2_0-1635606585584.png

results:

schatr2_1-1635606623592.png

 

 

sbxkoenk
SAS Super FREQ

Hello,

 

>> I don't remember if I mentioned it in my original post, but I've been designing this based off of:
>> An easy way to run thousands of regressions in SAS - The DO Loop

 

No, you didn't mention that in your original post. 😃
But now I understand why you had (have) that "output;" statement.
It's crucial in this case to run your PROC GLIMMIX with BY-groups of course. Now you are doing that, but I believe that in your original post you did not have a BY-statement in your PROC GLIMMIX.

Anyway, running PROC GLIMMIX 100 times on different data sets (or fitting 100 different models to the same data set) may not give the same results as running a single PROC GLIMMIX with 100 BY-groups. That may be linked to the presence of missing values.
I am not 100% sure of the above statement, but if you want I can give you an example of how to generate 100 PROC GLIMMIX calls with the model you want (each time with a different predictor) instead of one PROC GLIMMIX with 100 BY-groups.

Generating 100 separate calls is slower than the BY-group approach, but at least you know exactly what is going on, and you can take subsets of the dynamically generated code and run them separately.
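
For reference, the BY-group approach relies on a long data set in which each original observation appears once per predictor, so the output; statement belongs in a loop over the predictors only. Here is a minimal sketch of that idea with SASHELP.CLASS and PROC REG; the names work.long, VarName and Value are illustrative, and the two predictors are chosen arbitrarily.

```sas
/* Wide to long: one copy of each observation per predictor */
data work.long(keep=name weight VarName Value);
    set sashelp.class;
    length VarName $32;
    array xs[2] age height;         /* predictors to cycle through */
    do j = 1 to dim(xs);
        VarName = vname(xs[j]);     /* name of the current predictor */
        Value   = xs[j];            /* its value for this observation */
        output;                     /* 2 rows per original observation */
    end;
run;

proc sort data=work.long;
    by VarName;
run;

/* One regression per BY group, each fitted to the original 19 observations */
proc reg data=work.long;
    by VarName;
    model weight = Value;
run;
quit;
```

Within each BY group the number of observations equals that of the original data set, which is a quick check that the restructuring did not duplicate rows.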

 

Just let me know ...
Cheers,
Koen

393310
Obsidian | Level 7
I would really appreciate the example if you didn't mind!!

I've been struggling with this for a week lol.
sbxkoenk
SAS Super FREQ

OK.


I will do that tomorrow if you do not mind (having no more time this Saturday evening, it's 18h.00 over here).
Just let me know what's published in the LOG window after running / submitting the below 3 lines of code. It might impact the code that I will write.

%PUT &=sysvlong4;
%PUT &=sysscp;
%PUT &=sysencoding;

Thanks,

Koen

393310
Obsidian | Level 7
Sure no problem.

this is what the log gave me
187 %PUT &=sysvlong4;
SYSVLONG4=9.04.01M7P08052020
188 %PUT &=sysscp;
SYSSCP=WIN
189 %PUT &=sysencoding;
SYSENCODING=wlatin1
sbxkoenk
SAS Super FREQ

Hello again, @393310 ,

 

There are many ways to generate code in SAS and subsequently submit this generated code.

I decided to take the macro approach.

 

The code below models log(Salary) with a different independent variable as input each time (simple linear regression).
I think you can easily translate it to your use case.
You can run the example code yourself, as every SAS installation has the sample data set SASHELP.BASEBALL.

 

PROC SQL noprint;
 create table work.num_predictors as
 select name
 from dictionary.columns
 where     libname='SASHELP' and memname='BASEBALL' 
       and upcase(type)='NUM'
       and name NOT CONTAINS 'Salary';
QUIT;

data _NULL_;
 if 0 then set work.num_predictors nobs=count;
 call symputx('number_num_predictors',put(count,8.));
 STOP;
run;
%PUT &=number_num_predictors;

%MACRO LOOP_REG;
%DO i = 1 %TO &number_num_predictors.;

data _NULL_;
 set work.num_predictors(firstobs=&i. obs=&i.);
 call symputx('predictor_i',name);
run;
%PUT &=i;
%PUT &=predictor_i;

ODS SELECT ParameterEstimates;
proc reg data=sashelp.baseball;
title "predictor = &predictor_i.";
   id name team league;
   model logSalary = &predictor_i.;
run;

%END; /* %DO i = 1 %TO &number_num_predictors.; */
%MEND LOOP_REG;

ods graphics off;
title; footnote;
%LOOP_REG;
QUIT; /* [EDIT] : I added 'QUIT;' to make sure last PROC REG is terminated and out-of-memory */
title; footnote;
/* end of program */
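
As a rough sketch of how this might translate to the GLIMMIX model from the original post: the macro name loop_glimmix and its parameters are hypothetical, the predictor list is passed directly instead of being read from DICTIONARY.COLUMNS, and the data set and variable names are taken from the original post (so this only runs against that data).

```sas
/* Sketch only: macro name and parameters are illustrative */
%macro loop_glimmix(outcome=, predictors=);
    %local i pred;
    %do i = 1 %to %sysfunc(countw(&predictors., %str( )));
        %let pred = %scan(&predictors., &i., %str( ));
        title "outcome = &outcome., predictor = &pred.";
        /* One separate PROC GLIMMIX call per predictor, on the original data */
        proc glimmix data=perm.mentalhealth method=laplace order=internal;
            class record_id_final &outcome. &pred.;
            model &outcome. = &pred. / link=cumlogit dist=multinomial solution;
            random visit / subject=record_id_final;
        run;
    %end;
    title;
%mend loop_glimmix;

%loop_glimmix(outcome=PHQ9_SCORE, predictors=gender race insurance);
```

Because every call reads perm.mentalhealth directly, no observations are duplicated, and each model can be compared with the one written out by hand.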

Good luck,
Koen

sbxkoenk
SAS Super FREQ

Hello @393310 ,

 

On top of my previous reply (with example program / macro, see above), this may be of interest to you:

 

Data-Driven Programming Techniques Using SAS®
Posted 03-15-2021 09:43 AM | by KirkLafler
https://communities.sas.com/t5/SAS-Global-Forum-Proceedings/Data-Driven-Programming-Techniques-Using...

 

Good luck,

Koen
