BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bartoszorzel
Calcite | Level 5

Hello,

 

I am trying to run a proc glimmix model, with an area level variable being my higher/group level term that individual's are nested in. My area level variable is currently a character variable with 530,000 observations (individuals) in my dataset. The individuals are nested in the areas (i.e., many indivudals will belong to the same area). My model is not converging, and one of the possible problems I think I may be having is that my area level variable is a character variable rather than a numeric variable (not entirely sure if this is the problem but it is something that I have considered). So I want to convert the observations for this character variable into numeric variables BUT retain the character labels of each of the observations. 

 

I found a file entitled "Sample 407000: How to convert all character variables to numeric and use the same variable names in the output data set" (http://support.sas.com/kb/40/700.html). I was able to replicate the results in the practice dataset, but cannot replicate it in my actual dataset, getting an error in the last DATA stetp in the input function (produces a dataset with missing observations for all of the character variables that I am trying to convert numeric variables). In the input statement:

 

do i = 1 to dim(ch);                                      
  nu(i)=input(ch(i),8.);                                  
end;

is there something that I need to change to it for it to work in this statement? Will having 530,000 observations create a problem in trying to do this, or should the do loop statement be able to conduct it seamlessly. Is this informat (8.) in the brackets incorrectly specified?

 

Any help would be appreciated

 

May thanks

 

Bart 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Yep, this is a task for proc means!  

proc means data=have;
  by fsa;
  var preprob;
  output out=want mean=mean;
run;

Now the by can also be a class (check the docs for differences), and you can have other summary stats, where I have mean that means create a variable mean in the output dataset and put the mean in there (you could have stddev, min/max etc. check the docs).

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, please refer to the guidance found under the post button on post a new question.  Post an example few rows of your data, in the form of a datastep so that we can run it and get to see the data as you see it with structure.  

 

What this code is saying:

do i = 1 to dim(ch);                                      
  nu(i)=input(ch(i),8.);                                  
end;

Is that you have created an array referenced by the name of ch, and for each variable in that array you loop over each one and input() the character value to a numeric and put it in the new numeric array nu.  I can't really tell anything further from wht you have posted, as cannot see the data, nor log output etc.  Has CH and NU been assigned by using array statement, are the CH variables all character, and do they conform to the 8. format.  You talk about many rows, but arrays provess varaibles on a row level, not across rows - this is again why seeing your data is vital.

bartoszorzel
Calcite | Level 5

Hello,

 

Thank you for your input. So I just tried creating a tiny dummy dataset that does not have all of my variables nor my observations, but it includes the one variable (FSA) that I would like to change. This is what I wrote:

 

data test;

input FSA$ agesex hypertension ischemicheartdisease heartfailure;

datalines;

B0A 1 0 0 1

B0A 5 0 0 0

B0C 7 0 0 0

B0C 4 0 1 0

B1J 3 0 1 0

B1J 9 1 0 0 

B1J 2 0 0 0

;

 

 

So FSA(area variable) is the only character variable, whereas agesex (10 age and sex groups) and hypertension (binary), ischemicheartdisease (binary) and heartfailure (binary) are all numeric variables. I would like to convert the FSA variable to numeric as well, but retain the observations as written in the rows (so B0A, B0C, B0C, etc.). Is this even possible using that instruction file with the link that I posted? I tried rerunning it on this example dataset, and although I did not experience any errors in the input function statement, the resulting dataset had missing observations '.' for all FSA observations. 

 

Alternatively, I was wondering like I mentioned before, the reason I am trying to do this is because I am having trouble running a proc glimmix model. The FSA variable is specified in several steps of the regular model statements:

 

proc glimmix data=have;

class agesex FSA;

model ischemicheartdisease(event="1")=FSA / dist=binary link=logit s;

lsmeans FSA/ cl ilink;

random in/ subject=FSA;

run;

 

The model does not converge. I was thinking that mabye this is because the FSA variable must be a numeric variable for this model to run, but is this actually true, and if not, can I first ignore all of what I am trying to accomplish up top? And then instead on figuring out what I may be misspecifying in my proc glimmix model and why it is not converging? Any possible ideas on what it does not converge and what I would need to share to try to diagnose this?

 

Thanks again,

 

Bart 

Kurt_Bremser
Super User

The macro is designed to put numerical values stored in character variables into numerical variables.

Values like "B0A" are clearly not numeric and cannot be directly converted to numeric.

You would need to create a lookup table that contains a 1:1 relationship between character and numeric value:

proc sort
  data=test (keep=fsa))
  out=lookup
  nodupkey
;
by fsa;
run;

data lookup;
set lookup;
fsa_n = _n_;
run;

You can now do a join on fsa, or create a format from that dataset.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, as @Kurt_Bremser mentions, you can only convert numeric data to numeric.  You could apply a format or lookup as suggested.  However, not sure that is your issue and I don't know enough glimmix to answer.  I will move this to the stats subforum, maybe the statos can help you further.

bartoszorzel
Calcite | Level 5

Hello everyone,

 

Thank you for the responses. I tried the solution of converting the character variable to numeric as suggested, but my model still did not converge, so that does not seem to be the fix. I did however get a workaround. Now I have another issue. I now have a dataset of 520,000 individuals spread out in 77 FSAs (So many individuals in each FSA). Each individual has a predicted probability of my outcome ischemic heart disease. So my dataset in SAS looks basically like this (ommitting other variables):

 

FSA    PredProb

B0A      0.07

B0A      0.04

B0A      0.02

B1J      0.08

B1J      0.01

B4H     0.02

B4H     0.02

B4H     0.05

B4H     0.08

B4H     0.07

...

 

So I was wondering how I could get a table output where I would just have all of the 77 FSAs in one column and their mean value for the PredProb in another column. Is there a way to create a variable or maybe dataset to do this?

 

Any feedback would again be appreciated. 

bartoszorzel
Calcite | Level 5

Because right now I have 520,000 observations representing all individual with their own predicted probabilities, rather than 77 observations for the 77 FSAs with a mean probability for each based on the mean of the individual probabilities in that FSA.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

Yep, this is a task for proc means!  

proc means data=have;
  by fsa;
  var preprob;
  output out=want mean=mean;
run;

Now the by can also be a class (check the docs for differences), and you can have other summary stats, where I have mean that means create a variable mean in the output dataset and put the mean in there (you could have stddev, min/max etc. check the docs).

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 2017 views
  • 0 likes
  • 3 in conversation