Solved: Create dataset of observations used in proc glm when absorb is used

mct2181 · Posted 04-17-2021 12:19 PM

I am running a series of nested fixed effects models using proc glm with the "absorb" command. To ensure each model is analyzing the same set of observations, I am trying to create a new dataset/output of only the observations that are included in my most restricted model (most variables in the model). Apparently, "Output data set not available when absorption is used."

Are there any other ways to do this??

My most restricted model looks like:

proc glm data=long0f_b;
absorb hhidpn;
model cog = soc6 age baseage*age dummy2020 soc6*dummy2020 / solution;
run;
quit;


PaigeMiller · Posted 04-17-2021 12:32 PM

@mct2181 wrote:

I am running a series of nested fixed effects models using proc glm with the "absorb" command. To ensure each model is analyzing the same set of observations, I am trying to create a new dataset/output of only the observations that are included in my most restricted model (most variables in the model). Apparently, "Output data set not available when absorption is used."

I think that message is pretty clear. But ... the documentation says the same thing: "The GLM procedure cannot produce predicted values or least squares means (LS-means) or create an output data set of diagnostic values if an ABSORB statement is used."

Are there any other ways to do this??

Do what? Why do you feel you need ABSORB? What is the end goal of all of this analysis?

--
Paige Miller

mct2181 · Posted 04-17-2021 12:55 PM

My dataset has repeated measures in long format; I'm using absorb so that the analysis models avg within-person variation (hhidpn is the person-ID).

The end goal of this particular question is to ensure that the model: cog = soc6 is only analyzing the same subsample of observations that are included in the multivariate models. I had originally thought to use output just to get the list of obs that are included in the most restricted model; I don't need any of the associated statistics like predicted values.

PaigeMiller · Posted 04-17-2021 01:01 PM

Follow-up questions ... how many levels of the person ID?

I still don't see why you need to get "the list of obs that are included in the most restricted model" via the method you are describing. And why wouldn't an observation be included in that model?

--
Paige Miller

mct2181 · Posted 04-17-2021 01:19 PM

There are some 4300 different hhidpns. A given hhidpn (participant) will only be included in this proc glm model if there are at least two separate observations (two different rows) with non-missing data for each variable in the model. As I include new variables, some of the HHIDPNs will get dropped from the model, because they don't have enough non-missing data to be included. I want to make sure I'm running every model on the exact same sample of participants. I definitely don't have to use output to get that list; I just don't have an idea of another way to do it. Any suggestions are welcome and thank you so much for your help!

PaigeMiller · Posted 04-17-2021 02:53 PM

@mct2181 wrote:

A given hhidpn (participant) will only be included in this proc glm model if there are at least two separate observations (two different rows) with non-missing data for each variable in the model.

Should this say "... at least two separate observations (two different rows) where all model variables are non-missing"?

--
Paige Miller

mct2181 · Posted 04-17-2021 03:00 PM

Yes, that's correct!
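The confirmed rule (at least two rows per HHIDPN where every model variable is non-missing) can also be written as a single query. This is a minimal sketch, not the solution posted below: it assumes the poster's data set LONG0F_B and the variables from the model statement (cog, soc6, age, baseage, dummy2020). The output table COMPLETE_IDS and the column N_COMPLETE are hypothetical names.

```sas
/* Keep only rows where every model variable is non-missing,        */
/* count those rows per person, and keep persons with at least two. */
/* Assumes data set LONG0F_B and the variables from the model above. */
proc sql;
    create table complete_ids as   /* hypothetical output name */
    select hhidpn, count(*) as n_complete
    from long0f_b
    where cog is not missing
      and soc6 is not missing
      and age is not missing
      and baseage is not missing
      and dummy2020 is not missing
    group by hhidpn
    having count(*) >= 2;
quit;
```

Adding a variable to the model means adding one more IS NOT MISSING condition, so the list of IDs always matches the most restricted model you choose.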

PaigeMiller · Posted 04-17-2021 03:06 PM

So you can find the list of IDs that have at least 2 records where all variables in the model are non-missing like this:

In the next block of code, replace ... list of variables ... with the actual list of model variables, separated by commas.

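/* Replace HAVE with your data set (e.g. LONG0F_B).                 */
/* MISS = 1 if any model variable is missing on this row, else 0.  */
/* NMISS needs numeric arguments; use CMISS if any are character.  */
data is_missing;
    set have;
    miss = nmiss(... list of variables ...)>0;
run;
/* One row per HHIDPN*MISS combination; COUNT holds the row count. */
proc freq data=is_missing;
    tables hhidpn*miss/noprint out=_a_;
run;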

This output data set named _A_ is what you want. You want to select all the HHIDPNs where MISS=0 and COUNT>=2.
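One way to finish that step is sketched below, assuming IS_MISSING and _A_ were built from the poster's data set LONG0F_B (HAVE replaced by LONG0F_B) with the full variable list from the most restricted model (cog, soc6, age, baseage, dummy2020). KEEP_IDS and ANALYSIS are hypothetical names. The sketch keeps the qualifying IDs, keeps only their complete rows, sorts by HHIDPN (the GLM documentation requires the data to be sorted by the ABSORB variable), and then runs the simplest and the most restricted model on that same subset.

```sas
/* IDs with at least two rows where every model variable is non-missing */
data keep_ids;                      /* hypothetical name */
    set _a_;
    where miss = 0 and count >= 2;
    keep hhidpn;
run;

/* Complete rows for the qualifying IDs only, sorted for ABSORB */
proc sql;
    create table analysis as        /* hypothetical name */
    select m.*
    from is_missing as m
    inner join keep_ids as k
        on m.hhidpn = k.hhidpn
    where m.miss = 0
    order by m.hhidpn;
quit;

/* Both models now see exactly the same persons and rows */
proc glm data=analysis;
    absorb hhidpn;
    model cog = soc6 / solution;
run;
quit;

proc glm data=analysis;
    absorb hhidpn;
    model cog = soc6 age baseage*age dummy2020 soc6*dummy2020 / solution;
run;
quit;
```

Every intermediate model in the nested series can point at ANALYSIS in the same way, so none of them can silently pick up persons that the most restricted model would drop.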

--
Paige Miller

mct2181 · Posted 04-22-2021 02:34 PM

Thank you very much!
