BookmarkSubscribeRSS Feed
dculpepp
Calcite | Level 5

Hello,

I have a dataset that has multiple measurements of air pollution over time that were originally arranged in wide format
Ex. pollution_month1 pollution_month2 pollution_month3 etc.
In addition I have a huge number of other variables (potentially confounding or explanatory exposures, additional outcomes) including age, bmi, etc. I used Proc transpose to create a long format version of the pollution variables with a copy statement to maintain the presence of the other covariates for use in a GEE or GLMM, but I noticed that the datasets only copied the information for the non-pollution data on a single line for each subject.

Here's an example with subject ID, age, and the repeated measures for pollution:

data WORK.IPNEW2;
infile datalines dsd truncover;
input id:11. age:32. NO2:32.;
format id 11.;
label id="IDCode" age="Mother's age at baseline";
datalines;
ID Age Pollution
10001 32 47.813976991
10001 . 60.016236077
10001 . 53.004974541
10001 . 48.732284383
10002 36 55.695500415
10002 . 43.528170487

Any time I have used either proc genmod or proc glimmix with repeated measures before, the data has been structured such that entire entry has been copied except for whichever variable is being repeated. I suppose two questions here:

1: is it necessary for the data to be completely filled? I'm assuming it is, because when I ran a logistic regression on this, it had a ton of missing entries, which seems completely wrong.
2: Is there any way that I can take the values that have been copied over to the first entry and apply them (except for the repeated measures of course) to the every entry in my new dataset?
in the above example, I would like to apply the age 32 to every listing of subject 10001 and 36 to every listing of 10002 as follows:
ID Age Pollution
10001 32 47.813976991
10001 32 60.016236077
10001 32 53.004974541
10001 32 48.732284383
10002 36 55.695500415
10002 36 43.528170487

I'd really appreciate any help available, please let me know if there is any additional information necessary that I can provide.

If it's any help, here is a makeshift version of the code that I used to transpose the variables I used for pollution:

proc transpose data=wide out=long prefix=n;
by ID;
var N1-N4;
copy 
age;
run;


Thanks for any help!
Dan

5 REPLIES 5
PaigeMiller
Diamond | Level 26

Normally, long format is vastly superior to wide format, except in the case of statistical modelling. Are you trying to create a model in which some response variable is predicted by a model that contains pollution_month1 and pollution_month1 and polllution_month3, etc.?

--
Paige Miller
dculpepp
Calcite | Level 5
Hi Paige, thanks for replying!
I'm looking to create a mixed model that models the odds of disease as a
function of pollution (repeated measures) and other covariates/confounders.
My preference for doing this is driven partially by the fact that I've
tried modeling the data with things like individual months of pollution, or
the average pollution level over several months with inconclusive results,
and partially by a desire to fit a model that most accurately fits the
nature of the data.
PaigeMiller
Diamond | Level 26

Ok, I still need more clarification.

 

When you speak of pollution_month1 pollution_month2 pollution_month3, are these independent (x) variables, or dependent (y) variables?

--
Paige Miller
dculpepp
Calcite | Level 5
They are independent variables.

Example model:
Hypertension= pollution_month 1 pollution_month 2 pollution_month 3 age bmi
PaigeMiller
Diamond | Level 26

So, I have re-read the original problem description, and I'm still not 100% sure I have the answer you are looking for, but I think this gets you close.

 

data WORK.IPNEW2;
    retain age_non_blank;
    infile datalines;
    input id  age NO2;
    if not missing(age) then age_non_blank=age;
    label id="IDCode" age="Mother's age at baseline";
datalines;
10001 32 47.813976991
10001 . 60.016236077
10001 . 53.004974541
10001 . 48.732284383
10002 36 55.695500415
10002 . 43.528170487
;
proc transpose data=ipnew2 out=ipnew2_t prefix=pollution_;
    by id age_non_blank;
    var no2;
run;
--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 421 views
  • 0 likes
  • 2 in conversation