I have a data set in which the field data was collected in two different ways. Some variables (pH) were measured at the plot level, while others (soil N, litter N, etc.) were measured along transects within those plots, with up to 8 data points per plot. Unfortunately, the transect data has a lot of missing values. PROC REG drops any row that is missing a value for any model variable, so when I run model selection with AIC, BIC, SBC, etc., only about 50 of the roughly 1,200 rows are used, even though there is enough data to calculate a mean for each plot. Needless to say, 50 data points probably do not reliably indicate which combination of variables is best for a multiple regression analysis.
So, what I would like to do is calculate a mean value for each variable at the plot level, and use those means in PROC REG for model selection and then multiple regression.
Here's some code to help it all make sense:
data AllEvents;
set import;
if Block="A" and plot=1 then plotID=1;
...
if Block="D" and plot=4 then plotID=16;
logN2O=log(0.1+Slope);
run;
proc reg data=AllEvents outest=est;
model logN2O = Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N / selection=adjrsq sse aic bic sbc cp rmse;
run;
quit;
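Before switching to plot means, it can help to confirm how many rows PROC REG can actually use and which variables cause the losses. A minimal sketch, assuming the AllEvents data set and the variable names from the code above: it counts rows with no missing value in any model variable (the rows PROC REG keeps) and reports missing counts per variable.

```sas
/* Count rows in which every model variable is non-missing;
   these are the only rows PROC REG uses. */
data _null_;
  set AllEvents end=last;
  /* NMISS returns the number of missing numeric arguments; 0 = complete row. */
  if nmiss(logN2O, Soil_N, Soil_C, SoilC_N, LitterN, LitterC, LitterC_N) = 0
    then complete + 1;   /* sum statement: retained, starts at 0 */
  if last then put "Complete rows: " complete " of " _n_ " rows";
run;

/* Non-missing and missing counts for each variable. */
proc means data=AllEvents n nmiss;
  var logN2O Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
run;
```

The message is written to the SAS log; the PROC MEANS table shows which transect variables account for most of the dropped rows.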
Thanks
Accepted Solutions
Here is one way to get means by a grouping variable.
proc summary data=Allevents nway;
   class plotid;
   var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   output out=plotmeans (drop=_: ) mean=;
run;
Use the Plotmeans data set in the regression.
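One detail when feeding PLOTMEANS to PROC REG: the MODEL statement needs the response, so logN2O has to be averaged too, otherwise it is not in the output data set. A minimal sketch of the two steps together, assuming the AllEvents variable names from the question; `est_plot` is a hypothetical name for the OUTEST= data set.

```sas
/* Plot-level means of the response and all candidate predictors.
   logN2O is in the VAR list so the MODEL statement can find it. */
proc summary data=AllEvents nway;
  class plotID;
  var logN2O Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
  output out=plotmeans (drop=_:) mean=;   /* mean= keeps original names */
run;

/* Same selection request as before, now with one row per plot. */
proc reg data=plotmeans outest=est_plot;
  model logN2O = Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N
        / selection=adjrsq sse aic bic sbc cp rmse;
run;
quit;
```

With one row per plot, the regression has only as many observations as there are plots (16 if plotID runs from 1 to 16), so the selection criteria rest on few degrees of freedom.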
The code you supplied worked well, with the exception of the drop= portion. SAS wouldn't run anything when I used drop=Block Plot (the variables that plotID was substituting for), so I deleted that part of the code and the rest worked fine.
I was then able to plug data=plotmeans into the PROC REG statement, and it worked well.
Thanks for the help!
The DROP= option in my code removes two automatic variables that PROC SUMMARY adds to the output data set when you use a CLASS statement.
_FREQ_ is the number of input records in each group, and _TYPE_ indicates which combination of class variables was used to group the statistics. With PROC SUMMARY, the output set contains only the class variables, the requested statistics, and those two automatic variables. Block and Plot are not carried over, so there is no need to drop other variables from the input set.
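To see the two automatic variables, one can run the same step without NWAY and without DROP= and print a few rows. With a single class variable, PROC SUMMARY then writes one _TYPE_=0 row (all plots pooled, plotID missing) plus one _TYPE_=1 row per plotID; NWAY keeps only the highest _TYPE_, which is why the earlier step returns one row per plot. A sketch assuming the AllEvents data set; `plotmeans_all` is a hypothetical output name.

```sas
/* No NWAY, no DROP=: _TYPE_ and _FREQ_ stay in the output,
   and an overall (_TYPE_=0) row is added. */
proc summary data=AllEvents;
  class plotID;
  var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
  output out=plotmeans_all mean=;
run;

/* _FREQ_ counts input rows per group, including rows with missing values. */
proc print data=plotmeans_all (obs=5);
  var plotID _TYPE_ _FREQ_ Soil_N;
run;
```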
I use PROC SUMMARY partly out of ancient habit and partly because, without setting other options, PROC MEANS would not have produced the right output layout for the regression.
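For comparison, a sketch of the PROC MEANS version, assuming the same AllEvents variables; `plotmeans_m` is a hypothetical output name. PROC MEANS prints a report by default, so NOPRINT suppresses it, and with NWAY and the same OUTPUT statement it should write the same one-row-per-plot data set as the PROC SUMMARY step.

```sas
/* NOPRINT: skip the printed report (PROC SUMMARY does this by default).
   NWAY: keep only the per-plotID rows, as in the PROC SUMMARY step. */
proc means data=AllEvents noprint nway;
  class plotID;
  var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
  output out=plotmeans_m (drop=_:) mean=;
run;
```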