AaronJ
Obsidian | Level 7

I have a data set in which data from the field were collected in two different ways. One part of the data was collected at the plot level (pH), and the other part was collected along transects within those plots, with as many as 8 data points per plot (soil-N, litter-N, etc.). Unfortunately, there is quite a bit of missing data in the transect portion, so when I run model selection with AIC, BIC, SBC, etc., only ~50 data points out of about 1,200 contain all of the variables for the same point, even though there is enough data to calculate a mean value for each plot. Needless to say, 50 data points probably do not accurately reflect which combination of variables is best to use in a multiple regression analysis.

So, what I would like to do is generate a mean value for each of the variables at the plot level, and use those mean values to run PROC REG for model selection and then multiple regression.

Here's some code to help it all make sense:
data AllEvents;
   set import;
   if Block="A" and plot=1 then plotID=1;
   ...
   if Block="D" and plot=4 then plotID=16;
   logN2O=log(0.1+Slope);
run;

proc reg data=AllEvents outest=est;
   model logN2O = Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N / selection=adjrsq sse aic bic sbc cp rmse;
run;

 

Thanks

Accepted Solutions
ballardw
Super User

Here is one way to get means by a grouping variable.

proc summary data=Allevents nway;
   class plotid;
   var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   output out=plotmeans (drop=_: ) mean=;
run;

Use the Plotmeans data set in the regression.
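
Note that the VAR list above names only the predictors, while the regression also needs the response in the plot-level data set. Here is a minimal sketch of the whole flow, assuming logN2O should be averaged per plot in the same way as the predictors (that is an assumption on my part, not something stated in the question). The final QUIT is only there because PROC REG stays active after RUN.

```sas
/* Sketch: one row per plotID holding the mean of the response and each predictor. */
/* Assumption: logN2O is averaged per plot like the other variables.               */
proc summary data=AllEvents nway;
   class plotID;
   var logN2O Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   output out=plotmeans (drop=_:) mean=;
run;

/* Same model selection as in the question, now run on the plot-level means. */
proc reg data=plotmeans outest=est;
   model logN2O = Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N
         / selection=adjrsq sse aic bic sbc cp rmse;
run;
quit;
```

Because MEAN= is given with no names, each mean keeps the name of its input variable, so the MODEL statement does not have to change.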

 

AaronJ
Obsidian | Level 7

The code you supplied worked well, with the exception of the drop= portion. SAS wouldn't run anything when I used drop=Block Plot, the two variables that plotID was substituting for, so I deleted that part of the code and the rest worked fine.
I was then able to plug data=plotmeans into the PROC REG statement and it worked well.

Thanks for the help!

ballardw
Super User

The drop= option I used suppresses two automatic variables that are added to the output data set when you use a CLASS statement.

One is _freq_, the number of records in each group, and the other is _type_, an indicator of which combination of class variables was used to group the statistics. With PROC SUMMARY, the output set contains only the class variables and the requested statistics besides those two automatic variables, so there isn't any need to drop other variables from the input set.
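
To see those two variables, here is a small sketch that leaves the drop= option off and also requests the N statistic. The data set name plotcheck is only illustrative. As far as I know, _freq_ counts every record in a plot, while N counts only the nonmissing values of each variable, which may be useful given how much of the transect data is missing.

```sas
/* Sketch: keep _TYPE_ and _FREQ_ and add per-variable nonmissing counts. */
/* plotcheck is a hypothetical data set name used only for inspection.    */
proc summary data=AllEvents nway;
   class plotID;
   var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   /* AUTONAME builds names such as Soil_N_Mean and Soil_N_N */
   output out=plotcheck mean= n= / autoname;
run;

proc print data=plotcheck;
run;
```

Since AUTONAME renames the means, this data set is meant for checking coverage per plot, not for feeding straight into the regression.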

 

I use proc summary partially from ancient habit and partially because without setting other options the output layout from proc means wouldn't have been the correct one for the regression.
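
For anyone who prefers PROC MEANS, here is a sketch of what I believe is the equivalent call. NOPRINT suppresses the printed report, and the OUTPUT statement should then write the same one-row-per-plot layout.

```sas
/* Sketch: PROC MEANS version of the same plot-level means. */
proc means data=AllEvents nway noprint;
   class plotID;
   var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   output out=plotmeans (drop=_:) mean=;
run;
```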
