AaronJ
Obsidian | Level 7

I have a data set in which field data were collected in two different ways. Some variables (pH) were collected at the plot level, and others (soil N, litter N, etc.) were collected along transects within those plots, with as many as 8 data points per plot. Unfortunately, there is quite a bit of missing data in the transect portion. Only about 50 of roughly 1,200 observations contain values for all of the variables, and PROC REG drops any observation that is missing a value for a model variable. So when I run model selection using AIC, BIC, SBC, etc., only those ~50 observations are used, even though there is enough data to calculate a mean for each variable in each plot. Needless to say, 50 data points probably do not accurately reflect which combination of variables is best to use in a multiple regression analysis.

So what I would like to do is calculate a mean value for each variable at the plot level, and then use those means in PROC REG for model selection and multiple regression.

Here's some code to help it all make sense:
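data AllEvents;
   set import;
   /* Assign a numeric plot ID to each Block/plot combination */
   if Block="A" and plot=1 then plotID=1;
   ...
   if Block="D" and plot=4 then plotID=16;
   /* Log-transform the N2O slope; the 0.1 offset keeps the
      argument of LOG positive when Slope is zero */
   logN2O=log(0.1+Slope);
run;

proc reg data=AllEvents outest=est;
   model logN2O = Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N
         / selection=adjrsq sse aic bic sbc cp rmse;
run;
quit;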
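The missing-data problem described above can be checked before averaging. This is a hedged sketch: it assumes the variable names from the code above and that all of them are numeric; the counter `n_complete` and the data set `plotcounts` are hypothetical names. The first step counts observations with no missing value in any model variable (the rows PROC REG can actually use); the second reports, for each plot, how many nonmissing values each transect variable has, which is how many values each plot mean would be based on.

```sas
/* Count the complete cases that PROC REG would actually use */
data _null_;
   set AllEvents end=last;
   /* NMISS returns how many of its numeric arguments are missing */
   if nmiss(logN2O, Soil_N, Soil_C, SoilC_N, LitterN, LitterC, LitterC_N) = 0
      then n_complete + 1;   /* sum statement: retained, starts at 0 */
   if last then put 'Complete cases: ' n_complete;
run;

/* Per-plot count of nonmissing values for each transect variable.
   With N= and no names, each output variable keeps its input name
   but holds a count instead of a measurement. */
proc summary data=AllEvents nway;
   class plotID;
   var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   output out=plotcounts(drop=_type_) n=;
run;

proc print data=plotcounts;
run;
```

The retained `_FREQ_` column in `plotcounts` shows the total number of records per plot, so it can be compared directly with the nonmissing counts.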

 

Thanks

Accepted Solution
ballardw
Super User

Here is one way to get means by a grouping variable.

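/* NWAY keeps only the rows for the full plotid grouping */
proc summary data=Allevents nway;
   class plotid;
   var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   /* MEAN= with no names keeps the original variable names;
      drop=_: removes the automatic _TYPE_ and _FREQ_ variables */
   output out=plotmeans (drop=_: ) mean=;
run;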

Use the Plotmeans data set in the regression.
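As posted, the VAR statement averages only the predictors, but PROC REG also needs the response, logN2O, in the plotmeans data set. A minimal sketch of the whole pipeline, assuming the response is also averaged per plot. That is an assumption: the mean of logN2O is not the same as the log of the mean Slope, so which one suits the study is a judgment call. The OUTEST= name `est_plot` is hypothetical.

```sas
/* Plot-level means of the response and the candidate predictors.
   ASSUMPTION: logN2O is averaged per plot like the predictors;
   it must be in the VAR list or PROC REG will not find it. */
proc summary data=AllEvents nway;
   class plotID;
   var logN2O Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   output out=plotmeans(drop=_:) mean=;
run;

/* Model selection on one record per plot */
proc reg data=plotmeans outest=est_plot;
   model logN2O = Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N
         / selection=adjrsq sse aic bic sbc cp rmse;
run;
quit;
```

If plotID is simply a one-to-one relabeling of Block and plot, `class Block plot;` with NWAY should produce the same groups without the IF chain that builds plotID. With only one record per plot (16, if plotID runs from 1 to 16), six candidate predictors leave few error degrees of freedom.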


AaronJ
Obsidian | Level 7

The code you supplied worked well, with the exception of the DROP= portion. SAS wouldn't run anything when I used drop=Block Plot (the variables that plotID was substituting for), so I deleted that part of the code and the rest worked fine.
I was then able to use data=plotmeans in the PROC REG statement, and it worked well.

Thanks for the help!

ballardw
Super User

The DROP= option I used suppresses two automatic variables that PROC SUMMARY adds to the output data set. `_:` is a name prefix list, so drop=_: drops every variable whose name begins with an underscore.

One variable is _FREQ_, the number of records in each group, and the other is _TYPE_, an indicator of which combination of class variables was used to group the statistics. Because PROC SUMMARY writes only the class variables, the requested statistics, and those two automatic variables to the output set, there isn't any need to drop other variables from the input set; Block and Plot are not in the output set at all.
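To see what drop=_: removes, here is a sketch that keeps the automatic variables and prints them; `plotmeans_full` is a hypothetical data set name. With a single class variable and NWAY, every output row should have _TYPE_=1, and _FREQ_ counts the input records in that plot, including records where some analysis variables are missing.

```sas
/* Without DROP=, the output data set also contains _TYPE_ and _FREQ_ */
proc summary data=Allevents nway;
   class plotid;
   var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   output out=plotmeans_full mean=;
run;

/* Inspect the automatic variables for each plot */
proc print data=plotmeans_full;
   var plotid _type_ _freq_;
run;

/* Equivalent to (drop=_:), naming the two variables explicitly */
data plotmeans;
   set plotmeans_full(drop=_type_ _freq_);
run;
```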

 

I use PROC SUMMARY partly out of long habit and partly because, without setting other options, the output layout from PROC MEANS would not have been the right one for the regression.
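For comparison, a sketch of the same step written with PROC MEANS. The two procedures share syntax; by default PROC MEANS prints a report while PROC SUMMARY does not, so NOPRINT is added here, and the OUTPUT statement together with NWAY is what produces the one-row-per-plot data set the regression needs.

```sas
/* Same plot-level means via PROC MEANS.
   NOPRINT suppresses the printed report (PROC SUMMARY's default);
   NWAY keeps only the plotid-level rows (_TYPE_=1) and omits the
   overall _TYPE_=0 row that would otherwise be included. */
proc means data=Allevents nway noprint;
   class plotid;
   var Soil_N Soil_C SoilC_N LitterN LitterC LitterC_N;
   output out=plotmeans(drop=_:) mean=;
run;
```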
