Standardizing Output in Proc Means

Contributor
Posts: 20

Hello all, I would really appreciate some help on this.

I have a data set of tree cross-sectional area measurements. Below is an example of the format, where "SpCode" identifies the species in question and CSA_M2 is the cross-sectional area. The other fields identify the sampling unit.

Site  PerTrans  Side  Int_Edge  SpCode  CSA_M2
EVN   1         N     EDGE      AcRu    0.063718
EVN   1         N     EDGE      AcRu    0.003847
EVN   1         N     EDGE      AcRu    0.024593
EVN   1         N     EDGE      AcRu    0.093435
EVN   1         N     EDGE      AcRu    0.021818
EVN   1         N     EDGE      AcRu    0.012266
EVN   1         N     EDGE      AcRu    0.008328
EVN   1         N     EDGE      AcRu    0.003737
EVN   1         N     EDGE      AcRu    0.000754
EVN   1         N     EDGE      Celtis  0.00363
EVN   1         N     EDGE      Celtis  0.010382
EVN   1         N     EDGE      Celtis  0.004069

What I am trying to do is use PROC MEANS to summarize average cross-sectional area (and other measures) for every combination of the variables SITE, SIDE, PERTRANS, INT_EDGE and SPECIES.

Below is my code and output. 

proc means data=adults_csa sum mean nway missing;
    class site side pertrans int_edge species;
    output out=meanAD sum= stddev= stderr= mean= / autoname;
run;

OUTPUT:

Obs  Site  Side  PerTrans  Int_Edge  Species  _TYPE_  _FREQ_  CSA_M2_Sum  CSA_M2_StdDev  CSA_M2_StdErr  CSA_M2_Mean
1    EVN   N     1         EDGE      AcRu     31      9       0.2325      0.03185        0.01062        0.02583
2    EVN   N     1         EDGE      Celtis   31      9       0.02327     0.00321        0.00107        0.00259
3    EVN   N     1         EDGE      PlOc     31      1       0.16828     .              .              0.16828
4    EVN   N     1         EDGE      QuPa     31      2       0.00123     0              0              0.00062
5    EVN   N     1         EDGE      UlRu     31      6       0.03358     0.00372        0.00152        0.0056
6    EVN   N     1         INT       AcNe     31      1       0.00385     .              .              0.00385
7    EVN   N     1         INT       AcRu     31      9       0.05326     0.00864        0.00288        0.00592
8    EVN   N     1         INT       FrPe     31      8       0.25386     0.02646        0.00935        0.03173
9    EVN   N     1         INT       PoDe     31      4       0.52994     0.10329        0.05164        0.13248
10   EVN   N     1         INT       UlRu     31      1       0.01188     .              .              0.01188

This output is ACCURATE, but I have a problem. Because not every species occurred in every plot, there are no rows for species that were absent from a plot, where all cross-sectional area statistics should be 0. This is a problem when I try to graph the data from this dataset, because I end up with different sample sizes depending on the plots in question.

SO MY BIG QUESTION IS:

How can I standardize the output so that summary data for each species is reported for each sampling unit, even if that species is not present in the sampling unit? I am hoping there is a way to do this without just adding "zero" entries for each species to the original dataset.

Thank you in advance.

Sincerely

Meghan

Contributor
Posts: 20

Re: Standardizing Output in Proc Means

Ok, I just figured out that adding the COMPLETETYPES option to the PROC MEANS statement creates the missing rows I needed. Now I just need to figure out how to set the missing values to 0.
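
For anyone following along, here is a minimal sketch of what COMPLETETYPES does, using a made-up toy data set rather than the real adults_csa data. The data set names toy and toy_means, the values, and the explicit VAR statement are all assumptions for illustration only.

```sas
/* Toy data (hypothetical values): species B never occurs in the INT plot */
data toy;
    input int_edge $ species $ csa_m2;
    datalines;
EDGE A 0.10
EDGE B 0.20
INT A 0.30
;
run;

/* COMPLETETYPES adds a row for every combination of CLASS levels,
   including combinations that never appear in the input data */
proc means data=toy nway missing completetypes noprint;
    class int_edge species;
    var csa_m2;
    output out=toy_means sum= mean= / autoname;
run;

proc print data=toy_means;
run;
```

The printed data set should contain a row for INT and B with _FREQ_ equal to 0 and missing statistics, which is the row that still has to be set to 0. One thing to watch: COMPLETETYPES crosses the levels of all CLASS variables, so with several CLASS variables it can also create rows for sampling units that were never actually sampled.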

Grand Advisor
Posts: 10,210

Re: Standardizing Output in Proc Means

Take a pass through your output data set and set each missing value (.) to 0.

A quick and dirty:

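data want;
     set have;
     /* all variables whose names start with CSA */
     array n CSA:;
     do _i_ = 1 to dim(n);
          if n[_i_] = . then n[_i_] = 0;
     end;
     drop _i_;   /* keep the loop index out of the output */
run;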

Contributor
Posts: 20

Re: Standardizing Output in Proc Means

Ok, using the advice from this thread, I used the following code to change the missing values to zero.

I suppose you want to change all numeric variables. Try this:

data yourdata;
   set yourdata;
   array change _numeric_;
   do over change;
      if change=. then change=0;
   end;
run;

Trusted Advisor
Posts: 1,203

Re: Standardizing Output in Proc Means

Another way to replace missing values with zero:

proc stdize data=meanAD reponly missing=0 out=meanAD_imputed;
    var CSA:;
run;
