turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Proc means summary statistics, join etc

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-16-2017 09:08 PM

Ok, so here is the situation-

I have a dataset called Fish which has data onfish in 120 lakes, detailed in 120 rows and 15 columns including lake name. The scope of my project is to compare each lake to other similar lakes to see whether there are differences in eight numeric characteristics (mercury level, number offish, elevation, surface area, depth, drainage area, runoff factor, and flushing rate). Because not much is known about the distribution of the values we want to compare a lake's value to both the mean and median of all other lakes in the data set with the same value of lake type (LT) and dam status (DAM).

The first step is to use PROC MEANS to calculate the mean and median for each of the remaining numeric variables (all eight of

them) separately for each combination of LT and DAM. Lakes that don't have a value of LT or DAM reported need to

be included in this analysis. I need to use ODS to save the results to a data set named STATZ and ensure the results are not

printed to any other destination. Here is my code to do this task and I believe its correct, but just wanted to confirm from you guys.

```
Proc means data = fish ;
class lt/missing; class dam/missing;
var hg n elv sa z da rf fr ;
output out = statz1 mean = median = /autoname;
run;
```

The next step is to combine these summary statistics with the original data set using an appropriate data step merge/join technique. This new data set is FISH2.

Im confused here as Im not sure on what exactly I have to join summary dataset and original one on?

The final step, after the data sets are joined, but in the same DATA step , I need to do the following using arrays.

Compute the following for each 8 variables.

{ Difference from the mean [variable - mean.of.variable]

{ Percent difference from the mean [(variable - mean.of.variable)/mean.of.variable]

{ Difference from the median [variable - median.of.variable]

{ Percent difference from the median [(variable - median.of.variable)/median.of.variable]

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to devsas

10-16-2017 09:38 PM - edited 10-16-2017 09:38 PM

Merging data

Arrays:

https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

devsas wrote:

Ok, so here is the situation-

I have a dataset called Fish which has data onfish in 120 lakes, detailed in 120 rows and 15 columns including lake name. The scope of my project is to compare each lake to other similar lakes to see whether there are differences in eight numeric characteristics (mercury level, number offish, elevation, surface area, depth, drainage area, runoff factor, and flushing rate). Because not much is known about the distribution of the values we want to compare a lake's value to both the mean and median of all other lakes in the data set with the same value of lake type (LT) and dam status (DAM).

The first step is to use PROC MEANS to calculate the mean and median for each of the remaining numeric variables (all eight of

them) separately for each combination of LT and DAM. Lakes that don't have a value of LT or DAM reported need to

be included in this analysis. I need to use ODS to save the results to a data set named STATZ and ensure the results are not

printed to any other destination. Here is my code to do this task and I believe its correct, but just wanted to confirm from you guys.`Proc means data = fish ; class lt/missing; class dam/missing; var hg n elv sa z da rf fr ; output out = statz1 mean = median = /autoname; run;`

The next step is to combine these summary statistics with the original data set using an appropriate data step merge/join technique. This new data set is FISH2.

Im confused here as Im not sure on what exactly I have to join summary dataset and original one on?

The final step, after the data sets are joined, but in the same DATA step , I need to do the following using arrays.

Compute the following for each 8 variables.

{ Difference from the mean [variable - mean.of.variable]

{ Percent difference from the mean [(variable - mean.of.variable)/mean.of.variable]

{ Difference from the median [variable - median.of.variable]

{ Percent difference from the median [(variable - median.of.variable)/median.of.variable]

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to devsas

10-16-2017 09:40 PM

Like this?

```
proc summary data=FISH missing nway;
class LT DAM;
var HG N ELV SA Z DA RF FR ;
output out=SUM mean= median= /autoname;
run;
proc sql;
```

create table FISH2 as
select LAKENAME
,HG-HG_MEAN as DIFF_AVG_HG
,HG-HG_MEDIAN as DIFF_MED_HG
,etc
from FISH
left join
SUM
on HAVE.LT=SUM.LT and HAVE.DAM=SUM.DAM ;
quit;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ChrisNZ

10-17-2017 12:37 PM

Thanks so much! I see that proc summary gets pretty much the same results as my code using proc means do, except that _type_ = 3 is the only one in your results. Can you please explain the significance of that and how can I get the same result using proc means?

I believe I will do the merge using the data step and use arrays for repetitive statements.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to devsas

10-17-2017 12:41 PM

Then you need to post what you're using/trying.

I suspect you're missing the NWAY option in PROC MEANS. Note that MEANS/SUMMARY/UNIVARIATE are all very similar in how they work and in a situation like yours pretty much interchangeable.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

10-17-2017 03:11 PM

Yes, thanks you are right. I used the nway option and got the same results. Im still trying to understand though what it means. Doing some reading, but still not clear.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to devsas

10-17-2017 03:25 PM

The documentation is pretty clear.

## NWAY

limits the output statistics to the observations with the highest _TYPE_ value.

_TYPE_ is defined when you use CLASS statements. You can also control the levels reported using the WAYS and TYPES statements.

I suggest searching PROC MEANS on the LexJansen website and see what it's capable of, its a really powerful procedure.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to devsas

10-16-2017 11:51 PM

Looks like you could do the whole thing with a single SQL query (I added a macro to save on typing)

```
%macro stats(v);
&v. - mean(&v.) as &v._err,
(&v. - mean(&v.)) / mean(&v.) as &v._err_pct format=percent7.1,
&v. - median(&v.) as &v._diff,
(&v. - median(&v.)) / median(&v.) as &v._diff_pct format=percent7.1
%mend stats;
proc sql;
create table fish2 as
select
*,
%stats(hg),
%stats(n),
%stats(elv),
%stats(sa),
%stats(z),
%stats(da),
%stats(rf),
%stats(fr)
from fish
group by lt, dam;
quit;
```

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

10-17-2017 12:39 PM

Thanks PGSTATS, your solution works too, its just that for this exercise I want to do it using proc means and arrays only to understand these better.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

10-17-2017 03:09 PM

Ok, I tried doing it with data step and arrays, but got stuck again. Here is my data step merge code, not sure how can I integrate array statement here to calculate mean, median differences and percent differences?

```
data fish2_ (keep = name hg n elv sa z da rf fr hg_mean n_mean elv_mean sa_mean z_mean da_mean rf_mean fr_mean hg_median n_median elv_median sa_median z_median da_median rf_median fr_median) ;
merge fish (in = x) statz2 (in = y);
by lt dam;
if x = 1;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to devsas

10-17-2017 04:53 PM

Ok, so it seems I have nailed the first two steps using proc means and data step merge, but the last step using arrays im still not getting it. Here is my complete code-

```
Proc means data = fish missing nway noprint ;
class lt/missing; class dam/missing;
var hg n elv sa z da rf fr ;
output out = statz2 mean = median = /autoname;
run;
data fish2_ (keep = name hg n elv sa z da rf fr hg_mean n_mean elv_mean sa_mean z_mean da_mean rf_mean fr_mean hg_median n_median elv_median sa_median z_median da_median rf_median fr_median) ;
merge fish (in = x) statz2 (in = y);
by lt dam;
if x = 1;
run;
*using arrays to calculate mean difference only;
data fish_f;
set fish2_;
array all{*} hg n elv sa z da rf fr;
array all_mean{*} hg_mean n_mean elv_mean sa_mean z_mean da_mean rf_mean fr_mean ;
do i=1 to dim(all);
diff[i]= all_mean{i} - all{i};
end;
run;
```

Please note that I only want to use data step and arrays to solve this problem

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to devsas

10-17-2017 05:15 PM

`diff[i]`

Where did you define a DIFF array. I'm sure your log has errors....read the log. ALWAYS.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

10-17-2017 09:48 PM

Yes, that was a mistake. But I defined it now and still get errors-

```
data fish_f;
set fish2_;
array all{*} hg n elv sa z da rf fr;
array all_mean{*} hg_mean n_mean elv_mean sa_mean z_mean da_mean rf_mean fr_mean ;
array diff{*} diff_all_mean-diff_all;
do i=1 to dim(all);
diff[i]= all_mean{i} - all{i};
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to devsas

10-17-2017 09:55 PM

Posting the log would be really helpful.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to error_prone

10-17-2017 09:57 PM

319 array diff{*} diff_all_mean-diff_all; ERROR: Missing numeric suffix on a numbered variable list (diff_all_mean-diff_all). WARNING: Defining an array with zero elements. 320 do i=1 to dim(all);