Pyrite | Level 9

## Calculating the mean for the restricted sample using data statement

Hello All,

1. I would like to calculate the mean for a restricted sample of Final_claims where tag_sample=1.  How would I build this into the following code?

2. Also, I would like to create a variable N_mean that records the number of observations (per PC4 and year) that were used in the above calculation of the mean.

data final_claims;

set final_claims;

by PC4 year;

y=mean(x);

run;

Thanks

8 REPLIES 8
Super User

## Re: Calculating the mean for the restricted sample using data statement

Such is best done in SQL:

``````proc sql;
create table want as
select
PC4, year,
avg(x) as y,
count(*) as n_mean
from final_claims
where tag_sample = 1
group by PC4, year
;
quit;``````

Edit: moved the where condition up.

Pyrite | Level 9

## Re: Calculating the mean for the restricted sample using data statement

1.  When running KurtBremser's code I get the following error message after the "where"-statement

``````proc sql;
create table want as
select
PC4, year,
avg(x) as y,
count(*) as n_mean
from final_claims
where tag_sample = 1
group by PC4, year
;
quit;``````

ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND,

BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET, GT, GTT, HAVING, IN, INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET,

NOT, NOTIN, OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.

ERROR 76-322: Syntax error, statement will be ignored.

2. Also, I was hoping to get a VARIABLE with these values. This seems to refer to the separate table. Is it possible to directly calculate these as a variable?

Super User

## Re: Calculating the mean for the restricted sample using data statement

Ups. Please move the where condition up one line, before the group by.

I edited my post accordingly.

Super User

## Re: Calculating the mean for the restricted sample using data statement

If you want your stat re-merged, do this:

``````proc sql;
create table want as
select
*,
avg(x) as y,
count(*) as n_mean
from final_claims
where tag_sample = 1
group by PC4, year
;
quit;``````

I advise against overwriting a dataset in the same step. If something happens, you need to recreate the dataset.

Pyrite | Level 9

Super User

## Re: Calculating the mean for the restricted sample using data statement

Try this:

``````proc sql;
create table want as
select
a.*,
avg(x) as y,
count(*) as n_mean
from final_claims a
where tag_sample = 1
group by PC4, year
union all
select
b.*,
0 as y,
0 as n_mean
from final_claims b
where tag_sample ne 1
;
quit;``````
Amethyst | Level 16

## Re: Calculating the mean for the restricted sample using data statement

Please use the proc sql if you want to get the mean in this case

``````data final_claims;

set final_claims;

by PC4 year;

y=mean(x);

run;

proc sql;
create table final_claims2 as select * , mean(x) as y from final_claims group by PC4, year;
quit;
``````
Thanks,
Jag
Super User

## Re: Calculating the mean for the restricted sample using data statement

You should also be aware of the procedures designed for such:

proc summary data final_claims nway;

class PC4 year;

where tag_sample=1;

var x;

output out=want  mean=x_mean n=x_n;

run;

Note that without the nway you can get summaries for all the data meeting the where clause, PC4 only, Year only in addition to the combinations of PC4 and year. The automatic variable _type_ is used to indicate which combination the resulting record represents.

Also if you have many variables you can place them all on the var statement and use Mean= n= std= /autoname to create output variables with the statistic added to the variable name or even specific statistics for only some of the variables.

Discussion stats
• 8 replies
• 1559 views
• 1 like
• 4 in conversation