Help using Base SAS procedures

Calculating the mean for the restricted sample using data statement

Reply
Contributor
Posts: 61

Calculating the mean for the restricted sample using data statement

Hello All,

 

1. I would like to calculate the mean for a restricted sample of Final_claims where tag_sample=1.  How would I build this into the following code?

 

2. Also, I would like to create a variable N_mean that records the number of observations (per PC4 and year) that were used in the above calculation of the mean. 

 

data final_claims;

set final_claims;

by PC4 year;

y=mean(x);

run;

 

Thanks

 

Super User
Posts: 7,831

Re: Calculating the mean for the restricted sample using data statement

[ Edited ]

Such is best done in SQL:

proc sql;
create table want as
select
  PC4, year,
  avg(x) as y,
  count(*) as n_mean
from final_claims
where tag_sample = 1
group by PC4, year
;
quit;

 Edit: moved the where condition up.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 61

Re: Calculating the mean for the restricted sample using data statement

[ Edited ]
Posted in reply to KurtBremser

1.  When running KurtBremser's code I get the following error message after the "where"-statement  

proc sql;
create table want as
select
  PC4, year,
  avg(x) as y,
  count(*) as n_mean
from final_claims
where tag_sample = 1
group by PC4, year
;
quit;

ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND,

              BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET, GT, GTT, HAVING, IN, INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET,

              NOT, NOTIN, OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=. 

ERROR 76-322: Syntax error, statement will be ignored.

 

2. Also, I was hoping to get a VARIABLE with these values. This seems to refer to the separate table. Is it possible to directly calculate these as a variable?

Super User
Posts: 7,831

Re: Calculating the mean for the restricted sample using data statement

Ups. Please move the where condition up one line, before the group by.

I edited my post accordingly.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 7,831

Re: Calculating the mean for the restricted sample using data statement

If you want your stat re-merged, do this:

proc sql;
create table want as
select
  *,
  avg(x) as y,
  count(*) as n_mean
from final_claims
where tag_sample = 1
group by PC4, year
;
quit;

I advise against overwriting a dataset in the same step. If something happens, you need to recreate the dataset.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 61

Re: Calculating the mean for the restricted sample using data statement

Posted in reply to KurtBremser

 

 

Super User
Posts: 7,831

Re: Calculating the mean for the restricted sample using data statement

Try this:

proc sql;
create table want as
select
  a.*,
  avg(x) as y,
  count(*) as n_mean
from final_claims a
where tag_sample = 1
group by PC4, year
union all
select
  b.*,
  0 as y,
  0 as n_mean
from final_claims b
where tag_sample ne 1
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 1,137

Re: Calculating the mean for the restricted sample using data statement

Please use the proc sql if you want to get the mean in this case

 

data final_claims;

set final_claims;

by PC4 year;

y=mean(x);

run;

proc sql;
create table final_claims2 as select * , mean(x) as y from final_claims group by PC4, year;
quit;
Thanks,
Jag
Super User
Posts: 11,343

Re: Calculating the mean for the restricted sample using data statement

You should also be aware of the procedures designed for such:

 

proc summary data final_claims nway;

   class PC4 year;

   where tag_sample=1;

   var x;

   output out=want  mean=x_mean n=x_n;

run;

 

Note that without the nway you can get summaries for all the data meeting the where clause, PC4 only, Year only in addition to the combinations of PC4 and year. The automatic variable _type_ is used to indicate which combination the resulting record represents.

 

Also if you have many variables you can place them all on the var statement and use Mean= n= std= /autoname to create output variables with the statistic added to the variable name or even specific statistics for only some of the variables.

 

Ask a Question
Discussion stats
  • 8 replies
  • 211 views
  • 1 like
  • 4 in conversation