## total visits and average score in a datastep

Regular Contributor
Posts: 182

# total visits and average score in a datastep

Hi all,

I need to sum encounter by ID and then average the answers to the Q3  (will always be 1 or 0).  Looking for an easy way...preferable in a data step.  I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together.  Is there an easier way in a data step?

Want a table that looks like this:

 ID Q3 Total Q3_Avg A 1 5 0.6 A 1 5 0.6 A 0 5 0.6 A 1 5 0.6 A 0 5 0.6 B 1 4 2 B 1 4 2 B 0 4 2 B 0 4 2

Help is appreciated!!

Super User
Posts: 23,683

## Re: total visits and average score in a datastep

What does your data start out looking like?

Please post data in a data step and the corresponding output expected. Anything you've tried is helpful too, so we don't suggest things you've already attempted.

Regular Contributor
Posts: 182

## Re: total visits and average score in a datastep

It basically starts out as:

 ID Q3 A 1 A 1 A 0 A 1 A 0 B 1 B 1 B 0 B 0

So need to add two new variables Total and Average.  I did this to get the total encounters by ID, but I'm trying to avoid all these micro steps i know how to do and just do one datastep.

proc sql;
select ID,
Count(ID) as Total
from diary004_4
group by ID;
quit;

Super User
Posts: 23,683

## Re: total visits and average score in a datastep

Do you have a data step process and you want to build this calculation into that process?

Because you're merging the data with a summary statistic you must have two passes of the data. The most efficient can vary, perhaps a DoW loop is the best in your circumstance, but another approach would be to use a single SQL step or to do the calculation separately via PROC MEANS/SQL and merge the data in.

``````******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
output out=avg_values mean(height)=avg_height;
run;

*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;

data class_data;
merge class avg_values;
by sex;

run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;``````

Super User
Posts: 6,754

## Re: total visits and average score in a datastep

@Reeza mentioned a DOW loop in passing.  That would be my approach.  For example, assuming your data set is sorted:

data want;

total = 0;

q3_avg = 0;

do until (last.id);

set have;

by id;

total + 1;

q3_avg + q3;

end;

q3_avg = q3_avg / total;

do until (last.id);

set have;

by id;

output;

end;

run;

It's a little complex, but gets  you there in one step.

Regular Contributor
Posts: 182

## Re: total visits and average score in a datastep

This is so close. I think I miss communicated a little. I need the Q3_avg to the the SUM of Q3/Total. Right now, the output for on ID looks like:
Total=8
Q3_avg=1
when all answers to q3 is '1'...so the Q3_Ave should be 8. Make sense?
Super User
Posts: 23,683

## Re: total visits and average score in a datastep

Have you looked at PROC FREQ output?

Super User
Posts: 6,754

## Re: total visits and average score in a datastep

Not sure if that makes sense ...  when all the Q3 are 1, the sum of Q3 is 8.  And total is 8.  So Q3_avg would be 8 / 8 = 1.

I generalized the results for the first ID where the average is 0.6.  (The Q3_avg for the second ID didn't really make sense to me.)

If that's not right, maybe you could show 3 IDs in your example, all with the right Q3_avg values.

Contributor
Posts: 54

## Re: total visits and average score in a datastep

you can join these two tables:

create first table;
proc sql;
create table tablo as
count(ID) as total,
avg(q3) as aver
from tablo t1
group by
total,
aver;
*********
then inner join between fist table and given above.

(proc summarize also do this but i didnt remember)
Valued Guide
Posts: 590

## Re: total visits and average score in a datastep

Hi,

You can achieve this in many simple ways:

one way is :

proc sql;
create table need as
select id,Q3,count(id) as total,sum(Q3)/count(id) as Avg
from have
group by id;
quit;

Thanks,
Suryakiran
Super User
Posts: 13,508

## Re: total visits and average score in a datastep

jenim514 wrote:

Hi all,

I need to sum encounter by ID and then average the answers to the Q3  (will always be 1 or 0).  Looking for an easy way...preferable in a data step.  I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together.  Is there an easier way in a data step?

Want a table that looks like this:

 ID Q3 Total Q3_Avg A 1 5 0.6 A 1 5 0.6 A 0 5 0.6 A 1 5 0.6 A 0 5 0.6 B 1 4 2 B 1 4 2 B 0 4 2 B 0 4 2

Help is appreciated!!

Please explain how you get a total of 5 for ID=A. This looks a lot more like a COUNT of non-missing values than a SUM. So the SUM=3 / Count=5 does yield 0.6 for Id A. But then for B you would have Sum=2 / Count=4 = 0.5 not 2. So there is something missing in your description.

Regular Contributor
Posts: 182

## Re: total visits and average score in a datastep

ok, let me think about this.  you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not.  i will look at it again and re-frame my question.   thanks all!

Super User
Posts: 13,508

## Re: total visits and average score in a datastep

jenim514 wrote:

ok, let me think about this.  you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not.  i will look at it again and re-frame my question.   thanks all!

You might ask if there is a weighting variable involved they forgot to mention. If the weight changes between A and B that could account for different results with the same calculations I proposed. Since a variable named Q3 seem like a candidate for "question 3" and questions often occur in surveys and surveys often weight data ....

PROC Star
Posts: 1,772

## Re: total visits and average score in a datastep

@jenim514  This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

1. I concur the PROC readymeals approaches

2. "Double DOW" is slick

3. Reemerging stats in Proc sql is great

If you are a learner like me, just play

data have;

input ID \$ Q3;

datalines;

A    1

A    1

A    0

A    1

A    0

B    1

B    1

B    0

B    0

;

data _null_;

if _n_=1 then do;

if 0 then set have;

length total q3_avg  8;

dcl hash h(multidata: 'y',ordered: 'y');

h.definekey('id');

h.definedata('id', 'q3','total','q3_avg');

h.definedone();

call missing (total, q3_avg);

end;

_total = 0;

_q3_avg = 0;

do until(last.id);

set have end=last;

by id;

_total + 1;

_q3_avg + q3;

if last.id then

do;

temp = _q3_avg/_total;

rc = h.find();

q3_avg = temp;

total=_total;

if (rc = 0) then

do;

rc = h.replacedup();

rc = h.find_next();

q3_avg = temp;

total=_total;

do while(rc = 0);

rc = h.replacedup();

rc = h.find_next();

q3_avg = temp;

total=_total;

end;

end;

end;

end;

if last then h.output(dataset:'want');

run;

Cheers!

Super User
Posts: 23,683

## Re: total visits and average score in a datastep

[ Edited ]

This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

@novinosrin If there was any uncertainty about your 'geek' status or SAS expertise consider it certain now

Discussion stats
• 14 replies
• 294 views
• 3 likes
• 7 in conversation