total visits and average score in a datastep

jenim514 · Posted 11-08-2017 03:20 PM

Hi all,

I need to sum encounter by ID and then average the answers to the Q3 (will always be 1 or 0). Looking for an easy way...preferable in a data step. I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together. Is there an easier way in a data step?

Want a table that looks like this:

ID	Q3	Total	Q3_Avg
A	1	5	0.6
A	1	5	0.6
A	0	5	0.6
A	1	5	0.6
A	0	5	0.6
B	1	4	2
B	1	4	2
B	0	4	2
B	0	4	2

Help is appreciated!!

Reeza · Posted 11-08-2017 03:31 PM

What does your data start out looking like?

Please post data in a data step and the corresponding output expected. Anything you've tried is helpful too, so we don't suggest things you've already attempted.

jenim514 · Posted 11-08-2017 03:35 PM

It basically starts out as:

ID	Q3
A	1
A	1
A	0
A	1
A	0
B	1
B	1
B	0
B	0

So need to add two new variables Total and Average. I did this to get the total encounters by ID, but I'm trying to avoid all these micro steps i know how to do and just do one datastep.

proc sql;
select ID,
Count(ID) as Total
from diary004_4
group by ID;
quit;

Reeza · Posted 11-08-2017 03:59 PM

Do you have a data step process and you want to build this calculation into that process?

Because you're merging the data with a summary statistic you must have two passes of the data. The most efficient can vary, perhaps a DoW loop is the best in your circumstance, but another approach would be to use a single SQL step or to do the calculation separately via PROC MEANS/SQL and merge the data in.

******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
    output out=avg_values mean(height)=avg_height;
run;

*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;

data class_data;
 merge class avg_values;
 by sex;


run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

Astounding · Posted 11-08-2017 04:06 PM

@Reeza mentioned a DOW loop in passing. That would be my approach. For example, assuming your data set is sorted:

data want;

total = 0;

q3_avg = 0;

do until (last.id);

set have;

by id;

total + 1;

q3_avg + q3;

end;

q3_avg = q3_avg / total;

do until (last.id);

set have;

by id;

output;

end;

run;

It's a little complex, but gets you there in one step.

jenim514 · Posted 11-08-2017 04:29 PM

This is so close. I think I miss communicated a little. I need the Q3_avg to the the SUM of Q3/Total. Right now, the output for on ID looks like:
Total=8
Q3_avg=1
when all answers to q3 is '1'...so the Q3_Ave should be 8. Make sense?

Reeza · Posted 11-08-2017 04:40 PM

Have you looked at PROC FREQ output?

Astounding · Posted 11-08-2017 04:40 PM

Not sure if that makes sense ... when all the Q3 are 1, the sum of Q3 is 8. And total is 8. So Q3_avg would be 8 / 8 = 1.

I generalized the results for the first ID where the average is 0.6. (The Q3_avg for the second ID didn't really make sense to me.)

If that's not right, maybe you could show 3 IDs in your example, all with the right Q3_avg values.

Yavuz · Posted 11-08-2017 04:10 PM

you can join these two tables:

create first table;
proc sql;
create table tablo as
count(ID) as total,
avg(q3) as aver
from tablo t1
group by
total,
aver;
*********
then inner join between fist table and given above.

(proc summarize also do this but i didnt remember)

SuryaKiran · Posted 11-08-2017 04:14 PM

Hi,

You can achieve this in many simple ways:

one way is :

proc sql;
create table need as
select id,Q3,count(id) as total,sum(Q3)/count(id) as Avg
from have
group by id;
quit;

Thanks,
Suryakiran

ballardw · Posted 11-08-2017 06:24 PM

@jenim514 wrote:

Hi all,

I need to sum encounter by ID and then average the answers to the Q3 (will always be 1 or 0). Looking for an easy way...preferable in a data step. I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together. Is there an easier way in a data step?

Want a table that looks like this:

ID Q3 Total Q3_Avg

A 1 5 0.6

A 1 5 0.6

A 0 5 0.6

A 1 5 0.6

A 0 5 0.6

B 1 4 2

B 1 4 2

B 0 4 2

B 0 4 2

Help is appreciated!!

Please explain how you get a total of 5 for ID=A. This looks a lot more like a COUNT of non-missing values than a SUM. So the SUM=3 / Count=5 does yield 0.6 for Id A. But then for B you would have Sum=2 / Count=4 = 0.5 not 2. So there is something missing in your description.

jenim514 · Posted 11-09-2017 12:10 PM

ok, let me think about this. you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not. i will look at it again and re-frame my question. thanks all!

ballardw · Posted 11-09-2017 12:59 PM

@jenim514 wrote:

ok, let me think about this. you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not. i will look at it again and re-frame my question. thanks all!

Been there, got the headaches.

You might ask if there is a weighting variable involved they forgot to mention. If the weight changes between A and B that could account for different results with the same calculations I proposed. Since a variable named Q3 seem like a candidate for "question 3" and questions often occur in surveys and surveys often weight data ....

novinosrin · Posted 11-09-2017 04:00 PM

@jenim514 This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

1. I concur the PROC readymeals approaches

2. "Double DOW" is slick

3. Reemerging stats in Proc sql is great

If you are a learner like me, just play

data have;

input ID $ Q3;

datalines;

A 1

A 0

A 1

A 0

B 1

B 0

;

data _null_;

if _n_=1 then do;

if 0 then set have;

length total q3_avg 8;

dcl hash h(multidata: 'y',ordered: 'y');

h.definekey('id');

h.definedata('id', 'q3','total','q3_avg');

h.definedone();

call missing (total, q3_avg);

end;

_total = 0;

_q3_avg = 0;

do until(last.id);

set have end=last;

by id;

_total + 1;

_q3_avg + q3;

h.add();

if last.id then

do;

temp = _q3_avg/_total;

rc = h.find();

q3_avg = temp;

total=_total;

if (rc = 0) then

do;

rc = h.replacedup();

rc = h.find_next();

q3_avg = temp;

total=_total;

do while(rc = 0);

rc = h.replacedup();

rc = h.find_next();

q3_avg = temp;

total=_total;

end;

if last then h.output(dataset:'want');

run;

Cheers!

Reeza · Posted 11-09-2017 04:17 PM

This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

@novinosrin If there was any uncertainty about your 'geek' status or SAS expertise consider it certain now 🙂

total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

Re: total visits and average score in a datastep

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away