DATA Step, Macro, Functions and more

total visits and average score in a datastep

Reply
Frequent Contributor
Posts: 142

total visits and average score in a datastep

Hi all,

 

I need to sum encounter by ID and then average the answers to the Q3  (will always be 1 or 0).  Looking for an easy way...preferable in a data step.  I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together.  Is there an easier way in a data step?

 

Want a table that looks like this:

 

ID Q3 Total Q3_Avg
A 1 5 0.6
A 1 5 0.6
A 0 5 0.6
A 1 5 0.6
A 0 5 0.6
B 1 4 2
B 1 4 2
B 0 4 2
B 0 4 2

 

Help is appreciated!!

Super User
Posts: 19,878

Re: total visits and average score in a datastep

What does your data start out looking like?

 

Please post data in a data step and the corresponding output expected. Anything you've tried is helpful too, so we don't suggest things you've already attempted.

Frequent Contributor
Posts: 142

Re: total visits and average score in a datastep

It basically starts out as:

 

ID Q3
A 1
A 1
A 0
A 1
A 0
B 1
B 1
B 0
B 0

 

So need to add two new variables Total and Average.  I did this to get the total encounters by ID, but I'm trying to avoid all these micro steps i know how to do and just do one datastep.

 

proc sql;
select ID,
Count(ID) as Total
from diary004_4
group by ID;
quit;

Super User
Posts: 19,878

Re: total visits and average score in a datastep

Do you have a data step process and you want to build this calculation into that process?

Because you're merging the data with a summary statistic you must have two passes of the data. The most efficient can vary, perhaps a DoW loop is the best in your circumstance, but another approach would be to use a single SQL step or to do the calculation separately via PROC MEANS/SQL and merge the data in.

 

******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
    output out=avg_values mean(height)=avg_height;
run;

*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;

data class_data;
 merge class avg_values;
 by sex;


run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

Super User
Posts: 5,518

Re: total visits and average score in a datastep

@Reeza mentioned a DOW loop in passing.  That would be my approach.  For example, assuming your data set is sorted:

 

data want;

total = 0;

q3_avg = 0;

do until (last.id);

   set have;

   by id;

   total + 1;

   q3_avg + q3;

end;

q3_avg = q3_avg / total;

do until (last.id);

   set have;

   by id;

   output;

end;

run;

 

It's a little complex, but gets  you there in one step.

Frequent Contributor
Posts: 142

Re: total visits and average score in a datastep

Posted in reply to Astounding
This is so close. I think I miss communicated a little. I need the Q3_avg to the the SUM of Q3/Total. Right now, the output for on ID looks like:
Total=8
Q3_avg=1
when all answers to q3 is '1'...so the Q3_Ave should be 8. Make sense?
Super User
Posts: 19,878

Re: total visits and average score in a datastep

Have you looked at PROC FREQ output? 

Super User
Posts: 5,518

Re: total visits and average score in a datastep

Not sure if that makes sense ...  when all the Q3 are 1, the sum of Q3 is 8.  And total is 8.  So Q3_avg would be 8 / 8 = 1. 

 

I generalized the results for the first ID where the average is 0.6.  (The Q3_avg for the second ID didn't really make sense to me.)

 

If that's not right, maybe you could show 3 IDs in your example, all with the right Q3_avg values.

Contributor
Posts: 54

Re: total visits and average score in a datastep

you can join these two tables:

create first table;
proc sql;
create table tablo as
count(ID) as total,
avg(q3) as aver
from tablo t1
group by
total,
aver;
*********
then inner join between fist table and given above.

(proc summarize also do this but i didnt remember)
Frequent Contributor
Posts: 141

Re: total visits and average score in a datastep

Hi,

 

You can achieve this in many simple ways:

one way is :

 

proc sql;
create table need as
select id,Q3,count(id) as total,sum(Q3)/count(id) as Avg
from have
group by id;
quit;

Thanks,
Suryakiran
Super User
Posts: 11,343

Re: total visits and average score in a datastep


jenim514 wrote:

Hi all,

 

I need to sum encounter by ID and then average the answers to the Q3  (will always be 1 or 0).  Looking for an easy way...preferable in a data step.  I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together.  Is there an easier way in a data step?

 

Want a table that looks like this:

 

ID Q3 Total Q3_Avg
A 1 5 0.6
A 1 5 0.6
A 0 5 0.6
A 1 5 0.6
A 0 5 0.6
B 1 4 2
B 1 4 2
B 0 4 2
B 0 4 2

 

Help is appreciated!!


Please explain how you get a total of 5 for ID=A. This looks a lot more like a COUNT of non-missing values than a SUM. So the SUM=3 / Count=5 does yield 0.6 for Id A. But then for B you would have Sum=2 / Count=4 = 0.5 not 2. So there is something missing in your description.

Frequent Contributor
Posts: 142

Re: total visits and average score in a datastep

ok, let me think about this.  you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not.  i will look at it again and re-frame my question.   thanks all!

  

Super User
Posts: 11,343

Re: total visits and average score in a datastep


jenim514 wrote:

ok, let me think about this.  you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not.  i will look at it again and re-frame my question.   thanks all!

  


Been there, got the headaches.

You might ask if there is a weighting variable involved they forgot to mention. If the weight changes between A and B that could account for different results with the same calculations I proposed. Since a variable named Q3 seem like a candidate for "question 3" and questions often occur in surveys and surveys often weight data ....

PROC Star
Posts: 283

Re: total visits and average score in a datastep

@jenim514  This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

 

1. I concur the PROC readymeals approaches

2. "Double DOW" is slick

3. Reemerging stats in Proc sql is great

 

If you are a learner like me, just play

 

data have;

input ID $ Q3;

datalines;

A    1

A    1

A    0

A    1

A    0

B    1

B    1

B    0

B    0

;

 

data _null_;

if _n_=1 then do;

  if 0 then set have;

   length total q3_avg  8;

   dcl hash h(multidata: 'y',ordered: 'y');

   h.definekey('id');

   h.definedata('id', 'q3','total','q3_avg');

   h.definedone();

   call missing (total, q3_avg);

   end;

   _total = 0;

   _q3_avg = 0;

   do until(last.id);

     set have end=last;

           by id;

           _total + 1;

           _q3_avg + q3;

           h.add();

           if last.id then

        do;

                temp = _q3_avg/_total;    

            rc = h.find();

                q3_avg = temp; 

            total=_total;

                if (rc = 0) then

           do;

           rc = h.replacedup();

           rc = h.find_next();

                q3_avg = temp;

                total=_total;

            do while(rc = 0);

                rc = h.replacedup();

                rc = h.find_next();

                q3_avg = temp;

                total=_total;

                end;

           end;      

           end;

   end;

     if last then h.output(dataset:'want');          

 run;

 

Cheers!

Super User
Posts: 19,878

Re: total visits and average score in a datastep

[ Edited ]
Posted in reply to novinosrin

  This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

 

 

@novinosrin If there was any uncertainty about your 'geek' status or SAS expertise consider it certain now Smiley Happy

Ask a Question
Discussion stats
  • 14 replies
  • 204 views
  • 3 likes
  • 7 in conversation