BookmarkSubscribeRSS Feed
jenim514
Pyrite | Level 9

Hi all,

 

I need to sum encounter by ID and then average the answers to the Q3  (will always be 1 or 0).  Looking for an easy way...preferable in a data step.  I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together.  Is there an easier way in a data step?

 

Want a table that looks like this:

 

ID Q3 Total Q3_Avg
A 1 5 0.6
A 1 5 0.6
A 0 5 0.6
A 1 5 0.6
A 0 5 0.6
B 1 4 2
B 1 4 2
B 0 4 2
B 0 4 2

 

Help is appreciated!!

14 REPLIES 14
Reeza
Super User

What does your data start out looking like?

 

Please post data in a data step and the corresponding output expected. Anything you've tried is helpful too, so we don't suggest things you've already attempted.

jenim514
Pyrite | Level 9

It basically starts out as:

 

ID Q3
A 1
A 1
A 0
A 1
A 0
B 1
B 1
B 0
B 0

 

So need to add two new variables Total and Average.  I did this to get the total encounters by ID, but I'm trying to avoid all these micro steps i know how to do and just do one datastep.

 

proc sql;
select ID,
Count(ID) as Total
from diary004_4
group by ID;
quit;

Reeza
Super User

Do you have a data step process and you want to build this calculation into that process?

Because you're merging the data with a summary statistic you must have two passes of the data. The most efficient can vary, perhaps a DoW loop is the best in your circumstance, but another approach would be to use a single SQL step or to do the calculation separately via PROC MEANS/SQL and merge the data in.

 

******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
    output out=avg_values mean(height)=avg_height;
run;

*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;

data class_data;
 merge class avg_values;
 by sex;


run;

proc print data=class_data;
run;

*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

Astounding
PROC Star

@Reeza mentioned a DOW loop in passing.  That would be my approach.  For example, assuming your data set is sorted:

 

data want;

total = 0;

q3_avg = 0;

do until (last.id);

   set have;

   by id;

   total + 1;

   q3_avg + q3;

end;

q3_avg = q3_avg / total;

do until (last.id);

   set have;

   by id;

   output;

end;

run;

 

It's a little complex, but gets  you there in one step.

jenim514
Pyrite | Level 9
This is so close. I think I miss communicated a little. I need the Q3_avg to the the SUM of Q3/Total. Right now, the output for on ID looks like:
Total=8
Q3_avg=1
when all answers to q3 is '1'...so the Q3_Ave should be 8. Make sense?
Reeza
Super User

Have you looked at PROC FREQ output? 

Astounding
PROC Star

Not sure if that makes sense ...  when all the Q3 are 1, the sum of Q3 is 8.  And total is 8.  So Q3_avg would be 8 / 8 = 1. 

 

I generalized the results for the first ID where the average is 0.6.  (The Q3_avg for the second ID didn't really make sense to me.)

 

If that's not right, maybe you could show 3 IDs in your example, all with the right Q3_avg values.

Yavuz
Quartz | Level 8
you can join these two tables:

create first table;
proc sql;
create table tablo as
count(ID) as total,
avg(q3) as aver
from tablo t1
group by
total,
aver;
*********
then inner join between fist table and given above.

(proc summarize also do this but i didnt remember)
SuryaKiran
Meteorite | Level 14

Hi,

 

You can achieve this in many simple ways:

one way is :

 

proc sql;
create table need as
select id,Q3,count(id) as total,sum(Q3)/count(id) as Avg
from have
group by id;
quit;

Thanks,
Suryakiran
ballardw
Super User

@jenim514 wrote:

Hi all,

 

I need to sum encounter by ID and then average the answers to the Q3  (will always be 1 or 0).  Looking for an easy way...preferable in a data step.  I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together.  Is there an easier way in a data step?

 

Want a table that looks like this:

 

ID Q3 Total Q3_Avg
A 1 5 0.6
A 1 5 0.6
A 0 5 0.6
A 1 5 0.6
A 0 5 0.6
B 1 4 2
B 1 4 2
B 0 4 2
B 0 4 2

 

Help is appreciated!!


Please explain how you get a total of 5 for ID=A. This looks a lot more like a COUNT of non-missing values than a SUM. So the SUM=3 / Count=5 does yield 0.6 for Id A. But then for B you would have Sum=2 / Count=4 = 0.5 not 2. So there is something missing in your description.

jenim514
Pyrite | Level 9

ok, let me think about this.  you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not.  i will look at it again and re-frame my question.   thanks all!

  

ballardw
Super User

@jenim514 wrote:

ok, let me think about this.  you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not.  i will look at it again and re-frame my question.   thanks all!

  


Been there, got the headaches.

You might ask if there is a weighting variable involved they forgot to mention. If the weight changes between A and B that could account for different results with the same calculations I proposed. Since a variable named Q3 seem like a candidate for "question 3" and questions often occur in surveys and surveys often weight data ....

novinosrin
Tourmaline | Level 20

@jenim514  This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

 

1. I concur the PROC readymeals approaches

2. "Double DOW" is slick

3. Reemerging stats in Proc sql is great

 

If you are a learner like me, just play

 

data have;

input ID $ Q3;

datalines;

A    1

A    1

A    0

A    1

A    0

B    1

B    1

B    0

B    0

;

 

data _null_;

if _n_=1 then do;

  if 0 then set have;

   length total q3_avg  8;

   dcl hash h(multidata: 'y',ordered: 'y');

   h.definekey('id');

   h.definedata('id', 'q3','total','q3_avg');

   h.definedone();

   call missing (total, q3_avg);

   end;

   _total = 0;

   _q3_avg = 0;

   do until(last.id);

     set have end=last;

           by id;

           _total + 1;

           _q3_avg + q3;

           h.add();

           if last.id then

        do;

                temp = _q3_avg/_total;    

            rc = h.find();

                q3_avg = temp; 

            total=_total;

                if (rc = 0) then

           do;

           rc = h.replacedup();

           rc = h.find_next();

                q3_avg = temp;

                total=_total;

            do while(rc = 0);

                rc = h.replacedup();

                rc = h.find_next();

                q3_avg = temp;

                total=_total;

                end;

           end;      

           end;

   end;

     if last then h.output(dataset:'want');          

 run;

 

Cheers!

Reeza
Super User

  This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.

 

 

@novinosrin If there was any uncertainty about your 'geek' status or SAS expertise consider it certain now 🙂

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3258 views
  • 3 likes
  • 7 in conversation