Hi all,
I need to sum encounter by ID and then average the answers to the Q3 (will always be 1 or 0). Looking for an easy way...preferable in a data step. I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together. Is there an easier way in a data step?
Want a table that looks like this:
ID | Q3 | Total | Q3_Avg |
A | 1 | 5 | 0.6 |
A | 1 | 5 | 0.6 |
A | 0 | 5 | 0.6 |
A | 1 | 5 | 0.6 |
A | 0 | 5 | 0.6 |
B | 1 | 4 | 2 |
B | 1 | 4 | 2 |
B | 0 | 4 | 2 |
B | 0 | 4 | 2 |
Help is appreciated!!
What does your data start out looking like?
Please post data in a data step and the corresponding output expected. Anything you've tried is helpful too, so we don't suggest things you've already attempted.
It basically starts out as:
ID | Q3 |
A | 1 |
A | 1 |
A | 0 |
A | 1 |
A | 0 |
B | 1 |
B | 1 |
B | 0 |
B | 0 |
So need to add two new variables Total and Average. I did this to get the total encounters by ID, but I'm trying to avoid all these micro steps i know how to do and just do one datastep.
proc sql;
select ID,
Count(ID) as Total
from diary004_4
group by ID;
quit;
Do you have a data step process and you want to build this calculation into that process?
Because you're merging the data with a summary statistic you must have two passes of the data. The most efficient can vary, perhaps a DoW loop is the best in your circumstance, but another approach would be to use a single SQL step or to do the calculation separately via PROC MEANS/SQL and merge the data in.
******************************************************;
*Add average value to a dataset - with grouping variables;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint nway;
class sex;
output out=avg_values mean(height)=avg_height;
run;
*sort data before merge;
proc sort data=sashelp.class out=class;
by sex;
run;
data class_data;
merge class avg_values;
by sex;
run;
proc print data=class_data;
run;
*Solution 2 - PROC SQL - note the warning in the log;
PROC SQL;
Create table class_sql as
select *, mean(height) as avg_height
from sashelp.class
group by sex;
quit;
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
@Reeza mentioned a DOW loop in passing. That would be my approach. For example, assuming your data set is sorted:
data want;
total = 0;
q3_avg = 0;
do until (last.id);
set have;
by id;
total + 1;
q3_avg + q3;
end;
q3_avg = q3_avg / total;
do until (last.id);
set have;
by id;
output;
end;
run;
It's a little complex, but gets you there in one step.
Have you looked at PROC FREQ output?
Not sure if that makes sense ... when all the Q3 are 1, the sum of Q3 is 8. And total is 8. So Q3_avg would be 8 / 8 = 1.
I generalized the results for the first ID where the average is 0.6. (The Q3_avg for the second ID didn't really make sense to me.)
If that's not right, maybe you could show 3 IDs in your example, all with the right Q3_avg values.
Hi,
You can achieve this in many simple ways:
one way is :
proc sql;
create table need as
select id,Q3,count(id) as total,sum(Q3)/count(id) as Avg
from have
group by id;
quit;
@jenim514 wrote:
Hi all,
I need to sum encounter by ID and then average the answers to the Q3 (will always be 1 or 0). Looking for an easy way...preferable in a data step. I can do in many separate steps...but end up having to use sql, proc tabulate etc. and then join it all together. Is there an easier way in a data step?
Want a table that looks like this:
ID Q3 Total Q3_Avg A 1 5 0.6 A 1 5 0.6 A 0 5 0.6 A 1 5 0.6 A 0 5 0.6 B 1 4 2 B 1 4 2 B 0 4 2 B 0 4 2
Help is appreciated!!
Please explain how you get a total of 5 for ID=A. This looks a lot more like a COUNT of non-missing values than a SUM. So the SUM=3 / Count=5 does yield 0.6 for Id A. But then for B you would have Sum=2 / Count=4 = 0.5 not 2. So there is something missing in your description.
ok, let me think about this. you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not. i will look at it again and re-frame my question. thanks all!
@jenim514 wrote:
ok, let me think about this. you are all right...I am looking at a table on my end that I need to duplicate, and it seemed like my explanation was right, but clearly its not. i will look at it again and re-frame my question. thanks all!
Been there, got the headaches.
You might ask if there is a weighting variable involved they forgot to mention. If the weight changes between A and B that could account for different results with the same calculations I proposed. Since a variable named Q3 seem like a candidate for "question 3" and questions often occur in surveys and surveys often weight data ....
@jenim514 This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.
1. I concur the PROC readymeals approaches
2. "Double DOW" is slick
3. Reemerging stats in Proc sql is great
If you are a learner like me, just play
data have;
input ID $ Q3;
datalines;
A 1
A 1
A 0
A 1
A 0
B 1
B 1
B 0
B 0
;
data _null_;
if _n_=1 then do;
if 0 then set have;
length total q3_avg 8;
dcl hash h(multidata: 'y',ordered: 'y');
h.definekey('id');
h.definedata('id', 'q3','total','q3_avg');
h.definedone();
call missing (total, q3_avg);
end;
_total = 0;
_q3_avg = 0;
do until(last.id);
set have end=last;
by id;
_total + 1;
_q3_avg + q3;
h.add();
if last.id then
do;
temp = _q3_avg/_total;
rc = h.find();
q3_avg = temp;
total=_total;
if (rc = 0) then
do;
rc = h.replacedup();
rc = h.find_next();
q3_avg = temp;
total=_total;
do while(rc = 0);
rc = h.replacedup();
rc = h.find_next();
q3_avg = temp;
total=_total;
end;
end;
end;
end;
if last then h.output(dataset:'want');
run;
Cheers!
This is just for fun. I wanted to experiment how a hash alternative would work at my college lab this afternoon.
@novinosrin If there was any uncertainty about your 'geek' status or SAS expertise consider it certain now 🙂
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.