event_id attendance
1 17
1 17
1 17
2 3
2 3
2 3
How do I sum "attendance" only for unique even_id? Total attendance in this example should equal 20 (not 17+17+17+3+3+3).
Thanks
UNTESTED CODE
proc sql;
create table unique as select distinct event_id,attendance from have;
run;
proc means data=unique;
var attendance;
run;
UNTESTED CODE
proc sql;
create table unique as select distinct event_id,attendance from have;
run;
proc means data=unique;
var attendance;
run;
This gives me the count (N) of how many attendences there are but does not display the summary of how many people attended.
output:
N Mean StdD Min Max
940 29 30 5 100
To be clear, Summarizing all 940 (N) is not what I want. I want summary of attendance per event without duplicate. Thanks a lot!
@jgraupner wrote:
This gives me the count (N) of how many attendences there are but does not display the summary of how many people attended.
output:
N Mean StdD Min Max
940 29 30 5 100
You can add the SUM option to the PROC MEANS statement and get the sum. Or you can multiply the mean (29) by the N (940) to get the sum.
@jgraupner wrote:
event_id attendance
1 171 17
1 17
2 3
2 3
2 3
How do I sum "attendance" only for unique even_id? Total attendance in this example should equal 20 (not 17+17+17+3+3+3).
Thanks
Will attendance always be the same for each EVENT_ID?
If so, use PROC SORT with NODUPKEY to remove the duplicates and then sum using PROC MEANS or SUMMARY
Hopefully, somebody already advised you that this is not a good format to use for saving your data. Next time, listen to that person. Here's a way to "undo" the repetition:
data want;
set have;
by event_id;
if last.event_id;
total_attendance_so_far + attendance;
run;
This assumes that your data is in order by event_ID. The final total is part of the final observation.
event_id attendance respondent_id demographics
1 17 1 4
1 17 2 6
1 17 3 7
2 3 1 2
2 3 2 4
2 3 3 5
Here's a clearer picture of the data I'm working with. I merged attendance data and demographic data on event_id, thus the repetition. How exactly would you recommend a 'good format' to use to save my data. Thanks.
I would recommend keeping them separate ... one data set for event-level data and a second data set for demographic data. There's no need to merge everything together in order to analyze the data. If you come to a point where the analysis requires that you merge them, go ahead and merge them as a temporary data set for that analysis. If that were the plan, you could have gone directly to PROC MEANS on the event-level data, with no complications.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.