event_id attendance
1 17
1 17
1 17
2 3
2 3
2 3
How do I sum "attendance" only for unique even_id? Total attendance in this example should equal 20 (not 17+17+17+3+3+3).
Thanks
UNTESTED CODE
proc sql;
create table unique as select distinct event_id,attendance from have;
run;
proc means data=unique;
var attendance;
run;
UNTESTED CODE
proc sql;
create table unique as select distinct event_id,attendance from have;
run;
proc means data=unique;
var attendance;
run;
This gives me the count (N) of how many attendences there are but does not display the summary of how many people attended.
output:
N Mean StdD Min Max
940 29 30 5 100
To be clear, Summarizing all 940 (N) is not what I want. I want summary of attendance per event without duplicate. Thanks a lot!
@jgraupner wrote:
This gives me the count (N) of how many attendences there are but does not display the summary of how many people attended.
output:
N Mean StdD Min Max
940 29 30 5 100
You can add the SUM option to the PROC MEANS statement and get the sum. Or you can multiply the mean (29) by the N (940) to get the sum.
@jgraupner wrote:
event_id attendance
1 171 17
1 17
2 3
2 3
2 3
How do I sum "attendance" only for unique even_id? Total attendance in this example should equal 20 (not 17+17+17+3+3+3).
Thanks
Will attendance always be the same for each EVENT_ID?
If so, use PROC SORT with NODUPKEY to remove the duplicates and then sum using PROC MEANS or SUMMARY
Hopefully, somebody already advised you that this is not a good format to use for saving your data. Next time, listen to that person. Here's a way to "undo" the repetition:
data want;
set have;
by event_id;
if last.event_id;
total_attendance_so_far + attendance;
run;
This assumes that your data is in order by event_ID. The final total is part of the final observation.
event_id attendance respondent_id demographics
1 17 1 4
1 17 2 6
1 17 3 7
2 3 1 2
2 3 2 4
2 3 3 5
Here's a clearer picture of the data I'm working with. I merged attendance data and demographic data on event_id, thus the repetition. How exactly would you recommend a 'good format' to use to save my data. Thanks.
I would recommend keeping them separate ... one data set for event-level data and a second data set for demographic data. There's no need to merge everything together in order to analyze the data. If you come to a point where the analysis requires that you merge them, go ahead and merge them as a temporary data set for that analysis. If that were the plan, you could have gone directly to PROC MEANS on the event-level data, with no complications.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.