DATA Step, Macro, Functions and more

Sum based on unique ID

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Sum based on unique ID

event_id          attendance 
1                        17

1                        17

1                        17

2                         3

2                         3

2                         3

 

 

How do I sum "attendance" only for unique even_id? Total attendance in this example should equal 20 (not 17+17+17+3+3+3).

 

Thanks


Accepted Solutions
Solution
‎02-20-2018 04:33 PM
Respected Advisor
Posts: 3,275

Re: Sum based on unique ID

Posted in reply to jgraupner

UNTESTED CODE

 

proc sql;
      create table unique as select distinct event_id,attendance from have;
run;
proc means data=unique;
    var attendance;
run;
--
Paige Miller

View solution in original post


All Replies
Solution
‎02-20-2018 04:33 PM
Respected Advisor
Posts: 3,275

Re: Sum based on unique ID

Posted in reply to jgraupner

UNTESTED CODE

 

proc sql;
      create table unique as select distinct event_id,attendance from have;
run;
proc means data=unique;
    var attendance;
run;
--
Paige Miller
Occasional Contributor
Posts: 7

Re: Sum based on unique ID

Posted in reply to PaigeMiller

This gives me the count (N) of how many attendences there are but does not display the summary of how many people attended. 

 

output:

 

N    Mean  StdD   Min   Max

940    29     30        5      100

 

To be clear, Summarizing all 940 (N) is not what I want. I want summary of attendance per event without duplicate. Thanks a lot!

Respected Advisor
Posts: 3,275

Re: Sum based on unique ID

[ Edited ]
Posted in reply to jgraupner

jgraupner wrote:

This gives me the count (N) of how many attendences there are but does not display the summary of how many people attended. 

 

output:

 

N    Mean  StdD   Min   Max

940    29     30        5      100

 

 


You can add the SUM option to the PROC MEANS statement and get the sum. Or you can multiply the mean (29) by the N (940) to get the sum. 

 

--
Paige Miller
Super User
Posts: 24,004

Re: Sum based on unique ID

Posted in reply to jgraupner

jgraupner wrote:

event_id          attendance 
1                        17

1                        17

1                        17

2                         3

2                         3

2                         3

 

 

How do I sum "attendance" only for unique even_id? Total attendance in this example should equal 20 (not 17+17+17+3+3+3).

 

Thanks


Will attendance always be the same  for each EVENT_ID?

If so, use PROC SORT with NODUPKEY to remove the duplicates and then sum using PROC MEANS or SUMMARY

Super User
Posts: 6,933

Re: Sum based on unique ID

Posted in reply to jgraupner

Hopefully, somebody already advised you that this is not a good format to use for saving your data.  Next time, listen to that person.  Here's a way to "undo" the repetition:

 

data want;

set have;

by event_id;

if last.event_id;

total_attendance_so_far + attendance;

run;

 

This assumes that your data is in order by event_ID.  The final total is part of the final observation.

Occasional Contributor
Posts: 7

Re: Sum based on unique ID

Posted in reply to Astounding

event_id          attendance      respondent_id            demographics
1                        17                           1                                4

1                        17                           2                                6

1                        17                           3                                7

2                         3                            1                                2

2                         3                            2                                4

2                         3                            3                                5

 

 

Here's a clearer picture of the data I'm working with. I merged attendance data and demographic data on event_id, thus the repetition. How exactly would you recommend a 'good format' to use to save my data. Thanks.

Super User
Posts: 6,933

Re: Sum based on unique ID

Posted in reply to jgraupner

I would recommend keeping them separate ... one data set for event-level data and a second data set for demographic data.  There's no need to merge everything together in order to analyze the data.  If you come to a point where the analysis requires that you merge them, go ahead and merge them as a temporary data set for that analysis.  If that were the plan, you could have gone directly to PROC MEANS on the event-level data, with no complications.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 187 views
  • 1 like
  • 4 in conversation