Summing over observations...first.id

Reply
N/A
Posts: 0

Summing over observations...first.id

I have data in the following format (two variables: id and dispensed):

id dispensed
1 yes
1 yes
1 yes
1 yes
2 no
2 no
3 yes
....and so on.

I need to calculate the total number of 'dispensed' (sum of 'yes' + 'no's). Each id should only be counted once. So I am looking to calculate something like the following:

total_dispensed=3
dispensed_yes=2
dispensed_no=1

How do I calculate this? Thank you very much!
SAS Super FREQ
Posts: 8,645

Re: Summing over observations...first.id

Hi:
It sort of depends on whether you want a summary dataset as the final result or want the information to be in the detail data. However, assuming that you want just a file of information that can give you the totals and information by ID and then has a summary line, look at the DATA step program and the first PROC REPORT which produces this output:[pre]
Dispensed Information by ID
Dispensed Dispensed Total Total
Flag =Yes =No Yes No
Num Obs Dispensed This This This This
ID This ID This ID ID ID ID ID
1 4 1 1 0 4 0
2 2 1 0 1 0 2
3 1 1 1 0 1 0
========= ========= ========= ========= ========= =========
7 3 2 1 5 2
========= ========= ========= ========= ========= =========
[/pre]

To just produce a line of SUMMARY information, look at the DATA step program and the second PROC REPORT which produces this output.[pre]

Overall Summary Dispensed Information
Dispensed Dispensed Total Total
Flag =Yes =No Yes No
Num Obs Dispensed All All All All
All IDs All IDs IDs IDs IDs IDs
7 3 2 1 5 2
[/pre]
There are other ways to approach this, such as setting flags in the original data, when you read the file, but it sounds like you already have a data file in SAS format. You probably will not want all the calculated variables in the final output, however, I thought it might be useful to show the difference between accumulating a counter for each ID versus setting a numeric flag of 0 or 1 for each ID.

Good luck!
cynthia

[pre]
*** the code;
** 1) Make some data;
data disp;
input id dispensed $;
return;
datalines;
1 yes
1 yes
1 yes
1 yes
2 no
2 no
3 yes
;
run;

** 2) Sort by ID;
proc sort data=disp;
by id;
run;

** 3) Set up some flags to catch the yes, no;
** and at the end of each ID, add them up.;
** Note that logic does not account for scenario;
** where an ID had both yes AND no -- for example,
** if that was an error condition.;
** CALCDISP file is a SUMMARY file.;
data calcdisp;
set disp; by id;
retain disp_yes disp_no
num_yes num_no id_cnt;

if first.id then do;
disp_yes = 0;
disp_no = 0;
num_yes = 0;
num_no = 0;
id_cnt = 0;
end;
id_cnt + 1;
if dispensed = 'yes' then num_yes + 1;
else if dispensed = 'no' then num_no + 1;
if last.id then do;
if num_yes ge 1 then disp_yes = 1;
else disp_yes = 0;
if num_no ge 1 then disp_no = 1;
else disp_no = 0;
tot_disp = disp_yes + disp_no;
output;
end;
run;

** 4) Now use Proc Report and add up the flags and counters;
** to show the difference between the flags with 1 or 0;
** and the counters for each ID.;
proc report data=calcdisp split='*' nowd;
title 'Dispensed Information by ID';
column id id_cnt tot_disp disp_yes
disp_no num_yes num_no;
define id / order 'ID';
define id_cnt / 'Num Obs*This ID';
define tot_disp / 'Flag*Dispensed*This ID';
define disp_yes / 'Dispensed*=Yes*This*ID';
define disp_no / 'Dispensed*=No*This*ID';
define num_yes / 'Total*Yes*This*ID';
define num_no / 'Total*No*This*ID';
rbreak after /summarize dol dul;
run;

proc report data=calcdisp split='*' nowd;
title 'Overall Summary Dispensed Information';
column id_cnt tot_disp disp_yes
disp_no num_yes num_no;
define id_cnt / 'Num Obs*All IDs';
define tot_disp / 'Flag*Dispensed*All IDs';
define disp_yes / 'Dispensed*=Yes*All*IDs';
define disp_no / 'Dispensed*=No*All*IDs';
define num_yes / 'Total*Yes*All*IDs';
define num_no / 'Total*No*All*IDs';
run;
[/pre]
Post a Question
Discussion Stats
  • 1 reply
  • 260 views
  • 0 likes
  • 2 in conversation