DATA Step, Macro, Functions and more

How to calculate sum mean of a set of OBS if duplicates present

Reply
Super Contributor
Posts: 270

How to calculate sum mean of a set of OBS if duplicates present

Dear,

I am running the following code for my the data one. I got the out I need. But in my data sevearl similar OBS present. I need to only one average value by id visit test date

 

For the subject I need to have only average for first.date (2010-05-27). In my output I am getting avg for the dates 2010-05-28, 2010-05-29. Please help.Thank you

 

 

data one;
input id visit test $ val date $19.;
datalines;
1 1 dbp 88 2010-04-28
1 1 dbp 90 2010-04-28
1 1 dbp 92 2010-04-28
2 1 dbp 100 2010-05-28
2 1 dbp 98 2010-05-28
2 1 dbp 96 2010-05-28
2 1 dbp 94 2010-05-29
2 1 dbp 92 2010-05-29
2 1 dbp 90 2010-05-29

2 1 dbp 100 2010-05-27 
2 1 dbp 98 2010-05-27
2 1 dbp 96 2010-05-27
;
proc sort data=one;
by id visit test date;
run;
data two(drop=n vsum);
set one;
by id visit test date;

vsum+val;
n+1;
if first.date then do;
n=1; vsum=val;
end;
output;
if last.date then do;
test=cats(test,'7');
val=vsum/n;
output;
end;
run;

 

output needed:

1 1 dbp 88 2010-04-28
1 1 dbp 90 2010-04-28
1 1 dbp 92 2010-04-28

1 1 dbp7 90 2010-04-28

2 1 dbp 100 2010-05-27 
2 1 dbp 98 2010-05-27
2 1 dbp 96 2010-05-27

2 1 dbp7 98 2010-05-27
2 1 dbp 100 2010-05-28
2 1 dbp 98 2010-05-28
2 1 dbp 96 2010-05-28
2 1 dbp 94 2010-05-29
2 1 dbp 92 2010-05-29
2 1 dbp 90 2010-05-29

 

Super User
Posts: 10,466

Re: How to calculate sum mean of a set of OBS if duplicates present

What is the purpose of the resulting data set?

I ask because placing summary values in side a dataset can cause all sorts of problems if you forget, or some else uses the set unaware, of the presence of the summary rows. Using that data with the inserted values in any other analysis is very likely to yield incorrect results.

 

Or suppose you accidentally re-run this step on the same data. You will then incorporate your previous average in calculating a new average.

 

data two(drop=n vsum group ldate);
   set one;
   by id visit test date;

   vsum+val;
   n+1;
   ldate=lag(date);
   if first.id then group=1;
   else if first.date and ldate ne date then group+1;
   if first.date then do;
      n=1; 
      vsum=val;
   end;
   if group = 1 then output;
   if last.date and group= 1then do;
      test=cats(test,'7');
      val=vsum/n;
      output;
   end;
run;

Please post code in the forum code box opened with the {i} menu icon. It is hard to read when formatting is removed by posting in the main message window.

 

 

By the way, you are relying on a default length for Test to be longer than your final desired with that 7 appended. If the actual length of values for TEST is 8 characters then the

test=cats(test,'7')

fails as the result is too long to fit in the default 8 characters assigned by the input in the data one.

Super Contributor
Posts: 270

Re: How to calculate sum mean of a set of OBS if duplicates present

Dear,

 

Thank you very much. That works for me. But the output data has only OBS with AVG calculated. I need to have all the OBS in the output. Please help. Thank you very much.

 

Output getting with the code:

 

1 1 dbp 88 2010-04-28
1 1 dbp 90 2010-04-28
1 1 dbp 92 2010-04-28

1 1 dbp7 90 2010-04-28

2 1 dbp 100 2010-05-27 
2 1 dbp 98 2010-05-27
2 1 dbp 96 2010-05-27

2 1 dbp7 98 2010-05-27

 

output needed:

1 1 dbp 88 2010-04-28
1 1 dbp 90 2010-04-28
1 1 dbp 92 2010-04-28

1 1 dbp7 90 2010-04-28

2 1 dbp 100 2010-05-27 
2 1 dbp 98 2010-05-27
2 1 dbp 96 2010-05-27

2 1 dbp7 98 2010-05-27
2 1 dbp 100 2010-05-28
2 1 dbp 98 2010-05-28
2 1 dbp 96 2010-05-28
2 1 dbp 94 2010-05-29
2 1 dbp 92 2010-05-29
2 1 dbp 90 2010-05-29

Super Contributor
Posts: 270

Re: How to calculate sum mean of a set of OBS if duplicates present

Dear,

 

I just posted for asking help. I modified your code to get output I need. Could please check and suggest if this is ok. Thank you very much.

 

modified code:

data two(drop=n vsum group ldate);
   set one;
   by id visit test date;
output;
   vsum+val;
   n+1;
   ldate=lag(date);
   if first.id then group=1;
   else if first.date and ldate ne date then group+1;
   if first.date then do;
      n=1; 
      vsum=val;
   end;
/*   if group = 1 then output;*/
   if last.date and group= 1 then do;
      test=cats(test,'7');
      val=vsum/n;
      output;
   end;
run;

 

Ask a Question
Discussion stats
  • 3 replies
  • 106 views
  • 1 like
  • 2 in conversation