HI,
I need help to get codes for data aggregating
I have a dataset of lab results where some of the values are repeating on the same date. I would like to create a new variable where I can keep mean/median value if two or three data values are present otherwise it should be same as old variable.
My dataset looks like this:
Patient Id date result age gender
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
I want new dataset look like this:
Patient Id date result age gender Re_results
1 1/1/2011 3.50 23 1 3.55
1 1/1/2011 3.60 23 1 3.55
1 2/1/2011 3.00 23 1 3.00
1 3/1/2011 2.82 23 1 2.82
2 1/1/2011 3.42 20 1 3.42
2 2/1/2011 3.23 20 1 3.23
2 3/1/2011 4.11 20 1 3.47
2 3/1/2011 2.82 20 1 3.47
3 1/1/2011 3.18 33 2 3.18
3 2/1/2011 4.21 33 2 4.21
3 3/1/2011 2.12 33 2 2.12
Kindly suggest
PROC SQL makes this easy. It will remerge aggregate results back to the detail records automatically.
Most statistics on a single value will return that value as the result. min, max, mean, median.
data have;
input id date result age gender ;
informat date mmddyy. ;
format date yymmdd10. ;
cards;
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
;
proc sql ;
create table want as
select *,mean(result) as re_results
from have
group by id,date
;
quit;
Obs id date result age gender re_results 1 1 2011-01-01 3.60 23 1 3.550 2 1 2011-01-01 3.50 23 1 3.550 3 1 2011-02-01 3.00 23 1 2.910 4 1 2011-02-01 2.82 23 1 2.910 5 2 2011-01-01 3.42 20 1 3.420 6 2 2011-02-01 3.23 20 1 3.230 7 2 2011-03-01 4.11 20 1 3.465 8 2 2011-03-01 2.82 20 1 3.465 9 3 2011-01-01 3.18 33 2 3.180 10 3 2011-02-01 4.21 33 2 4.210 11 3 2011-03-01 2.12 33 2 2.120
PROC SQL makes this easy. It will remerge aggregate results back to the detail records automatically.
Most statistics on a single value will return that value as the result. min, max, mean, median.
data have;
input id date result age gender ;
informat date mmddyy. ;
format date yymmdd10. ;
cards;
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
;
proc sql ;
create table want as
select *,mean(result) as re_results
from have
group by id,date
;
quit;
Obs id date result age gender re_results 1 1 2011-01-01 3.60 23 1 3.550 2 1 2011-01-01 3.50 23 1 3.550 3 1 2011-02-01 3.00 23 1 2.910 4 1 2011-02-01 2.82 23 1 2.910 5 2 2011-01-01 3.42 20 1 3.420 6 2 2011-02-01 3.23 20 1 3.230 7 2 2011-03-01 4.11 20 1 3.465 8 2 2011-03-01 2.82 20 1 3.465 9 3 2011-01-01 3.18 33 2 3.180 10 3 2011-02-01 4.21 33 2 4.210 11 3 2011-03-01 2.12 33 2 2.120
Thank you so much for the detailed reply
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.