HI,
I need help to get codes for data aggregating
I have a dataset of lab results where some of the values are repeating on the same date. I would like to create a new variable where I can keep mean/median value if two or three data values are present otherwise it should be same as old variable.
My dataset looks like this:
Patient Id date result age gender
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
I want new dataset look like this:
Patient Id date result age gender Re_results
1 1/1/2011 3.50 23 1 3.55
1 1/1/2011 3.60 23 1 3.55
1 2/1/2011 3.00 23 1 3.00
1 3/1/2011 2.82 23 1 2.82
2 1/1/2011 3.42 20 1 3.42
2 2/1/2011 3.23 20 1 3.23
2 3/1/2011 4.11 20 1 3.47
2 3/1/2011 2.82 20 1 3.47
3 1/1/2011 3.18 33 2 3.18
3 2/1/2011 4.21 33 2 4.21
3 3/1/2011 2.12 33 2 2.12
Kindly suggest
PROC SQL makes this easy. It will remerge aggregate results back to the detail records automatically.
Most statistics on a single value will return that value as the result. min, max, mean, median.
data have;
input id date result age gender ;
informat date mmddyy. ;
format date yymmdd10. ;
cards;
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
;
proc sql ;
create table want as
select *,mean(result) as re_results
from have
group by id,date
;
quit;
Obs id date result age gender re_results 1 1 2011-01-01 3.60 23 1 3.550 2 1 2011-01-01 3.50 23 1 3.550 3 1 2011-02-01 3.00 23 1 2.910 4 1 2011-02-01 2.82 23 1 2.910 5 2 2011-01-01 3.42 20 1 3.420 6 2 2011-02-01 3.23 20 1 3.230 7 2 2011-03-01 4.11 20 1 3.465 8 2 2011-03-01 2.82 20 1 3.465 9 3 2011-01-01 3.18 33 2 3.180 10 3 2011-02-01 4.21 33 2 4.210 11 3 2011-03-01 2.12 33 2 2.120
PROC SQL makes this easy. It will remerge aggregate results back to the detail records automatically.
Most statistics on a single value will return that value as the result. min, max, mean, median.
data have;
input id date result age gender ;
informat date mmddyy. ;
format date yymmdd10. ;
cards;
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
;
proc sql ;
create table want as
select *,mean(result) as re_results
from have
group by id,date
;
quit;
Obs id date result age gender re_results 1 1 2011-01-01 3.60 23 1 3.550 2 1 2011-01-01 3.50 23 1 3.550 3 1 2011-02-01 3.00 23 1 2.910 4 1 2011-02-01 2.82 23 1 2.910 5 2 2011-01-01 3.42 20 1 3.420 6 2 2011-02-01 3.23 20 1 3.230 7 2 2011-03-01 4.11 20 1 3.465 8 2 2011-03-01 2.82 20 1 3.465 9 3 2011-01-01 3.18 33 2 3.180 10 3 2011-02-01 4.21 33 2 4.210 11 3 2011-03-01 2.12 33 2 2.120
Thank you so much for the detailed reply
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.