HI,
I need help to get codes for data aggregating
I have a dataset of lab results where some of the values are repeating on the same date. I would like to create a new variable where I can keep mean/median value if two or three data values are present otherwise it should be same as old variable.
My dataset looks like this:
Patient Id date result age gender
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
I want new dataset look like this:
Patient Id date result age gender Re_results
1 1/1/2011 3.50 23 1 3.55
1 1/1/2011 3.60 23 1 3.55
1 2/1/2011 3.00 23 1 3.00
1 3/1/2011 2.82 23 1 2.82
2 1/1/2011 3.42 20 1 3.42
2 2/1/2011 3.23 20 1 3.23
2 3/1/2011 4.11 20 1 3.47
2 3/1/2011 2.82 20 1 3.47
3 1/1/2011 3.18 33 2 3.18
3 2/1/2011 4.21 33 2 4.21
3 3/1/2011 2.12 33 2 2.12
Kindly suggest
PROC SQL makes this easy. It will remerge aggregate results back to the detail records automatically.
Most statistics on a single value will return that value as the result. min, max, mean, median.
data have;
input id date result age gender ;
informat date mmddyy. ;
format date yymmdd10. ;
cards;
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
;
proc sql ;
create table want as
select *,mean(result) as re_results
from have
group by id,date
;
quit;
Obs id date result age gender re_results 1 1 2011-01-01 3.60 23 1 3.550 2 1 2011-01-01 3.50 23 1 3.550 3 1 2011-02-01 3.00 23 1 2.910 4 1 2011-02-01 2.82 23 1 2.910 5 2 2011-01-01 3.42 20 1 3.420 6 2 2011-02-01 3.23 20 1 3.230 7 2 2011-03-01 4.11 20 1 3.465 8 2 2011-03-01 2.82 20 1 3.465 9 3 2011-01-01 3.18 33 2 3.180 10 3 2011-02-01 4.21 33 2 4.210 11 3 2011-03-01 2.12 33 2 2.120
PROC SQL makes this easy. It will remerge aggregate results back to the detail records automatically.
Most statistics on a single value will return that value as the result. min, max, mean, median.
data have;
input id date result age gender ;
informat date mmddyy. ;
format date yymmdd10. ;
cards;
1 1/1/2011 3.50 23 1
1 1/1/2011 3.60 23 1
1 2/1/2011 3.00 23 1
1 2/1/2011 2.82 23 1
2 1/1/2011 3.42 20 1
2 2/1/2011 3.23 20 1
2 3/1/2011 4.11 20 1
2 3/1/2011 2.82 20 1
3 1/1/2011 3.18 33 2
3 2/1/2011 4.21 33 2
3 3/1/2011 2.12 33 2
;
proc sql ;
create table want as
select *,mean(result) as re_results
from have
group by id,date
;
quit;
Obs id date result age gender re_results 1 1 2011-01-01 3.60 23 1 3.550 2 1 2011-01-01 3.50 23 1 3.550 3 1 2011-02-01 3.00 23 1 2.910 4 1 2011-02-01 2.82 23 1 2.910 5 2 2011-01-01 3.42 20 1 3.420 6 2 2011-02-01 3.23 20 1 3.230 7 2 2011-03-01 4.11 20 1 3.465 8 2 2011-03-01 2.82 20 1 3.465 9 3 2011-01-01 3.18 33 2 3.180 10 3 2011-02-01 4.21 33 2 4.210 11 3 2011-03-01 2.12 33 2 2.120
Thank you so much for the detailed reply
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.