Hi Everyone,
Thanks in advance for the help! I have a dataset with 827,304 observations and over 170 yes(1)/no(0) variables and I am trying to get one line per baby, per visit type, for each of the 170 different diagnoses.
This is a sample of what I have
Baby_ID | Visit_ID | Type of visit | Dx_date | DX_Jaundice | DX_Zoster | DX_Sepsis | DX_Hepatitis | DX_Well_baby |
123 | 1 | Delivery | 4/20/2019 | 1 | 0 | 0 | 0 | 0 |
123 | 1 | Delivery | 4/20/2019 | 0 | 0 | 1 | 0 | 0 |
123 | 3 | Ever | 5/8/2019 | 0 | 0 | 0 | 1 | 0 |
123 | 2 | Postnatal | 4/25/2019 | 0 | 1 | 0 | 0 | 0 |
And this is what I want the data to look like
Baby_ID | Visit_ID | Type of visit | Dx_date | DX_Jaundice | DX_Zoster | DX_Sepsis | DX_Hepatitis | DX_Well_baby |
123 | 1 | Delivery | 4/20/2019 | 1 | 0 | 1 | 0 | 0 |
123 | 3 | Ever | 5/8/2019 | 0 | 0 | 0 | 1 | 0 |
123 | 2 | Postnatal | 4/25/2019 | 0 | 1 | 0 | 0 | 0 |
I would usually transpose the variables of interest to get one line per ID, but with over 170 variables, that doesn't seem feasible. And
I think I am going down the wrong track here, but I have tried to use RETAIN
data Test_retain;
set test;
by baby_id;
retain highest;
if first.baby_id then highest=.;
highest=max(highest,DX_Jaundice);
if last.baby_ID then output;
run;
and some SQL code
Proc sql;
Create table Max as
Select baby_ID, Type_of_visit, DX_date, visit_ID, max(DX_Baby_Hear_Screen_Fail) as DX_Baby_Hear_Screen_Fail,
max(DX_Baby_Single_Live) as DX_Baby_Single_Live,
max(DX_baby_Hep_Vaccination) as DX_baby_Hep_Vaccination
From test2
Group by visit_ID ;
Quit;
If someone could point me in the right direction it would be greatly appreciated!