So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
proc sql;
create table want as
select id, clinic, adm_dt, max(dis_dt) as dis_dt, max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
Use SQL aggregation functions to take the maximum which will keep the values.
@NickS2 wrote:
So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
can you post this as datalines?
proc sql;
create table want as
select id, clinic, adm_dt, max(dis_dt) as dis_dt, max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
Use SQL aggregation functions to take the maximum which will keep the values.
@NickS2 wrote:
So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
you should be able to accomplish this with proc sql.
proc sql number;
select id, clinic, adm_dt, max(dis_dt) as dis_dt format=date9., max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.