So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
proc sql;
create table want as
select id, clinic, adm_dt, max(dis_dt) as dis_dt, max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
Use SQL aggregation functions to take the maximum which will keep the values.
@NickS2 wrote:
So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
can you post this as datalines?
proc sql;
create table want as
select id, clinic, adm_dt, max(dis_dt) as dis_dt, max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
Use SQL aggregation functions to take the maximum which will keep the values.
@NickS2 wrote:
So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
you should be able to accomplish this with proc sql.
proc sql number;
select id, clinic, adm_dt, max(dis_dt) as dis_dt format=date9., max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.