So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
proc sql;
create table want as
select id, clinic, adm_dt, max(dis_dt) as dis_dt, max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
Use SQL aggregation functions to take the maximum which will keep the values.
@NickS2 wrote:
So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
can you post this as datalines?
proc sql;
create table want as
select id, clinic, adm_dt, max(dis_dt) as dis_dt, max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
Use SQL aggregation functions to take the maximum which will keep the values.
@NickS2 wrote:
So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005
1 ABC 03SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
2 XYZ 06OCT2006 10OCT2006
3 LMN 21JAN2005 25JAN2005
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 J09.2
3 LMN 04OCT2006 05OCT2006
ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.
Output expected:
ID CLINIC ADM_DT DIS_DT DIAG_CD
1 ABC 03SEP2005 06SEP2005 T10.009
1 ABC 19DEC2006 21DEC2006 F23.091
2 XYZ 06OCT2006 10OCT2006 J09.01
3 LMN 21JAN2005 25JAN2005
3 LMN 04OCT2006 05OCT2006 J09.2
you should be able to accomplish this with proc sql.
proc sql number;
select id, clinic, adm_dt, max(dis_dt) as dis_dt format=date9., max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.