BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NickS2
Obsidian | Level 7

So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:

 

ID   CLINIC                  ADM_DT                DIS_DT               DIAG_CD   

1     ABC                      03SEP2005           06SEP2005         

1     ABC                      03SEP2005                                       T10.009

1     ABC                      19DEC2006          21DEC2006         

1     ABC                      19DEC2006          21DEC2006         F23.091

2     XYZ                      06OCT2006          10OCT2006         J09.01

2     XYZ                      06OCT2006          10OCT2006

3     LMN                      21JAN2005           25JAN2005         

3     LMN                      21JAN2005           25JAN2005 

3     LMN                      04OCT2006                                       J09.2

3     LMN                      04OCT2006          05OCT2006

 

ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.

 

Output expected:

 

ID   CLINIC                  ADM_DT                DIS_DT               DIAG_CD   

1     ABC                      03SEP2005           06SEP2005         T10.009

1     ABC                      19DEC2006          21DEC2006         F23.091     

2     XYZ                      06OCT2006          10OCT2006         J09.01

3     LMN                      21JAN2005           25JAN2005          

3     LMN                      04OCT2006          05OCT2006         J09.2

        

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
proc sql;
create table want as
select id, clinic, adm_dt, max(dis_dt) as dis_dt, max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;

Use SQL aggregation functions to take the maximum which will keep the values. 

 


@NickS2 wrote:

So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:

 

ID   CLINIC                  ADM_DT                DIS_DT               DIAG_CD   

1     ABC                      03SEP2005           06SEP2005         

1     ABC                      03SEP2005                                       T10.009

1     ABC                      19DEC2006          21DEC2006         

1     ABC                      19DEC2006          21DEC2006         F23.091

2     XYZ                      06OCT2006          10OCT2006         J09.01

2     XYZ                      06OCT2006          10OCT2006

3     LMN                      21JAN2005           25JAN2005         

3     LMN                      21JAN2005           25JAN2005 

3     LMN                      04OCT2006                                       J09.2

3     LMN                      04OCT2006          05OCT2006

 

ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.

 

Output expected:

 

ID   CLINIC                  ADM_DT                DIS_DT               DIAG_CD   

1     ABC                      03SEP2005           06SEP2005         T10.009

1     ABC                      19DEC2006          21DEC2006         F23.091     

2     XYZ                      06OCT2006          10OCT2006         J09.01

3     LMN                      21JAN2005           25JAN2005          

3     LMN                      04OCT2006          05OCT2006         J09.2

        

 

 


 

View solution in original post

3 REPLIES 3
tarheel13
Rhodochrosite | Level 12

can you post this as datalines? 

Reeza
Super User
proc sql;
create table want as
select id, clinic, adm_dt, max(dis_dt) as dis_dt, max(diag_cd) as diag_cd
from have
group by id, clinic, adm_dt;
quit;

Use SQL aggregation functions to take the maximum which will keep the values. 

 


@NickS2 wrote:

So I have a dataset with many variables and multiple observations for one ID. I want to keep the unique observations but at the same time populate the non-missing value from that observation. Below is the sample of the dataset:

 

ID   CLINIC                  ADM_DT                DIS_DT               DIAG_CD   

1     ABC                      03SEP2005           06SEP2005         

1     ABC                      03SEP2005                                       T10.009

1     ABC                      19DEC2006          21DEC2006         

1     ABC                      19DEC2006          21DEC2006         F23.091

2     XYZ                      06OCT2006          10OCT2006         J09.01

2     XYZ                      06OCT2006          10OCT2006

3     LMN                      21JAN2005           25JAN2005         

3     LMN                      21JAN2005           25JAN2005 

3     LMN                      04OCT2006                                       J09.2

3     LMN                      04OCT2006          05OCT2006

 

ADM_DT is always populated in the dataset the issues are DIS_DT and DIAG_CD which are missing at some places. My aim is to get the DIS_DT and DIAG_DT for each patient for each ADM_DT.

 

Output expected:

 

ID   CLINIC                  ADM_DT                DIS_DT               DIAG_CD   

1     ABC                      03SEP2005           06SEP2005         T10.009

1     ABC                      19DEC2006          21DEC2006         F23.091     

2     XYZ                      06OCT2006          10OCT2006         J09.01

3     LMN                      21JAN2005           25JAN2005          

3     LMN                      04OCT2006          05OCT2006         J09.2

        

 

 


 

tarheel13
Rhodochrosite | Level 12

you should be able to accomplish this with proc sql.

proc sql number;
	select id, clinic, adm_dt, max(dis_dt) as dis_dt format=date9., max(diag_cd) as diag_cd
		from have
		group by id, clinic, adm_dt;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 734 views
  • 1 like
  • 3 in conversation