Solved: Want to have only unique rows

LCart · Posted 08-13-2015 10:32 AM

I have a dataset with IDs and some variables. Currently the IDs are not unique and may repeat example data below:

ID	Date	Dx
1	1/1/2001	A
1	5/7/2009	A
1	3/4/1989	B
2	9/18/2013	B
2	7/2/2011	C
3	8/9/1956	A
4	10/1/2001	D

I would like to write code to make the dataset look like below instead:

ID	Date	Dx	Date2	Dx2	Date3	Dx3
1	3/4/1989	B	1/1/2001	A	5/7/2009	A
2	7/2/2011	C	9/18/2013	B	.	.
3	8/9/1956	A	.	.	.	.
4	10/1/2001	D	.	.	.	.

I'm fairly new to SAS but what I think might do it is a code that essentially says "if MRN is a duplicate and date1=. then date1=date else if MRN is a duplicate and date1 ne "." and date2=. then date2=date etc...". If there's a way to have the dates in chronological order that would be magical.

This seems pretty messy/complicated so any guidance would really be appreciated.

data_null__ · Posted 08-13-2015 02:17 PM

Creating so may variables from data is messy. You can use the PROC SUMMARY trick if you have LE 100 obs per ID.

data have;
   format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.; 
   informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.; 
   input MRN DxDt Dx $ TX $ TxDate;
   datalines; 
   1 1/15/2001 A Rad 2/1/2001
   2 11/6/1999 A Sur 12/16/1999
   2 7/11/2005 B Che 9/25/2005
   1 4/2/2010 C Sur 6/8/2010
   1 4/15/1989 D . .
   3 9/17/2014 A Rad 9/20/2014
   4 10/11/2007 B Che 11/1/2007
   ;
   run; 

 
proc sort data=have;
   by mrn dxdt;
   run; 
%let obs=1;
proc sql noprint; 
   select max(obs) into :obs from (select count(*) as obs from have group by mrn);
   quit; 
   run; 
proc summary data=have;
   by mrn;
   output out=wide(drop=_type_)
      idgroup(out[&obs](dxdt--txdate)=);
   run; 
proc print; 
   run;

View solution in original post

dcruik · Posted 08-13-2015 10:58 AM

You can actually use the TRANSPOSE procedure and then merge the two data sets. May have to change the reorder the columns to get it the way you want, but this will set it up initially for you:

ballardw · Posted 08-13-2015 11:27 AM

Of course the question is always HOW will you use the result? A vast majority of processes work much better with data in your original form (which shows a level of something called normalization in the database world), so don't lose it.

LCart · Posted 08-13-2015 01:18 PM

Thank you so much dcruik for helping me think of the data another way!

Ballardw - to elaborate on my problem:

I have a dateset with MRN numbers, dx codes, dates of diagnoses and dates of treatment. So patient 1 could be diagnosed with condition X on 1/15/2001, receive treatment Y on 1/25/2001. This person could later be diagnosed with condition Z on 5/16/2015, and receive treatment Y on 6/1/2015.

What I would like to do is create a dataset where each row represents a patient so that I can merge this information with other demographic information I have for the patients.

Using dcruik's thinking I sorted the data by mrn and diagnosis date. I created a macro and transposed the data for each variable. I then merged all of the newly created datasets by MRN number. I know this is clunky but it seems to have worked? I'm still checking for accuracy and would welcome suggestions if my process raises any immediate redflags!

Below is simplified/dummy data

data have;

format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;

informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;

input MRN DxDt Dx $ TX $ TxDate;

datalines;

1 1/15/2001 A Rad 2/1/2001

2 11/6/1999 A Sur 12/16/1999

2 7/11/2005 B Che 9/25/2005

1 4/2/2010 C Sur 6/8/2010

1 4/15/1989 D . .

3 9/17/2014 A Rad 9/20/2014

4 10/11/2007 B Che 11/1/2007

;

run;

proc sort data=have;

by mrn dxdt;

run;

%macro transpose(x);

proc transpose data=have out=want_&x (drop=_Name_)

prefix=&x;

var &x;

by mrn;

run;

%mend transpose;

%transpose(DxDt)

%transpose(Dx)

%transpose(TX)

%transpose(TxDate)

data want;

merge want_dxdt want_dx want_tx want_txdate;

by mrn;

run;

data_null__ · Posted 08-13-2015 02:17 PM

Creating so may variables from data is messy. You can use the PROC SUMMARY trick if you have LE 100 obs per ID.

data have;
   format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.; 
   informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.; 
   input MRN DxDt Dx $ TX $ TxDate;
   datalines; 
   1 1/15/2001 A Rad 2/1/2001
   2 11/6/1999 A Sur 12/16/1999
   2 7/11/2005 B Che 9/25/2005
   1 4/2/2010 C Sur 6/8/2010
   1 4/15/1989 D . .
   3 9/17/2014 A Rad 9/20/2014
   4 10/11/2007 B Che 11/1/2007
   ;
   run; 

 
proc sort data=have;
   by mrn dxdt;
   run; 
%let obs=1;
proc sql noprint; 
   select max(obs) into :obs from (select count(*) as obs from have group by mrn);
   quit; 
   run; 
proc summary data=have;
   by mrn;
   output out=wide(drop=_type_)
      idgroup(out[&obs](dxdt--txdate)=);
   run; 
proc print; 
   run;

Want to have only unique rows

Re: Want to have only unique rows

Re: Want to have only unique rows

Re: Want to have only unique rows

Re: Want to have only unique rows

Re: Want to have only unique rows

Want to have only unique rows

Re: Want to have only unique rows

Re: Want to have only unique rows

Re: Want to have only unique rows

Re: Want to have only unique rows

Re: Want to have only unique rows

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!