Help using Base SAS procedures

Want to have only unique rows

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Want to have only unique rows

I have a dataset with IDs and some variables. Currently the IDs are not unique and may repeat example data below:

IDDateDx
11/1/2001A
15/7/2009A
13/4/1989B
29/18/2013B
27/2/2011C
38/9/1956A
410/1/2001

D

I would like to write code to make the dataset look like below instead:

IDDateDxDate2Dx2Date3Dx3
13/4/1989B1/1/2001A5/7/2009A
27/2/2011C9/18/2013B..
38/9/1956A....
410/1/2001D...

.

I'm fairly new to SAS but what I think might do it is a code that essentially says "if MRN is a duplicate and date1=. then date1=date else if MRN is a duplicate and date1 ne "." and date2=. then date2=date etc...". If there's a way to have the dates in chronological order that would be magical.

This seems pretty messy/complicated so any guidance would really be appreciated.


Accepted Solutions
Solution
‎08-13-2015 02:17 PM
Respected Advisor
Posts: 3,799

Re: Want to have only unique rows

Creating so may variables from data is messy.  You can use the PROC SUMMARY trick if you have LE 100 obs per ID.

data have;
   format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;
  
informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;
  
input MRN DxDt Dx $ TX $ TxDate;
   datalines;
   1 1/15/2001 A Rad 2/1/2001
   2 11/6/1999 A Sur 12/16/1999
   2 7/11/2005 B Che 9/25/2005
   1 4/2/2010 C Sur 6/8/2010
   1 4/15/1989 D . .
   3 9/17/2014 A Rad 9/20/2014
   4 10/11/2007 B Che 11/1/2007
   ;
   run;


proc sort data=have;
   by mrn dxdt;
   run;
%let obs=1;
proc sql noprint;
  
select max(obs) into Smiley Surprisedbs from (select count(*) as obs from have group by mrn);
   quit;
  
run;
proc summary data=have;
   by mrn;
   output out=wide(drop=_type_)
     
idgroup(out[&obs](dxdt--txdate)=);
   run;
proc print;
  
run;
Capture.PNG

View solution in original post


All Replies
Frequent Contributor
Posts: 130

Re: Want to have only unique rows

You can actually use the TRANSPOSE procedure and then merge the two data sets.  May have to change the reorder the columns to get it the way you want, but this will set it up initially for you:

Super User
Posts: 11,343

Re: Want to have only unique rows

Of course the question is always HOW will you use the result? A vast majority of processes work much better with data in your original form (which shows a level of something called normalization in the database world), so don't lose it.

New Contributor
Posts: 2

Re: Want to have only unique rows

Thank you so much dcruik for helping me think of the data another way!

Ballardw - to elaborate on my problem:

I have a dateset with MRN numbers, dx codes, dates of diagnoses and dates of treatment. So patient 1 could be diagnosed with condition X on 1/15/2001, receive treatment Y on 1/25/2001. This person could later be diagnosed with condition Z on 5/16/2015, and receive treatment Y on 6/1/2015.

What I would like to do is create a dataset where each row represents a patient so that I can merge this information with other demographic information I have for the patients.

Using dcruik's  thinking I sorted the data by mrn and diagnosis date. I created a macro and transposed the data for each variable. I then merged all of the newly created datasets by MRN number. I know this is clunky but it seems to have worked? I'm still checking for accuracy and would welcome suggestions if my process raises any immediate redflags!

Below is simplified/dummy data

data have;

format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;

informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;

input MRN DxDt Dx $ TX $ TxDate;

datalines;

1 1/15/2001 A Rad 2/1/2001

2 11/6/1999 A Sur 12/16/1999

2 7/11/2005 B Che 9/25/2005

1 4/2/2010 C Sur 6/8/2010

1 4/15/1989 D . .

3 9/17/2014 A Rad 9/20/2014

4 10/11/2007 B Che 11/1/2007

;

run;

proc sort data=have;

  by mrn dxdt;

run;

%macro transpose(x);

  proc transpose data=have out=want_&x (drop=_Name_)

  prefix=&x;

  var &x;

  by mrn;

  run;

%mend transpose;

%transpose(DxDt)

%transpose(Dx)

%transpose(TX)

%transpose(TxDate)

data want;

  merge  want_dxdt want_dx want_tx want_txdate;

  by mrn;

run;

Solution
‎08-13-2015 02:17 PM
Respected Advisor
Posts: 3,799

Re: Want to have only unique rows

Creating so may variables from data is messy.  You can use the PROC SUMMARY trick if you have LE 100 obs per ID.

data have;
   format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;
  
informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;
  
input MRN DxDt Dx $ TX $ TxDate;
   datalines;
   1 1/15/2001 A Rad 2/1/2001
   2 11/6/1999 A Sur 12/16/1999
   2 7/11/2005 B Che 9/25/2005
   1 4/2/2010 C Sur 6/8/2010
   1 4/15/1989 D . .
   3 9/17/2014 A Rad 9/20/2014
   4 10/11/2007 B Che 11/1/2007
   ;
   run;


proc sort data=have;
   by mrn dxdt;
   run;
%let obs=1;
proc sql noprint;
  
select max(obs) into Smiley Surprisedbs from (select count(*) as obs from have group by mrn);
   quit;
  
run;
proc summary data=have;
   by mrn;
   output out=wide(drop=_type_)
     
idgroup(out[&obs](dxdt--txdate)=);
   run;
proc print;
  
run;
Capture.PNG
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 563 views
  • 1 like
  • 4 in conversation