BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LCart
Calcite | Level 5

I have a dataset with IDs and some variables. Currently the IDs are not unique and may repeat example data below:

IDDateDx
11/1/2001A
15/7/2009A
13/4/1989B
29/18/2013B
27/2/2011C
38/9/1956A
410/1/2001

D

I would like to write code to make the dataset look like below instead:

IDDateDxDate2Dx2Date3Dx3
13/4/1989B1/1/2001A5/7/2009A
27/2/2011C9/18/2013B..
38/9/1956A....
410/1/2001D...

.

I'm fairly new to SAS but what I think might do it is a code that essentially says "if MRN is a duplicate and date1=. then date1=date else if MRN is a duplicate and date1 ne "." and date2=. then date2=date etc...". If there's a way to have the dates in chronological order that would be magical.

This seems pretty messy/complicated so any guidance would really be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Creating so may variables from data is messy.  You can use the PROC SUMMARY trick if you have LE 100 obs per ID.

data have;
   format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;
  
informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;
  
input MRN DxDt Dx $ TX $ TxDate;
   datalines;
   1 1/15/2001 A Rad 2/1/2001
   2 11/6/1999 A Sur 12/16/1999
   2 7/11/2005 B Che 9/25/2005
   1 4/2/2010 C Sur 6/8/2010
   1 4/15/1989 D . .
   3 9/17/2014 A Rad 9/20/2014
   4 10/11/2007 B Che 11/1/2007
   ;
   run;


proc sort data=have;
   by mrn dxdt;
   run;
%let obs=1;
proc sql noprint;
  
select max(obs) into :obs from (select count(*) as obs from have group by mrn);
   quit;
  
run;
proc summary data=have;
   by mrn;
   output out=wide(drop=_type_)
     
idgroup(out[&obs](dxdt--txdate)=);
   run;
proc print;
  
run;
Capture.PNG

View solution in original post

4 REPLIES 4
dcruik
Lapis Lazuli | Level 10

You can actually use the TRANSPOSE procedure and then merge the two data sets.  May have to change the reorder the columns to get it the way you want, but this will set it up initially for you:

ballardw
Super User

Of course the question is always HOW will you use the result? A vast majority of processes work much better with data in your original form (which shows a level of something called normalization in the database world), so don't lose it.

LCart
Calcite | Level 5

Thank you so much dcruik for helping me think of the data another way!

Ballardw - to elaborate on my problem:

I have a dateset with MRN numbers, dx codes, dates of diagnoses and dates of treatment. So patient 1 could be diagnosed with condition X on 1/15/2001, receive treatment Y on 1/25/2001. This person could later be diagnosed with condition Z on 5/16/2015, and receive treatment Y on 6/1/2015.

What I would like to do is create a dataset where each row represents a patient so that I can merge this information with other demographic information I have for the patients.

Using dcruik's  thinking I sorted the data by mrn and diagnosis date. I created a macro and transposed the data for each variable. I then merged all of the newly created datasets by MRN number. I know this is clunky but it seems to have worked? I'm still checking for accuracy and would welcome suggestions if my process raises any immediate redflags!

Below is simplified/dummy data

data have;

format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;

informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;

input MRN DxDt Dx $ TX $ TxDate;

datalines;

1 1/15/2001 A Rad 2/1/2001

2 11/6/1999 A Sur 12/16/1999

2 7/11/2005 B Che 9/25/2005

1 4/2/2010 C Sur 6/8/2010

1 4/15/1989 D . .

3 9/17/2014 A Rad 9/20/2014

4 10/11/2007 B Che 11/1/2007

;

run;

proc sort data=have;

  by mrn dxdt;

run;

%macro transpose(x);

  proc transpose data=have out=want_&x (drop=_Name_)

  prefix=&x;

  var &x;

  by mrn;

  run;

%mend transpose;

%transpose(DxDt)

%transpose(Dx)

%transpose(TX)

%transpose(TxDate)

data want;

  merge  want_dxdt want_dx want_tx want_txdate;

  by mrn;

run;

data_null__
Jade | Level 19

Creating so may variables from data is messy.  You can use the PROC SUMMARY trick if you have LE 100 obs per ID.

data have;
   format MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;
  
informat MRN 3. DxDt mmddyy10. Dx $2. TX $4. TxDate mmddyy10.;
  
input MRN DxDt Dx $ TX $ TxDate;
   datalines;
   1 1/15/2001 A Rad 2/1/2001
   2 11/6/1999 A Sur 12/16/1999
   2 7/11/2005 B Che 9/25/2005
   1 4/2/2010 C Sur 6/8/2010
   1 4/15/1989 D . .
   3 9/17/2014 A Rad 9/20/2014
   4 10/11/2007 B Che 11/1/2007
   ;
   run;


proc sort data=have;
   by mrn dxdt;
   run;
%let obs=1;
proc sql noprint;
  
select max(obs) into :obs from (select count(*) as obs from have group by mrn);
   quit;
  
run;
proc summary data=have;
   by mrn;
   output out=wide(drop=_type_)
     
idgroup(out[&obs](dxdt--txdate)=);
   run;
proc print;
  
run;
Capture.PNG

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1736 views
  • 1 like
  • 4 in conversation