Solved: Re: Best way to transpose data with duplicate unique identifiers

cc15 · Posted 07-19-2024 04:22 PM

I would like to transpose the following dataset so that I can calculate the time difference between each patient's first and second hospitalizations. How would I do that with duplicate identifiers?

This is how my dataset is currently formatted (some patients may have more than two hospitalizations):

Patient Name	Patient ID	Admission date	Test date
Pete Smith	123	1/2/2020	12/25/2019
Pete Smith	123	3/5/2020	12/25/2019
Sarah Jones	456	2/2/2020	1/25/2020
Sarah Jones	456	4/5/2020	1/25/2020
Mark Adams	789	4/7/2020	3/25/2020
Mark Adams	789	6/1/2020	3/25/2020

This is how I would like it to look:

Patient Name	Patient ID	Test date	Admission date	Admission date
Pete Smith	123	12/25/2019	1/2/2020	3/5/2020
Sarah Jones	456	1/25/2020	2/2/2020	4/5/2020
Mark Adams	789	3/25/2020	4/7/2020	6/1/2020

Is transposing the best way to do this? Or would I have to merge?

Tom · Posted 07-19-2024 04:36 PM

That data poses no problem for PROC TRANSPOSE.

proc transpose data=have out=want(drop=_name_) prefix=admitdate ;
  by patientid patientname testdate;
  var admitdate;
run;
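
One caveat: PROC TRANSPOSE with a BY statement expects the data to be sorted (or indexed) by the BY variables, and within each BY group the new columns follow the order of the rows. A minimal sketch, assuming the variable names used in the code above (PATIENTID, PATIENTNAME, TESTDATE, ADMITDATE) and the sample rows from the question, that sorts first so ADMITDATE1 is always each patient's earliest admission:

```sas
/* Sample rows from the question, using the variable names assumed above */
data have;
  input patientname & $30. patientid admitdate :mmddyy10. testdate :mmddyy10.;
  format admitdate testdate yymmdd10.;
cards;
Pete Smith   123 1/2/2020 12/25/2019
Pete Smith   123 3/5/2020 12/25/2019
Sarah Jones  456 2/2/2020 1/25/2020
Sarah Jones  456 4/5/2020 1/25/2020
Mark Adams   789 4/7/2020 3/25/2020
Mark Adams   789 6/1/2020 3/25/2020
;

/* Sort by the BY variables, then by ADMITDATE so that the transposed */
/* columns come out in date order within each patient                 */
proc sort data=have;
  by patientid patientname testdate admitdate;
run;

/* One row per patient: ADMITDATE1, ADMITDATE2, ... */
proc transpose data=have out=want(drop=_name_) prefix=admitdate;
  by patientid patientname testdate;
  var admitdate;
run;
```

Patients with more than two admissions simply get ADMITDATE3, ADMITDATE4, and so on; patients with fewer admissions get missing values in the extra columns.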

cc15 · Posted 07-23-2024 05:53 PM

I tried the following code but did not get the desired results. This is what I got:

Patient Name	Patient ID	Name of former variable	Admission date1	Test date
Pete Smith	123	Admission date	1/2/2020	12/25/2019
Pete Smith	123	Admission date	3/5/2020	12/25/2019
Sarah Jones	456	Admission date	2/2/2020	1/25/2020
Sarah Jones	456	Admission date	4/5/2020	1/25/2020
Mark Adams	789	Admission date	4/7/2020	3/25/2020
Mark Adams	789	Admission date	6/1/2020	3/25/2020

Tom · Posted 07-23-2024 09:31 PM

You didn't show any code, just a listing.

First, let's convert your original listing into an actual dataset so we have something to code with.

data have;
  input name & :$30. id admit :mmddyy. test :mmddyy.;
  format admit test yymmdd10.;
cards;
Pete Smith   123 1/2/2020 12/25/2019
Pete Smith   123 3/5/2020 12/25/2019
Sarah Jones  456 2/2/2020 1/25/2020
Sarah Jones  456 4/5/2020 1/25/2020
Mark Adams   789 4/7/2020 3/25/2020
Mark Adams   789 6/1/2020 3/25/2020
;

Since it is already sorted by ID, we can skip sorting and go straight to transposing it.

proc transpose data=have out=want prefix=admit;
  by id name test;
  /* No VAR statement: every numeric variable not listed in BY (here just ADMIT) is transposed */
run;

Result:

Obs     id       name              test    _NAME_        admit1        admit2

 1     123    Pete Smith     2019-12-25    admit     2020-01-02    2020-03-05
 2     456    Sarah Jones    2020-01-25    admit     2020-02-02    2020-04-05
 3     789    Mark Adams     2020-03-25    admit     2020-04-07    2020-06-01

Now, if you want the difference in DAYS between ADMIT1 and ADMIT2, you can simply subtract them, since SAS stores dates as numbers of days. If you want it in some other date interval, use the INTCK() function.

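data differences;
  set want;
  /* SAS dates are day counts, so subtraction gives the gap in days */
  days = admit2-admit1;
  /* 'cont' counts whole months elapsed rather than month boundaries crossed */
  months = intck('month',admit1,admit2,'cont');
run;

Result: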

Obs     id       name              test    _NAME_        admit1        admit2    days    months

 1     123    Pete Smith     2019-12-25    admit     2020-01-02    2020-03-05     63        2
 2     456    Sarah Jones    2020-01-25    admit     2020-02-02    2020-04-05     63        2
 3     789    Mark Adams     2020-03-25    admit     2020-04-07    2020-06-01     55        1
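
INTCK() uses the discrete method by default, which counts interval boundaries crossed rather than whole intervals elapsed; that is why the code above passes 'cont'. A small sketch comparing the two methods on Mark Adams's admission dates (the dataset name COMPARE is just for illustration):

```sas
/* Mark Adams's two admission dates from the listing above */
data compare;
  admit1 = '07APR2020'd;
  admit2 = '01JUN2020'd;
  format admit1 admit2 yymmdd10.;
  /* Default (discrete) method: counts month starts crossed (May 1 and Jun 1) */
  discrete   = intck('month', admit1, admit2);
  /* Continuous method: counts whole months elapsed (Apr 7 to May 7 only) */
  continuous = intck('month', admit1, admit2, 'continuous');
  put discrete= continuous=;
run;
```

The log should show `discrete=2 continuous=1`, which matches the MONTHS value of 1 for Mark Adams in the result above.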

Kurt_Bremser · Posted 07-19-2024 04:51 PM

Since you cannot have two variables with the same name, the exact layout you show, with two columns both named Admission date, is impossible.
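
Variable names must be unique, but variable labels need not be, so the requested report can still show two "Admission date" headings. A minimal sketch, assuming the wide layout produced by the PROC TRANSPOSE step in this thread (ID, NAME, TEST, ADMIT1, ADMIT2); WANT is typed in directly here only to keep the example self-contained:

```sas
/* WANT typed in directly in the wide layout PROC TRANSPOSE produced */
data want;
  input name & $30. id test :mmddyy10. admit1 :mmddyy10. admit2 :mmddyy10.;
  format test admit1 admit2 yymmdd10.;
  /* Names must be unique, but the same label may be used twice */
  label name   = 'Patient Name'
        id     = 'Patient ID'
        test   = 'Test date'
        admit1 = 'Admission date'
        admit2 = 'Admission date';
cards;
Pete Smith   123 12/25/2019 1/2/2020 3/5/2020
Sarah Jones  456 1/25/2020 2/2/2020 4/5/2020
Mark Adams   789 3/25/2020 4/7/2020 6/1/2020
;

/* The LABEL option makes PROC PRINT use the labels as column headings */
proc print data=want label noobs;
  var name id test admit1 admit2;
run;
```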

A_Kh · Posted 07-19-2024 05:12 PM

Alternatively, without changing the data structure, you can calculate the time difference (e.g., the number of days) using PROC SQL summary functions. For example:

data have;
  infile cards dlm=',' truncover;
  length PatientName $20;
  input PatientName $ PatientID $ Admissiondate :mmddyy10. Testdate :mmddyy10.;
  format Admissiondate Testdate date11.;
cards;
Pete Smith,123,1/2/2020,12/25/2019
Pete Smith,123,3/5/2020,12/25/2019
Sarah Jones,456,2/2/2020,1/25/2020
Sarah Jones,456,4/5/2020,1/25/2020
Mark Adams,789,4/7/2020,3/25/2020
Mark Adams,789,6/1/2020,3/25/2020
;

proc sql;
  create table want as
    /* MIN/MAX are the first and last admission; they are remerged onto every detail row */
    select *,
           min(Admissiondate) as FirstAdmissionDate format=date11.,
           max(Admissiondate) as LastAdmissionDate format=date11.,
           max(Admissiondate) - min(Admissiondate) as Days
      from have
      group by 1, 2, 4   /* PatientName, PatientID, Testdate */
      order by 2, 3;
quit;
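
MIN() and MAX() return the first and last admission, which equals the first-to-second gap only for patients with exactly two hospitalizations. A sketch, assuming the same HAVE layout (character PatientID, numeric Admissiondate), that takes the second admission to be the earliest date strictly after the first; the table name FIRST_SECOND is hypothetical:

```sas
/* Sample rows in the layout used above (PatientID is character) */
data have;
  infile cards dlm=',' truncover;
  length PatientName $20 PatientID $8;
  input PatientName $ PatientID $ Admissiondate :mmddyy10. Testdate :mmddyy10.;
  format Admissiondate Testdate date11.;
cards;
Pete Smith,123,1/2/2020,12/25/2019
Pete Smith,123,3/5/2020,12/25/2019
Sarah Jones,456,2/2/2020,1/25/2020
Sarah Jones,456,4/5/2020,1/25/2020
Mark Adams,789,4/7/2020,3/25/2020
Mark Adams,789,6/1/2020,3/25/2020
;

proc sql;
  create table first_second as
    select f.PatientID,
           f.PatientName,
           f.FirstAdmissionDate format=date11.,
           min(h.Admissiondate) as SecondAdmissionDate format=date11.,
           min(h.Admissiondate) - f.FirstAdmissionDate as Days
      /* F: one row per patient holding the first admission date */
      from (select PatientID, PatientName,
                   min(Admissiondate) as FirstAdmissionDate
              from have
              group by PatientID, PatientName) as f
           /* H: only admissions strictly later than the first one */
           inner join have as h
             on h.PatientID = f.PatientID
            and h.Admissiondate > f.FirstAdmissionDate
      group by f.PatientID, f.PatientName, f.FirstAdmissionDate;
quit;
```

Under this rule, a second admission on the same date as the first is not counted, and patients with a single admission drop out of the inner join.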

Tom · Posted 07-20-2024 03:37 PM

@cc15 wrote:

... so that I can calculate the time difference between the patients first and second hospitalization ...

No need to transpose to do that.

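data want;
  set have;
  by patientid admitdate;
  /* Gap in days from the previous row; DIF() runs on every row so its queue stays in step */
  days = dif(admitdate);
  retain admitdate1;
  format admitdate1 yymmdd10.;
  if first.patientid then do;
    admitdate1 = admitdate;  /* first admission for this patient */
    days = 0;                /* no earlier admission for this patient */
  end;
  /* One row per patient: DAYS is the gap between the last two admissions */
  if last.patientid;
run;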
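
The DAYS value kept above is the gap between each patient's last two admissions, which is the first-to-second gap when a patient has exactly two. If some patients have three or more and you specifically want first to second, one option is to count admissions within each patient and keep the second row. A minimal sketch, assuming HAVE uses the variable names above and the sample rows from the question (FIRST_SECOND and N_ADMIT are illustrative names):

```sas
/* Sample rows from the question (variable names follow the code above) */
data have;
  input patientname & $30. patientid admitdate :mmddyy10. testdate :mmddyy10.;
  format admitdate testdate yymmdd10.;
cards;
Pete Smith   123 1/2/2020 12/25/2019
Pete Smith   123 3/5/2020 12/25/2019
Sarah Jones  456 2/2/2020 1/25/2020
Sarah Jones  456 4/5/2020 1/25/2020
Mark Adams   789 4/7/2020 3/25/2020
Mark Adams   789 6/1/2020 3/25/2020
;

/* BY-group processing needs the rows in PATIENTID/ADMITDATE order */
proc sort data=have;
  by patientid admitdate;
run;

data first_second;
  set have;
  by patientid admitdate;
  retain admitdate1;
  format admitdate1 yymmdd10.;
  if first.patientid then do;
    n_admit = 0;              /* reset the per-patient admission counter */
    admitdate1 = admitdate;   /* remember the first admission */
  end;
  n_admit + 1;                /* sum statement: N_ADMIT is retained automatically */
  if n_admit = 2 then do;     /* second admission for this patient */
    days = admitdate - admitdate1;
    output;
  end;
  drop n_admit;
run;
```

Patients with a single admission produce no row, and ADMITDATE on each output row is the second admission.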
