I would like to transpose the following dataset so that I can calculate the time difference between the patients first and second hospitalization. How would I do that with duplicate identifiers?
This is how my dataset is presently formatted (and some patients may have more than two hospitalizations)
Patient Name | Patient ID | Admission date | Test date |
Pete Smith | 123 | 1/2/2020 | 12/25/2019 |
Pete Smith | 123 | 3/5/2020 | 12/25/2019 |
Sarah Jones | 456 | 2/2/2020 | 1/25/2020 |
Sarah Jones | 456 | 4/5/2020 | 1/25/2020 |
Mark Adams | 789 | 4/7/2020 | 3/25/2020 |
Mark Adams | 789 | 6/1/2020 | 3/25/2020 |
This is how I would like it to look
Patient Name | Patient ID | Test date | Admission date | Admission date |
Pete Smith | 123 | 12/25/2019 | 1/2/2020 | 3/5/2020 |
Sarah Jones | 456 | 1/25/2020 | 2/2/2020 | 4/5/2020 |
Mark Adams | 789 | 3/25/2020 | 4/7/2020 | 6/1/2020 |
Is transposing the best way to do this? Or would I have to merge?
You didn't show any code. Just a listing.
First let's convert your original listing into an actual dataset so we have something to code with.
data have;
input name & :$30. id admit :mmddyy. test :mmddyy.;
format admit test yymmdd10.;
cards;
Pete Smith 123 1/2/2020 12/25/2019
Pete Smith 123 3/5/2020 12/25/2019
Sarah Jones 456 2/2/2020 1/25/2020
Sarah Jones 456 4/5/2020 1/25/2020
Mark Adams 789 4/7/2020 3/25/2020
Mark Adams 789 6/1/2020 3/25/2020
;
Since it is already sorted by ID we can skip the sorting and go straight to transposing it.
proc transpose data=have out=want prefix=admit;
by id name test;
run;
Result:
Obs id name test _NAME_ admit1 admit2 1 123 Pete Smith 2019-12-25 admit 2020-01-02 2020-03-05 2 456 Sarah Jones 2020-01-25 admit 2020-02-02 2020-04-05 3 789 Mark Adams 2020-03-25 admit 2020-04-07 2020-06-01
Now if you want to find the difference in DAYS between ADMIT1 and ADMIT2 you can use subtraction. If you want it in some other date interval use INTCK() function.
data differenes;
set want;
days = admit2-admit1;
months = intck('month',admit1,admit2,'cont');
run;
Result
Obs id name test _NAME_ admit1 admit2 days months 1 123 Pete Smith 2019-12-25 admit 2020-01-02 2020-03-05 63 2 2 456 Sarah Jones 2020-01-25 admit 2020-02-02 2020-04-05 63 2 3 789 Mark Adams 2020-03-25 admit 2020-04-07 2020-06-01 55 1
That data poses no problem for PROC TRANSPOSE.
proc transpose data=have out=want(drop=_name_) prefix=admitdate ;
by patientid patientname testdate;
var admitdate;
run;
I tried the following code, but did not get the desired results, this is what I got:
Patient Name | Patient ID | Name of former variable | Admission date1 | Test date |
Pete Smith | 123 | Admission date | 1/2/2020 | 12/25/2019 |
Pete Smith | 123 | Admission date | 3/5/2020 | 12/25/2019 |
Sarah Jones | 456 | Admission date | 2/2/2020 | 1/25/2020 |
Sarah Jones | 456 | Admission date | 4/5/2020 | 1/25/2020 |
Mark Adams | 789 | Admission date | 4/7/2020 | 3/25/2020 |
Mark Adams | 789 | Admission date | 6/1/2020 | 3/25/2020 |
You didn't show any code. Just a listing.
First let's convert your original listing into an actual dataset so we have something to code with.
data have;
input name & :$30. id admit :mmddyy. test :mmddyy.;
format admit test yymmdd10.;
cards;
Pete Smith 123 1/2/2020 12/25/2019
Pete Smith 123 3/5/2020 12/25/2019
Sarah Jones 456 2/2/2020 1/25/2020
Sarah Jones 456 4/5/2020 1/25/2020
Mark Adams 789 4/7/2020 3/25/2020
Mark Adams 789 6/1/2020 3/25/2020
;
Since it is already sorted by ID we can skip the sorting and go straight to transposing it.
proc transpose data=have out=want prefix=admit;
by id name test;
run;
Result:
Obs id name test _NAME_ admit1 admit2 1 123 Pete Smith 2019-12-25 admit 2020-01-02 2020-03-05 2 456 Sarah Jones 2020-01-25 admit 2020-02-02 2020-04-05 3 789 Mark Adams 2020-03-25 admit 2020-04-07 2020-06-01
Now if you want to find the difference in DAYS between ADMIT1 and ADMIT2 you can use subtraction. If you want it in some other date interval use INTCK() function.
data differenes;
set want;
days = admit2-admit1;
months = intck('month',admit1,admit2,'cont');
run;
Result
Obs id name test _NAME_ admit1 admit2 days months 1 123 Pete Smith 2019-12-25 admit 2020-01-02 2020-03-05 63 2 2 456 Sarah Jones 2020-01-25 admit 2020-02-02 2020-04-05 63 2 3 789 Mark Adams 2020-03-25 admit 2020-04-07 2020-06-01 55 1
Since you cannot have two variables with the same name, this is impossible.
Alternatively, without changing data structure, it is possible to calculate time difference (eg. how many days) using proc sql summary functions.
Eg:
data have;
infile cards dlm=',' truncover;
length PatientName $20;
input PatientName $ PatientID $ Admissiondate:mmddyy10. Testdate: mmddyy10.;
format Admissiondate Testdate date11.;
cards;
Pete Smith, 123, 1/2/2020, 12/25/2019
Pete Smith, 123, 3/5/2020, 12/25/2019
Sarah Jones, 456, 2/2/2020, 1/25/2020
Sarah Jones, 456, 4/5/2020, 1/25/2020
Mark Adams, 789, 4/7/2020, 3/25/2020
Mark Adams, 789, 6/1/2020, 3/25/2020
;
proc sql;
create table want as
select*, min(Admissiondate) as FirstAdmissionDate, max(Admissiondate) as LastAdmissionDate
from have
group by 1, 2, 4
order by 2,3;
quit;
Days= LastAdmissionDate - FirstAdmissionDate;
@cc15 wrote:
... so that I can calculate the time difference between the patients first and second hospitalization ...
No need to transpose to do that.
data want;
set have;
by patientid admitdate ;
days = dif(admitdate);
if first.patientid then do;
admitdate1 = admitdate;
retain admitdate1 ;
format admitdate1 yymmdd10.;
days=0;
end;
if last.patientid;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.