Hello,
I apologize for the very basic questions, but I've tried a few variations of proc transpose and I haven't been able to figure this one out.
Here is a sample of what my data set looks like:
PatientID | Hospital | AdmissionDate | Transfusion1Dtm | Transfusion1Product | Transfusion1Lot# | Transfusion2Dtm | Transfusion2Product | Transfusion2Lot# | Transfusion3Dtm | Transfusion3Product | Transfusion3Lot# |
1 | HSC | 20OCT2018 | 21OCT2018:18:20:00 | RBC | C054016798 | 21OCT2018:22:00 | RBC | C054016799 | 22OCT2018:14:25:00 | HAS | 435710004 |
2 | HSC | 18OCT2018 | 18OCT2018:16:55:00 | Platelet | C054015830 | ||||||
3 | SBGH | 23OCT2018 | 26OCT2018:20:00:00 | RBC | C054014779 | 27OCT2018:23:43:00 | RBC | C054014786 |
This is what I would like it to look like:
PatientID | Hospital | AdmissionDate | TransfusionDtm | TransfusionProduct | TransfusionLot# |
1 | HSC | 20OCT2018 | 21OCT2018:18:20:00 | RBC | C054016798 |
1 | HSC | 20OCT2018 | 21OCT2018:22:00:00 | RBC | C054016799 |
1 | HSC | 20OCT2018 | 22OCT2018:14:25:00 | HAS | 435710004 |
2 | HSC | 18OCT2018 | 18OCT2018:16:55:00 | Platelet | C054015830 |
3 | SBGH | 23OCT2018 | 26OCT2018:20:00:00 | RBC | C054014779 |
3 | SBGH | 23OCT2018 | 27OCT2018:23:43:00 | RBC | C054014786 |
Essentially, I would like the columns TransfusionDtm, TransfusionProduct and TransfusionLot# to collapse length wise.
Any guidance here would be very much appreciated. Thanks!!
Brett
Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
One way:
data want; set have; array tdt Transfusion1Dtm Transfusion2Dtm Transfusion3Dtm; array tpr Transfusion1Product Transfusion2Product Transfusion3Product; array tlt Transfusion1Lot Transfusion2Lot Transfusion3Lot; do i= 1 to dim(tdt); if not missing(tdt[i]) then do; transfusedt = tdt[i]; transfuseproduct = tpr[i]; tranfuselot = tlt[i]; output; end; end; keep patientid hospital admissiondate transfusedt transfuseproduct tranfuselot ; run;
The arrays are basically to simplify writing references to similar values.
With experience in SAS you will learn that naming similar variables with the number at the end such as instead of Transfusion1Dt use TransfusionDtm1 is nicer for coding. You can reference lists of similar variables with: TransfusionDtm1 - TransfusionDtm12 (if you had 12 measurements or such) or even as TransfusionDtm: the : at the end that way references all variables that start with the same characters.
May not seem like a big deal but wait until you start getting 50 similar variables. Those lists make things much nicer than having to type out all 50 names.
Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
Transposing data tutorials:
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
You've already marked this question as answered, but I had a couple of questions that I'd need answered before proposing what I think will be an easier solution.
1. Do you really have 3 variables named Transfusion1Lot#, Transfusion2Lot# and Transfusion3Lot# or are they called something else?
2. Your stated output file doesn't retain the 1s, 2s or 3s that were in the wide file. Do you need to keep them or are they irrelevant?
Art, CEO, AnalystFinder.com
Hi Art,
Thanks for your interest in this question - I'd love to hear what you think. The column name (in the dataset as it is) is transfusion_1_lot_num_cr, and there are actually 15 different transfusions captured (ie this extends from 1 to transfusion_15_lot_num_cr). These were unfortunately named by someone else.
You're correct that I don't need the # specified in the final column, I just want to make sure that the transfusion date/time, product and lot# move together in the table (if that makes sense).
Thanks,
Brett
Are the other two sets of variables named the same way. i.e. are they:
Transfusion1Dtm and Transfusion1Product
or
Transfusion_1_Dtm and Transfusion_1_Product etc.
or something else?
Art, CEO, AnalystFinder.com
Hi Art,
The other variables are transfusion_1_time_cr (in dtm format) and transfusion_1_product_cr.
Thx,
Brett
Are the variable names all lower case? In your original example, some characters in the variable names were upper case.
i.e., unless you indicate otherwise, my proposed solution will expect that the variables (case sensitive) will be in the form:
input PatientID Hospital AdmissionDate
transfusion_1_time_cr transfusion_1_product_cr transfusion_1_lot_num_cr
transfusion_2_time_cr transfusion_2_product_cr transfusion_2_lot_num_cr
transfusion_3_time_cr transfusion_3_product_cr transfusion_3_lot_num_cr;
etc, thru
transfusion_15_time_cr transfusion_15_product_cr transfusion_15_lot_num_cr;
Also, what is the wide dataset's actual filename?
Art, CEO, AnalystFinder.com
The variables are lower case! And the file name is 'CRSummary' (in work library).
Thanks,
Brett
In that case, I think that running the following will achieve what you want:
filename ut url 'http://tiny.cc/untranspose_macro';
%include ut ;
%untranspose(data=CRSummary, out=want, by=PatientID Hospital AdmissionDate,
prefix=transfusion_, suffix=_cr, id=num,
var=_time _product _lot_num, var_first=no)
data want;
set want (drop=num rename=(_time=time _product=product _lot_num=lot_num));
run;
It uses a SAS macro that a group of us wrote and presented at last year's SAS Global Forum. If you want to download the macro and/or read about it, you can get the paper, code, powerpoint and tip sheet at: https://github.com/art297/untranspose
Art, CEO, AnalystFinder.com
Thank-you for your help! 🙂
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.