I am having trouble transposing my dataset which has multiple observations per subject. When I transpose I loose some of the information
My data looks like this:
data set have:
id side indate isolate type
1 1 07-03-17 s. aureus vile
1 1 07-03-17 s. hominis bottle
2 1 10-3-16 s. epidermidis vile
2 1 10-3-16 s. epidermidis bottle
2 1 04-10-17 s. pyogenes vile
2 1 04-10-17 s. pyogenes bottle
3 2 05-05-18 neg vile
4 2 04-25-16 mixed anaerobic culture vile
4 2 04-25-16 e.coli vile
4 2 04-25-16 e. faecalis bottle
4 2 12-13-16 neg vile
I have used this code but it gives me essentially the same data that I already have.
proc transpose data=have out=transhave (drop=_NAME_ _LABEL_) prefix=type_ let ;
by indate;
id type;
var isolat;
copy id side ;
run;
it looks like this:
id side indate vile bottle
1 1 07-03-17 s. aureus .
1 1 07-03-17 "." s. hominis
2 1 10-3-16 s. epidermidis "."
2 1 10-3-16 "." s. epidermidis
ect.
I would like data that looks something like this:
data set want
id side indate-1 isolate_01 isolate_02 isolate_03 indate_2 isolate2_1 isolate2_2
1 1 07-03-17 s. aureus s. hominis
2 1 10-3-16 s. epidermidis . 04-10-17 s. pyogenes
3 2 05-05-18 neg
4 2 04-25-16 mixed anaerobic culture e.coli e. faecalis 12-13-16 neg
I guess I am missing a few steps in getting what I want, I just can't figure out how to get there.
Any help is much appreciated!
Please try the below code
proc sort data=have;
by id side indate;
run;
proc transpose data=have out=want1 prefix=indate;
by id side;
var indate;
run;
proc transpose data=have out=want2 prefix=isolate;
by id side;
var isolate;
run;
data want;
merge want1(in=a) want2(in=b);
by id side;
run;
Please try the below code
proc sort data=have;
by id side indate;
run;
proc transpose data=have out=want1 prefix=indate;
by id side;
var indate;
run;
proc transpose data=have out=want2 prefix=isolate;
by id side;
var isolate;
run;
data want;
merge want1(in=a) want2(in=b);
by id side;
run;
Thanks! It works perfectly. I am so happy 🙂
So you are trying to transpose from long - the preferred format for many tasks - to wide - a hardly useful format to store data. Why? The number of variables in the dataset you want depends on the data you have and will most likely differ between files you process, making it harder to write the code for all subsequent steps.
I use the wide format for logistic regression. It is how I have learned to do it.
@ansepans wrote:
I use the wide format for logistic regression. It is how I have learned to do it.
This is one of the few reasons where the wide format is needed. All information about a subject needs to be in a single observation.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.