Re: Difficult Reshaping of data

bhupesh102 · Posted 09-21-2015 09:45 AM

Hello All,

I have tried Proc Trnaspose in several variation but cant figure this one out. I guess it is going to take more than proc transpose. I have limited array knowledge and I think it can be done using arrays.

Here is how my data looks like:

Obs	ID	Labtest	Result	date	event
1	1	Calcium	9.1	06JUN2014	Outpt
2	1	Calcium	8.1	06JUN2015	Outpt
3	1	Calcium	9.6	07JUL2014	Inpt
4	1	Pot	4.1	16JUN2015	Outpt
5	1	Pot	3.1	06JUN2014	Inpt
6	1	Pot	4.6	07JUL2014	Inpt
7	1	Mag	2.1	26JUN2015	Inpt
8	1	Mag	2.1	06JUL2014	Inpt
9	1	Mag	1.6	07JUL2012	Inpt
10	2	Creat	1.1	07JUL2012	Outpt
11	2	Creat	1.0	06JUN2015	Outpt
12	2	Creat	2.0	06JUL2014	Inpt
13	2	Pot	3.1	07JUL2012	Outpt
14	2	Pot	4.1	06JUN2014	Inpt
15	2	Pot	4.6	16JUN2015	Inpt
16	2	Mag	1.1	07JUL2014	Inpt
17	2	Mag	1.1	06JUL2014	Inpt
18	2	Mag	1.3	06JUN2014	Inpt

Here is the code for it:

data have; 
  input ID Labtest $ Result date $9. event $; 
 cards; 
1 Calcium 9.1 06JUN2014 Outpt
1 Calcium 8.1 06JUN2015 Outpt
1 Calcium 9.6 07JUL2014 Inpt
1 Pot 4.1 16JUN2015 Outpt
1 Pot 3.1 06JUN2014 Inpt
1 Pot 4.6 07JUL2014 Inpt
1 Mag 2.1 26JUN2015 Inpt
1 Mag 2.1 06JUL2014 Inpt
1 Mag 1.6 07JUL2012 Inpt
2 Creat 1.1 07JUL2012 Outpt
2 Creat 1.0 06JUN2015 Outpt
2 Creat 2.0 06JUL2014 Inpt
2 Pot 3.1 07JUL2012 Outpt
2 Pot 4.1 06JUN2014 Inpt
2 Pot 4.6 16JUN2015 Inpt
2 Mag 1.1 07JUL2014 Inpt
2 Mag 1.1 06JUL2014 Inpt
2 Mag 1.3 06JUN2014 Inpt
; 
run;

I want my data to be reshaped into this structure:

Obs	ID	Calcium	Calcium_Date	Calcium_Event	Pot	Pot_Date	Pot_Event	Mag	Mag_Date	Mag_Event	Creat	Creat_Date	Creat_Event
1	1	9.1	06JUN2014	Outpt	4.1	16JUN2015	Outpt	2.1	26JUN2015	Inpt	.
2	1	8.1	06JUN2015	Outpt	3.1	06JUN2014	Inpt	2.1	06JUL2014	Inpt	.
3	1	9.6	07JUL2014	Inpt	4.6	07JUL2014	Inpt	1.6	07JUL2012	Inpt	.
4	2	.			3.1	07JUL2012	Outpt	1.1	07JUL2014	Inpt	1.1	07JUL2012	Outpt
5	2	.			4.1	06JUN2014	Inpt	1.1	06JUL2014	Inpt	1.0	06JUN2015	Outpt
6	2	.			4.6	16JUN2015	Inpt	1.3	06JUN2014	Inpt	2.0	06JUL2014	Inpt

Thank you,

your help is really appreciated.

Bhupesh Panwar

PaigeMiller · Posted 09-21-2015 10:39 AM

In the output data set, what is the ordering of the rows within ID?

Why does Calcium_Date of 06JUN14 wind up in the same row as Pot_Date of 16JUN15 and Mag_Date of 26JUN15?

--
Paige Miller

RW9 · Posted 09-21-2015 10:43 AM

Hi,

I would agree, seems a bit odd the order. I would suggest you need to first sort the data, and then apply a sequence number to the data. Then its simply a matter of transposing it and merging by the sequence number:

proc sort data=have;
by id labtest date;
run;

data have;
set have;
by id labtest;
retain seq;
if first.labtest then seq=1;
else seq=seq+1;
run;

However, two questions from my side. Why is date a character variable? If you are not using ISO dates then you need to have these as numeric otherwise they will not sort correctly. Secondly, this is clinical lab data, why are you not using CDISC SDTM models? The transposed output you seem to require - and it would only by for a Listing, there is no point working data transposed - would be far easier with that standard structure.

Pretty much industry standard now: http://www.cdisc.org/sdtm

bhupesh102 · Posted 09-21-2015 12:42 PM

I tried this but I am not sure how a seq variable is going to help.

proc sort data=have;
by id labtest;
run;

data have;
set have;
by id labtest;
retain seq;
if first.labtest then seq=1;
else seq=seq+1;
run;

proc sort data=have;
by id;
run;

proc transpose data=have out=want;
by id;
var result;
id labtest;
run;

This is giving me these errors in the log:
ERROR: The ID value "Calcium" occurs twice in the same BY group.
ERROR: The ID value "Calcium" occurs twice in the same BY group.
ERROR: The ID value "Mag" occurs twice in the same BY group.
ERROR: The ID value "Mag" occurs twice in the same BY group.
ERROR: The ID value "Pot" occurs twice in the same BY group.
ERROR: The ID value "Pot" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
ID=1
ERROR: The ID value "Creat" occurs twice in the same BY group.
ERROR: The ID value "Creat" occurs twice in the same BY group.
ERROR: The ID value "Mag" occurs twice in the same BY group.
ERROR: The ID value "Mag" occurs twice in the same BY group.
ERROR: The ID value "Pot" occurs twice in the same BY group.
ERROR: The ID value "Pot" occurs twice in the same BY gro

To answer your questions:
1. Date is indeed a numeric variable in the original dataset. This is something I created quickly to get help.

2. I am not familiar with CDISC SDTM models. I am a clinician and would like to reshape data in a way that is easy for me to visualize and analyze.

Thanks again

bhupesh102 · Posted 09-21-2015 12:37 PM

There is no specific ordering within ID. This is not an actual data set. i just created this randomly to get help.

PaigeMiller · Posted 09-21-2015 02:27 PM

@bhupesh102 wrote:
There is no specific ordering within ID. This is not an actual data set. i just created this randomly to get help.

If you can't be specific about what you want and explain the output you show, then how can we help you, unless we give you SAS code that randomly puts data together?

--
Paige Miller

Tom · Posted 09-21-2015 03:19 PM

Looks like you are just pasting the values for CALCIUM next to the values for POT, MAG, etc..

You can do that with the MERGE statement, if you take care to clear the values before the next interation of the data step.

data want ;
   merge have (where = (labtest='Calcium') rename=result=calcium_result date=calcium_date .... )
        have (where = (labtest='Pot') rename=result=pot_result date=pot_date .... )
  ;
  by id;
  output;
  call missing(of _all_);
run;

bhupesh102 · Posted 09-21-2015 04:16 PM

Tom,

Thank you for your solution. I ended up using this code. I have also attached the log. i was able to get the output I desired. I just wanted someone to look at the log for correctness before I apply this to the actual dataset.

data want ;
   merge have (where = (labtest='Calcium') rename = (result=calcium date=calcium_date event=calcium_event))
        have (where = (labtest='Pot') rename = (result=potassium date=pot_date event=pot_event))
		 have (where = (labtest='Mag') rename = (result=Magnesium date=Mag_date event=Mag_event))
		 have (where = (labtest='Creat') rename = (result=Creatnine date=Creat_date event=Creat_event))
  ;
  by id;
  output;
  call missing(of _all_);
drop labtest;
run;

SAS LOG

103 data want ;
104 merge have (where = (labtest='Calcium') rename = (result=calcium date=calcium_date
104! event=calcium_event))
105 have (where = (labtest='Pot') rename = (result=potassium date=pot_date
105! event=pot_event))
106 have (where = (labtest='Mag') rename = (result=Magnesium date=Mag_date
106! event=Mag_event))
107 have (where = (labtest='Creat') rename = (result=Creatnine date=Creat_date
107! event=Creat_event))
108 ;
109 by id;
110 output;
111 call missing(of _all_);
112 drop labtest;
113 run;

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 3 observations read from the data set WORK.HAVE.
WHERE labtest='Calcium';
NOTE: There were 6 observations read from the data set WORK.HAVE.
WHERE labtest='Pot';
NOTE: There were 6 observations read from the data set WORK.HAVE.
WHERE labtest='Mag';
NOTE: There were 3 observations read from the data set WORK.HAVE.
WHERE labtest='Creat';
NOTE: The data set WORK.WANT has 6 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

@Tom wrote:
Looks like you are just pasting the values for CALCIUM next to the values for POT, MAG, etc..
You can do that with the MERGE statement, if you take care to clear the values before the next interation of the data step.
data want ;
   merge have (where = (labtest='Calcium') rename=result=calcium_result date=calcium_date .... )
        have (where = (labtest='Pot') rename=result=pot_result date=pot_date .... )
  ;
  by id;
  output;
  call missing(of _all_);
run;

SAS Innovate 2025: Call for Content

Classroom Training Available!