Help using Base SAS procedures

Difficult Reshaping of data

Reply
Occasional Contributor
Posts: 6

Difficult Reshaping of data

Hello All,

 

I have tried Proc Trnaspose in several variation but cant figure this one out. I guess it is going to take more than proc transpose. I have limited array knowledge and I think it can be done using arrays. 

 

Here is how my data looks like:

 

Obs

ID

Labtest

Result

date

event

1

1

Calcium

9.1

06JUN2014

Outpt

2

1

Calcium

8.1

06JUN2015

Outpt

3

1

Calcium

9.6

07JUL2014

Inpt

4

1

Pot

4.1

16JUN2015

Outpt

5

1

Pot

3.1

06JUN2014

Inpt

6

1

Pot

4.6

07JUL2014

Inpt

7

1

Mag

2.1

26JUN2015

Inpt

8

1

Mag

2.1

06JUL2014

Inpt

9

1

Mag

1.6

07JUL2012

Inpt

10

2

Creat

1.1

07JUL2012

Outpt

11

2

Creat

1.0

06JUN2015

Outpt

12

2

Creat

2.0

06JUL2014

Inpt

13

2

Pot

3.1

07JUL2012

Outpt

14

2

Pot

4.1

06JUN2014

Inpt

15

2

Pot

4.6

16JUN2015

Inpt

16

2

Mag

1.1

07JUL2014

Inpt

17

2

Mag

1.1

06JUL2014

Inpt

18

2

Mag

1.3

06JUN2014

Inpt


 

 

Here is the code for it:

data have; 
  input ID Labtest $ Result date $9. event $; 
 cards; 
1 Calcium 9.1 06JUN2014 Outpt
1 Calcium 8.1 06JUN2015 Outpt
1 Calcium 9.6 07JUL2014 Inpt
1 Pot 4.1 16JUN2015 Outpt
1 Pot 3.1 06JUN2014 Inpt
1 Pot 4.6 07JUL2014 Inpt
1 Mag 2.1 26JUN2015 Inpt
1 Mag 2.1 06JUL2014 Inpt
1 Mag 1.6 07JUL2012 Inpt
2 Creat 1.1 07JUL2012 Outpt
2 Creat 1.0 06JUN2015 Outpt
2 Creat 2.0 06JUL2014 Inpt
2 Pot 3.1 07JUL2012 Outpt
2 Pot 4.1 06JUN2014 Inpt
2 Pot 4.6 16JUN2015 Inpt
2 Mag 1.1 07JUL2014 Inpt
2 Mag 1.1 06JUL2014 Inpt
2 Mag 1.3 06JUN2014 Inpt
; 
run;

 

I want my data to be reshaped into this structure:

 

 

Obs

ID

Calcium

Calcium_Date

Calcium_Event

Pot

Pot_Date

Pot_Event

Mag

Mag_Date

Mag_Event

Creat

Creat_Date

Creat_Event

1

1

9.1

06JUN2014

Outpt

4.1

16JUN2015

Outpt

2.1

26JUN2015

Inpt

.

 

 

2

1

8.1

06JUN2015

Outpt

3.1

06JUN2014

Inpt

2.1

06JUL2014

Inpt

.

 

 

3

1

9.6

07JUL2014

Inpt

4.6

07JUL2014

Inpt

1.6

07JUL2012

Inpt

.

 

 

4

2

.

 

 

3.1

07JUL2012

Outpt

1.1

07JUL2014

Inpt

1.1

07JUL2012

Outpt

5

2

.

 

 

4.1

06JUN2014

Inpt

1.1

06JUL2014

Inpt

1.0

06JUN2015

Outpt

6

2

.

 

 

4.6

16JUN2015

Inpt

1.3

06JUN2014

Inpt

2.0

06JUL2014

Inpt

 

 

Thank you, 

your help is really appreciated.

 

Bhupesh Panwar

 

 
Trusted Advisor
Posts: 1,607

Re: Difficult Reshaping of data

[ Edited ]

In the output data set, what is the ordering of the rows within ID?

 

Why does Calcium_Date of 06JUN14 wind up in the same row as Pot_Date of 16JUN15 and Mag_Date of 26JUN15?

Super User
Super User
Posts: 7,392

Re: Difficult Reshaping of data

Hi,

 

I would agree, seems a bit odd the order.  I would suggest you need to first sort the data, and then apply a sequence number to the data.  Then its simply a matter of transposing it and merging by the sequence number:

proc sort data=have;
by id labtest date;
run;

data have;
set have;
by id labtest;
retain seq;
if first.labtest then seq=1;
else seq=seq+1;
run;

 

However, two questions from my side.  Why is date a character variable?  If you are not using ISO dates then you need to have these as numeric otherwise they will not sort correctly.  Secondly, this is clinical lab data, why are you not using CDISC SDTM models?  The transposed output you seem to require - and it would only by for a Listing, there is no point working data transposed - would be far easier with that standard structure.

Pretty much industry standard now: http://www.cdisc.org/sdtm

Occasional Contributor
Posts: 6

Re: Difficult Reshaping of data

I tried this but I am not sure how a seq variable is going to help.

proc sort data=have;
by id labtest;
run;

data have;
set have;
by id labtest;
retain seq;
if first.labtest then seq=1;
else seq=seq+1;
run;

proc sort data=have;
by id;
run;

proc transpose data=have out=want;
by id;
var result;
id labtest;
run;

This is giving me these errors in the log:
ERROR: The ID value "Calcium" occurs twice in the same BY group.
ERROR: The ID value "Calcium" occurs twice in the same BY group.
ERROR: The ID value "Mag" occurs twice in the same BY group.
ERROR: The ID value "Mag" occurs twice in the same BY group.
ERROR: The ID value "Pot" occurs twice in the same BY group.
ERROR: The ID value "Pot" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
ID=1
ERROR: The ID value "Creat" occurs twice in the same BY group.
ERROR: The ID value "Creat" occurs twice in the same BY group.
ERROR: The ID value "Mag" occurs twice in the same BY group.
ERROR: The ID value "Mag" occurs twice in the same BY group.
ERROR: The ID value "Pot" occurs twice in the same BY group.
ERROR: The ID value "Pot" occurs twice in the same BY gro



To answer your questions:
1. Date is indeed a numeric variable in the original dataset. This is something I created quickly to get help.

2. I am not familiar with CDISC SDTM models. I am a clinician and would like to reshape data in a way that is easy for me to visualize and analyze.

Thanks again
Occasional Contributor
Posts: 6

Re: Difficult Reshaping of data

There is no specific ordering within ID. This is not an actual data set. i just created this randomly to get help.
Trusted Advisor
Posts: 1,607

Re: Difficult Reshaping of data


bhupesh102 wrote:
There is no specific ordering within ID. This is not an actual data set. i just created this randomly to get help.

If you can't be specific about what you want and explain the output you show, then how can we help you, unless we give you SAS code that randomly puts data together?

 

 

Super User
Super User
Posts: 6,497

Re: Difficult Reshaping of data

Looks like you are just pasting the values for CALCIUM next to the values for POT, MAG, etc..

You can do that with the MERGE statement, if you take care to clear the values before the next interation of the data step.

 

data want ;
   merge have (where = (labtest='Calcium') rename=result=calcium_result date=calcium_date .... )
        have (where = (labtest='Pot') rename=result=pot_result date=pot_date .... )
  ;
  by id;
  output;
  call missing(of _all_);
run;

     

Occasional Contributor
Posts: 6

Re: Difficult Reshaping of data

Tom,

 

Thank you for your solution. I ended up using this code. I have also attached the log. i was able to get the output I desired. I just wanted someone to look at the log for correctness before I apply this to the actual dataset.

 

data want ;
   merge have (where = (labtest='Calcium') rename = (result=calcium date=calcium_date event=calcium_event))
        have (where = (labtest='Pot') rename = (result=potassium date=pot_date event=pot_event))
		 have (where = (labtest='Mag') rename = (result=Magnesium date=Mag_date event=Mag_event))
		 have (where = (labtest='Creat') rename = (result=Creatnine date=Creat_date event=Creat_event))
  ;
  by id;
  output;
  call missing(of _all_);
drop labtest;
run;

SAS LOG

 

103 data want ;
104 merge have (where = (labtest='Calcium') rename = (result=calcium date=calcium_date
104! event=calcium_event))
105 have (where = (labtest='Pot') rename = (result=potassium date=pot_date
105! event=pot_event))
106 have (where = (labtest='Mag') rename = (result=Magnesium date=Mag_date
106! event=Mag_event))
107 have (where = (labtest='Creat') rename = (result=Creatnine date=Creat_date
107! event=Creat_event))
108 ;
109 by id;
110 output;
111 call missing(of _all_);
112 drop labtest;
113 run;

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 3 observations read from the data set WORK.HAVE.
WHERE labtest='Calcium';
NOTE: There were 6 observations read from the data set WORK.HAVE.
WHERE labtest='Pot';
NOTE: There were 6 observations read from the data set WORK.HAVE.
WHERE labtest='Mag';
NOTE: There were 3 observations read from the data set WORK.HAVE.
WHERE labtest='Creat';
NOTE: The data set WORK.WANT has 6 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds


Tom wrote:

Looks like you are just pasting the values for CALCIUM next to the values for POT, MAG, etc..

You can do that with the MERGE statement, if you take care to clear the values before the next interation of the data step.

 

data want ;
   merge have (where = (labtest='Calcium') rename=result=calcium_result date=calcium_date .... )
        have (where = (labtest='Pot') rename=result=pot_result date=pot_date .... )
  ;
  by id;
  output;
  call missing(of _all_);
run;

     


Ask a Question
Discussion stats
  • 7 replies
  • 355 views
  • 2 likes
  • 4 in conversation