BookmarkSubscribeRSS Feed
michellel
Calcite | Level 5

Hi, I have a question about converting long data to wide data within several groups.

My original data looks like below for same id.

id     surgery_date               medication_date     medication_name

1          03/15/2010                  05/01/2010                   A

1          03/15/2010                  05/01/2010                   B

1          03/15/2010                  09/01/2010                   C

1          03/15/2010                  11/01/2010                   D

1          05/01/2011                  09/01/2011                   E

1          05/01/2011                  09/01/2011                   F

1          05/01/2011                  09/01/2011                   G

1          05/01/2011                  12/01/2011                   H

I need the new data looks like the following.

id     surgery_date               medication_date_1     medication_name_1     medication_name_2     medication_name_2     medication_date_2     medication_name_1     medication_name_2     medication_date_3     medication_name_1     medication_name_2

1          03/15/2010                  05/01/2010                   A                                             B                                                               09/01/2010                    C                                                                      11/01/2010                    D

1          05/01/2011                  09/01/2011                   E                                             F                              G                              12/01/2011                    H

If there is no medication_date_&i. variables in each line, I know how to put the medication_name from long data to wide data. However now, in each line there are also medication name group. In this case, I am not sure if SAS could get the new dataset like the above.

Thanks so much for your help!

11 REPLIES 11
ballardw
Super User

I'm guessing as it's hard to read your desired output but does this come close to what you want?

data have;
informat id $4.  surgery_date  medication_date mmddyy10.   medication_name $4.;
format   surgery_date  medication_date mmddyy10.   ;
input id     surgery_date    medication_date     medication_name;
datalines;
1          03/15/2010                  05/01/2010                   A
1          03/15/2010                  05/01/2010                   B
1          03/15/2010                  09/01/2010                   C
1          03/15/2010                  11/01/2010                   D
1          05/01/2011                  09/01/2011                   E
1          05/01/2011                  09/01/2011                   F
1          05/01/2011                  09/01/2011                   G
1          05/01/2011                  12/01/2011                   H
;
run;

proc transpose data=have out=trans1(drop=_name_) Prefix=Medication_date;
by id surgery_date;
var medication_date;
run;


proc transpose data=have out=trans2(drop=_name_) Prefix=medication_name;
by id surgery_date;
var medication_name;
run;

data want;
   merge  trans1 trans2;
   by id surgery_date;
run;

michellel
Calcite | Level 5

Thanks ballardw for your effort. Sorry I did not notice that my result data looks not clear on different screen.

I got your answer, transposing the two variables medication_date and medication_name from long to wide and merge together, but I am wondering how to tell which medication_name within which medication_date in one row. The result I was trying to get looks as below for columns. Thanks!

id  surgery_date med_date_1  med_name_1 med_name_2  med_date_2  med_name_1 med_name_2

Reeza
Super User

It doesn't do what you want. This is a bad data design in so many ways, it'll end up being wider than necessary by far with many empty cells. Why do you want it this way?

michellel
Calcite | Level 5

I agree with your point. This is not good data structure. The data format was required by someone else.

ballardw
Super User

unless the data get sorted in between the two transpose statements date1 should go with medication1

Reeza
Super User

In the OP expected output the date, 05/01/2010, is only outputted once whereas in yours it's outputted twice.

I don't agree with the structure of the expected data however and your output is a better format.

ballardw
Super User

As I said, I was guessing at the requirement. The apparent requirement I thought was there was to group on two variables as the statement was unclear. And considering an apparent 3 month gap between the medication dates ...

michellel
Calcite | Level 5

Thanks for your help. The data was required by someone else.

Ksharp
Super User

The most convenient way is using proc means .

Or use MERGE skill which me , Matt and Arthur.T have been talked at

http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

Code: Program

data have;
informat id $4.  surgery_date  medication_date mmddyy10.   medication_name $4.;
format   surgery_date  medication_date mmddyy10.   ;
input id   surgery_date   medication_date   medication_name;
datalines;
1 03/15/2010 05/01/2010 A
1 03/15/2010 05/01/2010 B
1 03/15/2010 09/01/2010 C
1 03/15/2010 11/01/2010 D
1 05/01/2011 09/01/2011 E
1 05/01/2011 09/01/2011 F
1 05/01/2011 09/01/2011 G
1 05/01/2011 12/01/2011 H
;
run;
proc sql;
select max(count) into : n
  from (select count(*) as count from have group by id,surgery_date);
quit;
proc summary data=have nway;
class id surgery_date;
output out=want idgroup(out[&n]  (medication_date  medication_name )=);
run;

Xia Keshan

michellel
Calcite | Level 5

Thanks Xia Ke for your help. Even though the result format is still not what I need, your coding is really nice.

Ksharp
Super User

OU. I just realize what you are looking for .

Code: Program.sas

data have;
informat id $4.  surgery_date  medication_date mmddyy10.   medication_name $4.;
format   surgery_date  medication_date mmddyy10.   ;
input id   surgery_date   medication_date   medication_name;
datalines;
1 03/15/2010 05/01/2010 A
1 03/15/2010 05/01/2010 B
1 03/15/2010 09/01/2010 C
1 03/15/2010 11/01/2010 D
1 05/01/2011 09/01/2011 E
1 05/01/2011 09/01/2011 F
1 05/01/2011 09/01/2011 G
1 05/01/2011 12/01/2011 H
;
run;
proc sql;
select max(count) into : n separated by ' '
  from (select count(*) as count from have group by id,surgery_date,medication_date);
quit;
proc summary data=have nway;
class id surgery_date medication_date;
output out=temp(drop=_:) idgroup(out[&n]  (medication_name )=);
run;

proc sql;
select max(count) into : m separated by ' '
  from (select count(*) as count from temp group by id,surgery_date);
quit;
proc summary data=temp nway;
class id surgery_date;
output out=want(drop=_:) idgroup(out[&m]  (medication_date medication_name_1-medication_name_&n )=);
run;

proc sql;
select name into : list separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='WANT' and lowcase(name) like 'medication%'
   order by input(scan(name,-1,'_'),best8.),name;
quit;
data final_want;
retain id surgery_date &list ;
set want;
run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1229 views
  • 1 like
  • 4 in conversation