Hi, I have a question about converting long data to wide data within several groups.
My original data looks like below for same id.
id surgery_date medication_date medication_name
1 03/15/2010 05/01/2010 A
1 03/15/2010 05/01/2010 B
1 03/15/2010 09/01/2010 C
1 03/15/2010 11/01/2010 D
1 05/01/2011 09/01/2011 E
1 05/01/2011 09/01/2011 F
1 05/01/2011 09/01/2011 G
1 05/01/2011 12/01/2011 H
I need the new data looks like the following.
id surgery_date medication_date_1 medication_name_1 medication_name_2 medication_name_2 medication_date_2 medication_name_1 medication_name_2 medication_date_3 medication_name_1 medication_name_2
1 03/15/2010 05/01/2010 A B 09/01/2010 C 11/01/2010 D
1 05/01/2011 09/01/2011 E F G 12/01/2011 H
If there is no medication_date_&i. variables in each line, I know how to put the medication_name from long data to wide data. However now, in each line there are also medication name group. In this case, I am not sure if SAS could get the new dataset like the above.
Thanks so much for your help!
I'm guessing as it's hard to read your desired output but does this come close to what you want?
data have;
informat id $4. surgery_date medication_date mmddyy10. medication_name $4.;
format surgery_date medication_date mmddyy10. ;
input id surgery_date medication_date medication_name;
datalines;
1 03/15/2010 05/01/2010 A
1 03/15/2010 05/01/2010 B
1 03/15/2010 09/01/2010 C
1 03/15/2010 11/01/2010 D
1 05/01/2011 09/01/2011 E
1 05/01/2011 09/01/2011 F
1 05/01/2011 09/01/2011 G
1 05/01/2011 12/01/2011 H
;
run;
proc transpose data=have out=trans1(drop=_name_) Prefix=Medication_date;
by id surgery_date;
var medication_date;
run;
proc transpose data=have out=trans2(drop=_name_) Prefix=medication_name;
by id surgery_date;
var medication_name;
run;
data want;
merge trans1 trans2;
by id surgery_date;
run;
Thanks ballardw for your effort. Sorry I did not notice that my result data looks not clear on different screen.
I got your answer, transposing the two variables medication_date and medication_name from long to wide and merge together, but I am wondering how to tell which medication_name within which medication_date in one row. The result I was trying to get looks as below for columns. Thanks!
id surgery_date med_date_1 med_name_1 med_name_2 med_date_2 med_name_1 med_name_2
It doesn't do what you want. This is a bad data design in so many ways, it'll end up being wider than necessary by far with many empty cells. Why do you want it this way?
I agree with your point. This is not good data structure. The data format was required by someone else.
unless the data get sorted in between the two transpose statements date1 should go with medication1
In the OP expected output the date, 05/01/2010, is only outputted once whereas in yours it's outputted twice.
I don't agree with the structure of the expected data however and your output is a better format.
As I said, I was guessing at the requirement. The apparent requirement I thought was there was to group on two variables as the statement was unclear. And considering an apparent 3 month gap between the medication dates ...
Thanks for your help. The data was required by someone else.
The most convenient way is using proc means .
Or use MERGE skill which me , Matt and Arthur.T have been talked at
http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf
data have;
informat id $4. surgery_date medication_date mmddyy10. medication_name $4.;
format surgery_date medication_date mmddyy10. ;
input id surgery_date medication_date medication_name;
datalines;
1 03/15/2010 05/01/2010 A
1 03/15/2010 05/01/2010 B
1 03/15/2010 09/01/2010 C
1 03/15/2010 11/01/2010 D
1 05/01/2011 09/01/2011 E
1 05/01/2011 09/01/2011 F
1 05/01/2011 09/01/2011 G
1 05/01/2011 12/01/2011 H
;
run;
proc sql;
select max(count) into : n
from (select count(*) as count from have group by id,surgery_date);
quit;
proc summary data=have nway;
class id surgery_date;
output out=want idgroup(out[&n] (medication_date medication_name )=);
run;
Xia Keshan
Thanks Xia Ke for your help. Even though the result format is still not what I need, your coding is really nice.
OU. I just realize what you are looking for .
data have;
informat id $4. surgery_date medication_date mmddyy10. medication_name $4.;
format surgery_date medication_date mmddyy10. ;
input id surgery_date medication_date medication_name;
datalines;
1 03/15/2010 05/01/2010 A
1 03/15/2010 05/01/2010 B
1 03/15/2010 09/01/2010 C
1 03/15/2010 11/01/2010 D
1 05/01/2011 09/01/2011 E
1 05/01/2011 09/01/2011 F
1 05/01/2011 09/01/2011 G
1 05/01/2011 12/01/2011 H
;
run;
proc sql;
select max(count) into : n separated by ' '
from (select count(*) as count from have group by id,surgery_date,medication_date);
quit;
proc summary data=have nway;
class id surgery_date medication_date;
output out=temp(drop=_:) idgroup(out[&n] (medication_name )=);
run;
proc sql;
select max(count) into : m separated by ' '
from (select count(*) as count from temp group by id,surgery_date);
quit;
proc summary data=temp nway;
class id surgery_date;
output out=want(drop=_:) idgroup(out[&m] (medication_date medication_name_1-medication_name_&n )=);
run;
proc sql;
select name into : list separated by ' '
from dictionary.columns
where libname='WORK' and memname='WANT' and lowcase(name) like 'medication%'
order by input(scan(name,-1,'_'),best8.),name;
quit;
data final_want;
retain id surgery_date &list ;
set want;
run;
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.