I have a table with three variables: patient, time, drug. Let us say there are two patients. First patient received drug A at time 1 followed by drug B at time 2. Second patient received drug B at time 1 followed by drug A at time 2. I want to accumulate a substring for the group variable, such that patient 1 belongs to group "AB" and patient 2 belong to group "BA". Any suggestions are welcome!
data want;
input patient time drug $ group $;
datalines;
1 1 A AB
1 2 B AB
2 1 B BA
2 2 A BA
;
run;
/*Assuming there are no duplicated obs in your data*/
data have;
input patient time drug $;
datalines;
1 1 A
1 2 B
2 1 B
2 2 A
;
run;
data want;
do until(last.patient);
set have;
by patient;
length group $ 10;
group=cats(group,drug);
end;
do until(last.patient);
set have;
by patient;
output;
end;
run;
/*Assuming there are no duplicated obs in your data*/
data have;
input patient time drug $;
datalines;
1 1 A
1 2 B
2 1 B
2 2 A
;
run;
data want;
do until(last.patient);
set have;
by patient;
length group $ 10;
group=cats(group,drug);
end;
do until(last.patient);
set have;
by patient;
output;
end;
run;
@Ksharp , thank you, this is very helpful! I would like to have "ABB" group for duplicates, could you please advice?:
data want;
input patient time drug $ group $;
datalines;
1 1 A ABB
1 2 B ABB
1 3 B ABB
2 1 B BA
2 2 A BA
;
run;
Hi @pink_poodle , after tried many times using if first. and retain etc. techniques and failed, I suddenly realized that this is a very simple problem about proc transpose, the code is as simple as follows :
data have;
input patient time drug $;
datalines;
1 1 A
1 2 B
1 3 B
2 1 B
2 2 A
;
run;
proc print data=have;run;
proc transpose data=have out=want;
by patient;
id time;
var drug;
run;
proc print data=want;run;
And then you can use cat() function to put column 1,2,3 together to produce the patient group column of this typical cross over design. The conclusion is: proc transpose is the simple and efficient solution.
Hi @pink_poodle , the complete steps are as follows (using proc transpose to produce the group column):
data have;
input patient time drug $;
datalines;
1 1 A
1 2 B
1 3 B
2 1 B
2 2 A
;
run;
proc print data=have;run;
data have1;
set have;
timechar=cat('t',put(time,1.));
run;
proc print data=have1;run;
proc transpose data=have1 out=want;
by patient;
id timechar;
var drug;
run;
proc print data=want;run;
data want1;
set want;
group=compress(cat(t1,t2,t3));
run;
proc print data=want1;run;
proc sql;
select h.*,
w1.group
from have as h left join
want1 w1
on h.patient=w1.patient;
quit;
The last table somehow changed the order of the time column. So order by should be used here:
proc sql;
select h.*,
w1.group
from have as h left join
want1 w1
on h.patient=w1.patient
order by h.patient,h.time;
quit;
Sure! You can achieve this easily using a PROC SQL
approach or DATA step
in SAS by grouping by patient and then accumulating the drug sequence based on time. Here's one clean solution using PROC SQL
that works well for your scenario:
proc sql;
create table want as
select patient,
catx('',
/* sort and concatenate the drugs by time for each patient */
max(case when time=1 then drug end),
max(case when time=2 then drug end)
) as group
from your_table
group by patient;
quit;
Pass through each ID twice, once to build GROUP, once to output it:
data have;
input patient time drug $;
datalines;
1 1 A
1 2 B
2 1 B
2 2 A
run;
data want;
set have (in=firstpass)
have (in=secondpass);
by id;
retain group ' ' ; /*As many blanks as needed for group */
if first.id then group=drug;
else if firstpass then group=cats(group,drug);
if secondpass;
run;
You're looking to create a new variable, group
, that reflects the drug sequence for each patient based on the time
variable. To do this, we need to sort the data by patient
and time
, then concatenate the drug
values for each patient in the correct order.
Here’s a solution that achieves what you want using a DATA step
:
data want;
input patient time drug $ group $;
;
1 1 A AB
1 2 B AB
2 1 B BA
2 2 A BA
;
run;
/* Step 1: Sort the data by patient and time */
proc sort data=want;
by patient time;
run;
/* Step 2: Concatenate the drug values per patient */
data want_grouped;
set want;
by patient;
retain group_string;
/* Accumulate drug values for each patient */
if first.patient then group_string = ''; /* Reset at start of new patient */
group_string = ('', group_string, drug); /* Append drug to the group string */
/* Output once all records for a patient are accumulated */
if last.patient then do;
group = group_string;
output;
end;
drop group_string;
run;
proc print data=want_grouped;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.