Solved: Re: Patient drug sequence groups

pink_poodle · Posted 02-06-2024 06:52 PM

I have a table with three variables: patient, time, drug. Let us say there are two patients. First patient received drug A at time 1 followed by drug B at time 2. Second patient received drug B at time 1 followed by drug A at time 2. I want to accumulate a substring for the group variable, such that patient 1 belongs to group "AB" and patient 2 belong to group "BA". Any suggestions are welcome!

data want;
	input patient time drug $ group $;
	datalines;
		1 1 A AB
		1 2 B AB
		2 1 B BA
		2 2 A BA
	;
run;

Ksharp · Posted 02-06-2024 09:37 PM

/*Assuming there are no duplicated obs in your data*/
data have;
	input patient time drug $;
	datalines;
		1 1 A 
		1 2 B 
		2 1 B 
		2 2 A 
	;
run;

data want;
do until(last.patient);
 set have;
 by patient;
 length group $ 10;
 group=cats(group,drug);
end;
do until(last.patient);
 set have;
 by patient;
 output;
end;
run;

View solution in original post

rvikram · Posted 02-06-2024 09:24 PM

What is your question?

pink_poodle · Posted 02-06-2024 09:26 PM

My question is how to accumulate a substring for the group variable, such that patient 1 belongs to group "AB" and patient 2 belong to group "BA" ?

rvikram · Posted 02-06-2024 09:26 PM

If I were you, I’d check the syntax for data lines again.

You need infile statement, and preferably a DLM keyword. Please check again.

pink_poodle · Posted 02-06-2024 09:46 PM

@rvikram, it should work, please give it a try :).

rvikram · Posted 02-06-2024 09:32 PM

If you are looking to group certain variables, you can write a proc sql step with group by.

If you don’t want duplicate values, you can write a proc sort step with nodup, or select distinct within proc sql.

Ksharp · Posted 02-06-2024 09:37 PM

/*Assuming there are no duplicated obs in your data*/
data have;
	input patient time drug $;
	datalines;
		1 1 A 
		1 2 B 
		2 1 B 
		2 2 A 
	;
run;

data want;
do until(last.patient);
 set have;
 by patient;
 length group $ 10;
 group=cats(group,drug);
end;
do until(last.patient);
 set have;
 by patient;
 output;
end;
run;

pink_poodle · Posted 02-06-2024 09:56 PM

@Ksharp , thank you, this is very helpful! I would like to have "ABB" group for duplicates, could you please advice?:

data want;
	input patient time drug $ group $;
	datalines;
		1 1 A ABB
		1 2 B ABB
		1 3 B ABB
		2 1 B BA
		2 2 A BA
	;
run;

Ksharp · Posted 02-06-2024 10:01 PM

Just try my original code. No need any change.

dxiao2017 · Posted 08-08-2025 08:52 AM

Hi @pink_poodle , after tried many times using if first. and retain etc. techniques and failed, I suddenly realized that this is a very simple problem about proc transpose, the code is as simple as follows :

data have;
	input patient time drug $;
	datalines;
1 1 A
1 2 B
1 3 B
2 1 B
2 2 A
;
run;
proc print data=have;run;
proc transpose data=have out=want;
   by patient;
   id time;
   var drug;
run;
proc print data=want;run;

And then you can use cat() function to put column 1,2,3 together to produce the patient group column of this typical cross over design. The conclusion is: proc transpose is the simple and efficient solution.

dxiao2017 · Posted 08-08-2025 09:26 AM

Hi @pink_poodle , the complete steps are as follows (using proc transpose to produce the group column):

data have;
	input patient time drug $;
	datalines;
1 1 A
1 2 B
1 3 B
2 1 B
2 2 A
;
run;
proc print data=have;run;
data have1;
   set have;
   timechar=cat('t',put(time,1.));
run;
proc print data=have1;run;
proc transpose data=have1 out=want;
   by patient;
   id timechar;
   var drug;
run;
proc print data=want;run;
data want1;
   set want;
   group=compress(cat(t1,t2,t3));
run;
proc print data=want1;run;
proc sql;
select h.*,
       w1.group
   from have as h left join
        want1 w1
   on h.patient=w1.patient;
quit;

dxiao2017 · Posted 08-08-2025 10:02 AM

The last table somehow changed the order of the time column. So order by should be used here:

proc sql;
select h.*,
       w1.group
   from have as h left join
        want1 w1
   on h.patient=w1.patient
   order by h.patient,h.time;
quit;

ArdigenSupport · Posted 07-25-2025 04:47 PM

Sure! You can achieve this easily using a PROC SQL approach or DATA step in SAS by grouping by patient and then accumulating the drug sequence based on time. Here's one clean solution using PROC SQL that works well for your scenario:
proc sql;
create table want as
select patient,
catx('',
/* sort and concatenate the drugs by time for each patient */
max(case when time=1 then drug end),
max(case when time=2 then drug end)
) as group
from your_table
group by patient;
quit;

mkeintz · Posted 02-06-2024 09:58 PM

Pass through each ID twice, once to build GROUP, once to output it:

data have;
  input patient time drug $;
  datalines;
1 1 A 
1 2 B 
2 1 B 
2 2 A 
run;

data want;
  set have (in=firstpass)
      have (in=secondpass);
  by id;
  retain group '          ' ;  /*As many blanks as needed for group */
  if first.id then group=drug;
  else if firstpass then group=cats(group,drug);
  if secondpass;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ArdigenSupport · Posted 07-25-2025 04:55 PM

You're looking to create a new variable, group, that reflects the drug sequence for each patient based on the time variable. To do this, we need to sort the data by patient and time, then concatenate the drug values for each patient in the correct order.

Here’s a solution that achieves what you want using a DATA step:

data want;
input patient time drug $ group $;
;
1 1 A AB
1 2 B AB
2 1 B BA
2 2 A BA
;
run;

/* Step 1: Sort the data by patient and time */
proc sort data=want;
by patient time;
run;

/* Step 2: Concatenate the drug values per patient */
data want_grouped;
set want;
by patient;

retain group_string;

/* Accumulate drug values for each patient */
if first.patient then group_string = ''; /* Reset at start of new patient */
group_string = ('', group_string, drug); /* Append drug to the group string */

/* Output once all records for a patient are accumulated */
if last.patient then do;
group = group_string;
output;
end;

drop group_string;
run;

proc print data=want_grouped;
run;

Registration is open

SAS Training: Just a Click Away