Solved: Re: Patient drug sequence groups

pink_poodle · Posted 02-06-2024 06:52 PM

I have a table with three variables: patient, time, drug. Let us say there are two patients. First patient received drug A at time 1 followed by drug B at time 2. Second patient received drug B at time 1 followed by drug A at time 2. I want to accumulate a substring for the group variable, such that patient 1 belongs to group "AB" and patient 2 belong to group "BA". Any suggestions are welcome!

data want;
	input patient time drug $ group $;
	datalines;
		1 1 A AB
		1 2 B AB
		2 1 B BA
		2 2 A BA
	;
run;

Ksharp · Posted 02-06-2024 09:37 PM

/*Assuming there are no duplicated obs in your data*/
data have;
	input patient time drug $;
	datalines;
		1 1 A 
		1 2 B 
		2 1 B 
		2 2 A 
	;
run;

data want;
do until(last.patient);
 set have;
 by patient;
 length group $ 10;
 group=cats(group,drug);
end;
do until(last.patient);
 set have;
 by patient;
 output;
end;
run;

View solution in original post

rvikram · Posted 02-06-2024 09:24 PM

What is your question?

pink_poodle · Posted 02-06-2024 09:26 PM

My question is how to accumulate a substring for the group variable, such that patient 1 belongs to group "AB" and patient 2 belong to group "BA" ?

rvikram · Posted 02-06-2024 09:26 PM

If I were you, I’d check the syntax for data lines again.

You need infile statement, and preferably a DLM keyword. Please check again.

pink_poodle · Posted 02-06-2024 09:46 PM

@rvikram, it should work, please give it a try :).

rvikram · Posted 02-06-2024 09:32 PM

If you are looking to group certain variables, you can write a proc sql step with group by.

If you don’t want duplicate values, you can write a proc sort step with nodup, or select distinct within proc sql.

Ksharp · Posted 02-06-2024 09:37 PM

/*Assuming there are no duplicated obs in your data*/
data have;
	input patient time drug $;
	datalines;
		1 1 A 
		1 2 B 
		2 1 B 
		2 2 A 
	;
run;

data want;
do until(last.patient);
 set have;
 by patient;
 length group $ 10;
 group=cats(group,drug);
end;
do until(last.patient);
 set have;
 by patient;
 output;
end;
run;

pink_poodle · Posted 02-06-2024 09:56 PM

@Ksharp , thank you, this is very helpful! I would like to have "ABB" group for duplicates, could you please advice?:

data want;
	input patient time drug $ group $;
	datalines;
		1 1 A ABB
		1 2 B ABB
		1 3 B ABB
		2 1 B BA
		2 2 A BA
	;
run;

Ksharp · Posted 02-06-2024 10:01 PM

Just try my original code. No need any change.

mkeintz · Posted 02-06-2024 09:58 PM

Pass through each ID twice, once to build GROUP, once to output it:

data have;
  input patient time drug $;
  datalines;
1 1 A 
1 2 B 
2 1 B 
2 2 A 
run;

data want;
  set have (in=firstpass)
      have (in=secondpass);
  by id;
  retain group '          ' ;  /*As many blanks as needed for group */
  if first.id then group=drug;
  else if firstpass then group=cats(group,drug);
  if secondpass;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away