BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
pink_poodle
Barite | Level 11

I have a table with three variables: patient, time, drug. Let us say there are two patients. First patient received drug A at time 1 followed by drug B at time 2. Second patient received drug B at time 1 followed by drug A at time 2. I want to accumulate a substring for the group variable, such that patient 1 belongs to group "AB" and patient 2  belong to group "BA". Any suggestions are welcome!

data want;
	input patient time drug $ group $;
	datalines;
		1 1 A AB
		1 2 B AB
		2 1 B BA
		2 2 A BA
	;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
/*Assuming there are no duplicated obs in your data*/
data have;
	input patient time drug $;
	datalines;
		1 1 A 
		1 2 B 
		2 1 B 
		2 2 A 
	;
run;

data want;
do until(last.patient);
 set have;
 by patient;
 length group $ 10;
 group=cats(group,drug);
end;
do until(last.patient);
 set have;
 by patient;
 output;
end;
run;

View solution in original post

16 REPLIES 16
rvikram
Fluorite | Level 6
What is your question?
pink_poodle
Barite | Level 11
My question is how to accumulate a substring for the group variable, such that patient 1 belongs to group "AB" and patient 2 belong to group "BA" ?
rvikram
Fluorite | Level 6
If I were you, I’d check the syntax for data lines again.

You need infile statement, and preferably a DLM keyword. Please check again.
pink_poodle
Barite | Level 11
@rvikram, it should work, please give it a try :).
rvikram
Fluorite | Level 6
If you are looking to group certain variables, you can write a proc sql step with group by.

If you don’t want duplicate values, you can write a proc sort step with nodup, or select distinct within proc sql.
Ksharp
Super User
/*Assuming there are no duplicated obs in your data*/
data have;
	input patient time drug $;
	datalines;
		1 1 A 
		1 2 B 
		2 1 B 
		2 2 A 
	;
run;

data want;
do until(last.patient);
 set have;
 by patient;
 length group $ 10;
 group=cats(group,drug);
end;
do until(last.patient);
 set have;
 by patient;
 output;
end;
run;
pink_poodle
Barite | Level 11

@Ksharp , thank you, this is very helpful! I would like to have "ABB" group for duplicates, could you please advice?:

data want;
	input patient time drug $ group $;
	datalines;
		1 1 A ABB
		1 2 B ABB
		1 3 B ABB
		2 1 B BA
		2 2 A BA
	;
run;
Ksharp
Super User
Just try my original code. No need any change.
dxiao2017
Lapis Lazuli | Level 10

Hi @pink_poodle , after tried many times using if first. and retain etc. techniques and failed, I suddenly realized that this is a very simple problem about proc transpose, the code is as simple as follows :

data have;
	input patient time drug $;
	datalines;
1 1 A
1 2 B
1 3 B
2 1 B
2 2 A
;
run;
proc print data=have;run;
proc transpose data=have out=want;
   by patient;
   id time;
   var drug;
run;
proc print data=want;run;

dxiao2017_1-1754657308369.png

And then you can use cat() function to put column 1,2,3 together to produce the patient group column of this typical cross over design. The conclusion is: proc transpose is the simple and efficient solution.

dxiao2017
Lapis Lazuli | Level 10

Hi @pink_poodle , the complete steps are as follows (using proc transpose to produce the group column):

data have;
	input patient time drug $;
	datalines;
1 1 A
1 2 B
1 3 B
2 1 B
2 2 A
;
run;
proc print data=have;run;
data have1;
   set have;
   timechar=cat('t',put(time,1.));
run;
proc print data=have1;run;
proc transpose data=have1 out=want;
   by patient;
   id timechar;
   var drug;
run;
proc print data=want;run;
data want1;
   set want;
   group=compress(cat(t1,t2,t3));
run;
proc print data=want1;run;
proc sql;
select h.*,
       w1.group
   from have as h left join
        want1 w1
   on h.patient=w1.patient;
quit;

dxiao2017_0-1754659476413.png

 

dxiao2017_1-1754659535104.png

dxiao2017
Lapis Lazuli | Level 10

The last table somehow changed the order of the time column. So order by should be used here:

proc sql;
select h.*,
       w1.group
   from have as h left join
        want1 w1
   on h.patient=w1.patient
   order by h.patient,h.time;
quit;

dxiao2017_0-1754661706351.png

ArdigenSupport
Fluorite | Level 6

Sure! You can achieve this easily using a PROC SQL approach or DATA step in SAS by grouping by patient and then accumulating the drug sequence based on time. Here's one clean solution using PROC SQL that works well for your scenario:
proc sql;
create table want as
select patient,
catx('',
/* sort and concatenate the drugs by time for each patient */
max(case when time=1 then drug end),
max(case when time=2 then drug end)
) as group
from your_table
group by patient;
quit;

mkeintz
PROC Star

Pass through each ID twice, once to build GROUP, once to output it:

 

data have;
  input patient time drug $;
  datalines;
1 1 A 
1 2 B 
2 1 B 
2 2 A 
run;

data want;
  set have (in=firstpass)
      have (in=secondpass);
  by id;
  retain group '          ' ;  /*As many blanks as needed for group */
  if first.id then group=drug;
  else if firstpass then group=cats(group,drug);
  if secondpass;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ArdigenSupport
Fluorite | Level 6

You're looking to create a new variable, group, that reflects the drug sequence for each patient based on the time variable. To do this, we need to sort the data by patient and time, then concatenate the drug values for each patient in the correct order.

Here’s a solution that achieves what you want using a DATA step:

data want;
input patient time drug $ group $;
;
1 1 A AB
1 2 B AB
2 1 B BA
2 2 A BA
;
run;

/* Step 1: Sort the data by patient and time */
proc sort data=want;
by patient time;
run;

/* Step 2: Concatenate the drug values per patient */
data want_grouped;
set want;
by patient;

retain group_string;

/* Accumulate drug values for each patient */
if first.patient then group_string = ''; /* Reset at start of new patient */
group_string = ('', group_string, drug); /* Append drug to the group string */

/* Output once all records for a patient are accumulated */
if last.patient then do;
group = group_string;
output;
end;

drop group_string;
run;

proc print data=want_grouped;
run;

hackathon24-white-horiz.png

Join the 2025 SAS Hackathon!

Calling all data scientists and open-source enthusiasts! Want to solve real problems that impact your company or the world? Register to hack by August 31st!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 3609 views
  • 8 likes
  • 6 in conversation