BookmarkSubscribeRSS Feed
mcalde33
Obsidian | Level 7

I need to make table A into Table B:

 

Table A

 

Patient_ID

drug_ID

drug

 

9900001

1

vinorelbine

9900001

2

carboplatin

9900001

2

paclitaxel

9900003

1

fluorouracil

9900005

1

5-FU

9900005

1

leucovorin

9900005

2

fluorouracil

9900005

3

fluorouracil

9900005

3

leucovorin

9900008

1

fluorouracil

9900008

2

fluorouracil

9900008

2

leucovorin

9900008

3

fluorouracil

9900008

3

irinotecan

9900008

3

leucovorin

 

Table B

 

Patient_ID

drug_ID

drug_combo

 

9900001

1

vinorelbine

9900001

2

carboplatin paclitaxel

9900003

1

fluorouracil

9900005

1

5-FU leucovorin

9900005

2

fluorouracil

9900005

3

fluorouracil leucovorin

9900008

1

fluorouracil

9900008

2

fluorouracil leucovorin

9900008

3

fluorouracil irinotecan leucovorin

 

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20
data have;
input Patient_ID $ drug_ID drug :$20.;
datalines;
9900001 1 vinorelbine  
9900001 2 carboplatin  
9900001 2 paclitaxel   
9900003 1 fluorouracil 
9900005 1 5-FU         
9900005 1 leucovorin   
9900005 2 fluorouracil 
9900005 3 fluorouracil 
9900005 3 leucovorin   
9900008 1 fluorouracil 
9900008 2 fluorouracil 
9900008 2 leucovorin   
9900008 3 fluorouracil 
9900008 3 irinotecan   
9900008 3 leucovorin   
;

data want(drop = drug);
   do until (last.drug_ID);
      set have;
      by Patient_ID drug_ID;
      length drug_combo $200;
      drug_combo = catx(' ', drug_combo, drug);
   end;
run;

 

Result:

 

Patient_ID  drug_ID  drug_combo
9900001     1        vinorelbine
9900001     2        carboplatin paclitaxel
9900003     1        fluorouracil
9900005     1        5-FU leucovorin
9900005     2        fluorouracil
9900005     3        fluorouracil leucovorin
9900008     1        fluorouracil
9900008     2        fluorouracil leucovorin
9900008     3        fluorouracil irinotecan leucovorin
mcalde33
Obsidian | Level 7
This didnt work for me, and why was 200 used as the length of drug combo?
mcalde33
Obsidian | Level 7
9900001 1 n
2 9900001 2 l
3 9900001 2 i
4 9900003 1 c
5 9900005 3 c
6 9900005 2 c
7 9900005 1 9
8 9900005 3 n
9 9900008 3 n
10 9900008 3 n
11 9900008 3 c
12 9900008 2 n
13 9900008 2 c
14 9900008 1 c
ballardw
Super User

What exactly is superior about Table B?

Do you expect the result to always be fluorouracil leucovorin, to pick one combination, when the order in the original data might result in leucovorin fluorouracil from different sequences of data?

Do you have any idea what the maximum number of items might need to be combined? This goes to setting the maximum length of the target variable.

 

Suggestion: If you must combine such place a character between values that will not appear in your values so that it is possible to separate them back to the originals later when it is needed (and likely will be needed). Very easy to do with @PeterClemmensen's code. Place a comma, for example in the CATX function call: Catx(',' ,<other parameters>) for example places a comma between all the values.

Reeza
Super User

With your current approach order matters. 

carboplatin paclitaxel will not be the same as paclitaxel carboplatin and because there's no delimiter it'll be harder to separate out drugs that have two names. 

If this matters sort your data before you do the combination.

 

What are you trying to achieve that makes you think this is the correct format?

 

 

mcalde33
Obsidian | Level 7

The employer wants it that way, im not sure.

Reeza
Super User

The two main method are outlined here:

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

 

The length is set to a large number to ensure it can hold the combined values at the end. If we don't know the data, we usually guess an arbitrarily large number. Feel free to customize it to your requirements. 


If a solution doesn't work, you need to explain how in details. 

Is there an error in the log? It could be anything from a missing semicolon to an incorrect solution. 
Saying it doesn't work doesn't convey any information. 

mcalde33
Obsidian | Level 7
it has already been mentioned. I sent the output I was getting, that way I conveyed the output to explain it.
Reeza
Super User

@mcalde33 wrote:
it has already been mentioned. I sent the output I was getting, that way I conveyed the output to explain it.

Do you mean this post?

Did it get cut off? Because I'm not seeing any details or explanations of what went wrong or how just a random series of data that looks nothing like your original posted data and therefore have no idea what it means.

 

9900001 1 n
2 9900001 2 l
3 9900001 2 i
4 9900003 1 c
5 9900005 3 c
6 9900005 2 c
7 9900005 1 9
8 9900005 3 n
9 9900008 3 n
10 9900008 3 n
11 9900008 3 c
12 9900008 2 n
13 9900008 2 c
14 9900008 1 c


 

mcalde33
Obsidian | Level 7
This is the code used:
DATA drugs;
Input Patient_ID Drug_ID Drug :$20;

Datalines;

9900001 1 vinorelbine
9900001 2 carboplatin
9900001 2 paclitaxel
9900003 1 fluorouracil
9900005 1 5-FU
9900005 1 leucovorin
9900005 2 fluorouracil
9900005 3 fluorouracil
9900005 3 leucovorin
9900008 1 fluorouracil
9900008 2 fluorouracil
9900008 2 leucovorin
9900008 3 fluorouracil
9900008 3 irinotecan
9900008 3 leucovorin
;

RUN;

Data Drug_combo (DROP=drug);
DO until (last.drug_ID);
SET drugs;
BY Patient_ID Drug_ID;
LENGTH drug_combo $200;
drug_combo= catx('', drug_combo, drug);
end;
RUN;



My log:

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 DATA drugs;
70 Input Patient_ID Drug_ID Drug :$20;
71
72 Datalines;

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.DRUGS has 14 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 669.78k
OS Memory 23976.00k
Timestamp 10/25/2022 10:23:33 PM
Step Count 24 Switch Count 2
Page Faults 0
Page Reclaims 167
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264


89 ;
90
91 RUN;
92
93 Data Drug_combo (DROP=drug);
94 DO until (last.drug_ID);
95 SET drugs;
96 BY Patient_ID Drug_ID;
97 LENGTH drug_combo $200;
98 drug_combo= catx('', drug_combo, drug);
99 end;
100 RUN;

NOTE: There were 14 observations read from the data set WORK.DRUGS.
NOTE: The data set WORK.DRUG_COMBO has 9 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 1070.75k
OS Memory 24236.00k
Timestamp 10/25/2022 10:23:33 PM
Step Count 25 Switch Count 2
Page Faults 0
Page Reclaims 210
Page Swaps 0
Voluntary Context Switches 13
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 264


101
102
103
104
105
106 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
116


My output:

Obs Patient_ID Drug_ID Drug
1 9900008 3 c
2 9900008 2 c
3 9900008 2 n
4 9900008 1 c
5 9900008 3 n
6 9900008 3 n
7 9900005 3 c
8 9900005 2 c
9 9900005 1 9
10 9900005 3 n
11 9900003 1 c
12 9900001 2 l
13 9900001 1 n
14 9900001 2 i
Reeza
Super User

The first step is incorrect. Note the log:

 

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.DRUGS has 14 observations and 3 variables.
NOTE: DATA statement used (Total process time):

 

If you print the input data set, the data is truncated from when it's imported, leading to the problem you're seeing.

The solution is actually fine assuming you'd be applying to actual data not just test data.

 

DATA drugs;
length drug $ 20;
Input Patient_ID Drug_ID Drug;

Datalines;

9900001 1 vinorelbine
9900001 2 carboplatin
9900001 2 paclitaxel
9900003 1 fluorouracil
9900005 1 5-FU
9900005 1 leucovorin
9900005 2 fluorouracil
9900005 3 fluorouracil
9900005 3 leucovorin
9900008 1 fluorouracil
9900008 2 fluorouracil
9900008 2 leucovorin
9900008 3 fluorouracil
9900008 3 irinotecan
9900008 3 leucovorin
;

RUN;



Data Drug_combo (DROP=drug);
DO until (last.drug_ID);
SET drugs;
BY Patient_ID Drug_ID;
LENGTH drug_combo $200;
drug_combo= catx('', drug_combo, drug);
end;
RUN;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1125 views
  • 0 likes
  • 4 in conversation