I need to make table A into Table B:
Table A
Patient_ID | drug_ID | drug |
9900001 | 1 | vinorelbine |
9900001 | 2 | carboplatin |
9900001 | 2 | paclitaxel |
9900003 | 1 | fluorouracil |
9900005 | 1 | 5-FU |
9900005 | 1 | leucovorin |
9900005 | 2 | fluorouracil |
9900005 | 3 | fluorouracil |
9900005 | 3 | leucovorin |
9900008 | 1 | fluorouracil |
9900008 | 2 | fluorouracil |
9900008 | 2 | leucovorin |
9900008 | 3 | fluorouracil |
9900008 | 3 | irinotecan |
9900008 | 3 | leucovorin |
Table B
Patient_ID | drug_ID | drug_combo |
9900001 | 1 | vinorelbine |
9900001 | 2 | carboplatin paclitaxel |
9900003 | 1 | fluorouracil |
9900005 | 1 | 5-FU leucovorin |
9900005 | 2 | fluorouracil |
9900005 | 3 | fluorouracil leucovorin |
9900008 | 1 | fluorouracil |
9900008 | 2 | fluorouracil leucovorin |
9900008 | 3 | fluorouracil irinotecan leucovorin |
data have;
input Patient_ID $ drug_ID drug :$20.;
datalines;
9900001 1 vinorelbine
9900001 2 carboplatin
9900001 2 paclitaxel
9900003 1 fluorouracil
9900005 1 5-FU
9900005 1 leucovorin
9900005 2 fluorouracil
9900005 3 fluorouracil
9900005 3 leucovorin
9900008 1 fluorouracil
9900008 2 fluorouracil
9900008 2 leucovorin
9900008 3 fluorouracil
9900008 3 irinotecan
9900008 3 leucovorin
;
data want(drop = drug);
do until (last.drug_ID);
set have;
by Patient_ID drug_ID;
length drug_combo $200;
drug_combo = catx(' ', drug_combo, drug);
end;
run;
Result:
Patient_ID drug_ID drug_combo 9900001 1 vinorelbine 9900001 2 carboplatin paclitaxel 9900003 1 fluorouracil 9900005 1 5-FU leucovorin 9900005 2 fluorouracil 9900005 3 fluorouracil leucovorin 9900008 1 fluorouracil 9900008 2 fluorouracil leucovorin 9900008 3 fluorouracil irinotecan leucovorin
What exactly is superior about Table B?
Do you expect the result to always be fluorouracil leucovorin, to pick one combination, when the order in the original data might result in leucovorin fluorouracil from different sequences of data?
Do you have any idea what the maximum number of items might need to be combined? This goes to setting the maximum length of the target variable.
Suggestion: If you must combine such place a character between values that will not appear in your values so that it is possible to separate them back to the originals later when it is needed (and likely will be needed). Very easy to do with @PeterClemmensen's code. Place a comma, for example in the CATX function call: Catx(',' ,<other parameters>) for example places a comma between all the values.
With your current approach order matters.
carboplatin paclitaxel will not be the same as paclitaxel carboplatin and because there's no delimiter it'll be harder to separate out drugs that have two names.
If this matters sort your data before you do the combination.
What are you trying to achieve that makes you think this is the correct format?
The employer wants it that way, im not sure.
The two main method are outlined here:
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
The length is set to a large number to ensure it can hold the combined values at the end. If we don't know the data, we usually guess an arbitrarily large number. Feel free to customize it to your requirements.
If a solution doesn't work, you need to explain how in details.
Is there an error in the log? It could be anything from a missing semicolon to an incorrect solution.
Saying it doesn't work doesn't convey any information.
@mcalde33 wrote:
it has already been mentioned. I sent the output I was getting, that way I conveyed the output to explain it.
Do you mean this post?
Did it get cut off? Because I'm not seeing any details or explanations of what went wrong or how just a random series of data that looks nothing like your original posted data and therefore have no idea what it means.
9900001 1 n
2 9900001 2 l
3 9900001 2 i
4 9900003 1 c
5 9900005 3 c
6 9900005 2 c
7 9900005 1 9
8 9900005 3 n
9 9900008 3 n
10 9900008 3 n
11 9900008 3 c
12 9900008 2 n
13 9900008 2 c
14 9900008 1 c
The first step is incorrect. Note the log:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.DRUGS has 14 observations and 3 variables.
NOTE: DATA statement used (Total process time):
If you print the input data set, the data is truncated from when it's imported, leading to the problem you're seeing.
The solution is actually fine assuming you'd be applying to actual data not just test data.
DATA drugs;
length drug $ 20;
Input Patient_ID Drug_ID Drug;
Datalines;
9900001 1 vinorelbine
9900001 2 carboplatin
9900001 2 paclitaxel
9900003 1 fluorouracil
9900005 1 5-FU
9900005 1 leucovorin
9900005 2 fluorouracil
9900005 3 fluorouracil
9900005 3 leucovorin
9900008 1 fluorouracil
9900008 2 fluorouracil
9900008 2 leucovorin
9900008 3 fluorouracil
9900008 3 irinotecan
9900008 3 leucovorin
;
RUN;
Data Drug_combo (DROP=drug);
DO until (last.drug_ID);
SET drugs;
BY Patient_ID Drug_ID;
LENGTH drug_combo $200;
drug_combo= catx('', drug_combo, drug);
end;
RUN;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.