Hi all,
I had asked a similar question a while ago but I am running into some issues.
Dataset A contains information on the gene type and list of prescribed drugs associated with it. For example, the prescribed drugs for Gene type ‘a’ are as follows: Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab.
The dataset B contains information on what the patient got prescribed and the dates associated with it. For example, ID#1 got Panitumumab, Binimetinib, and Neo-Cetuximab on different dates.
Therefore, in my merged dataset, I would like to get a row for Panitumimab and a row for Neo-Cetuximab. Further, in case the patient got a drug that is not listed in the drug_list I want it to be missing. For example, ID #6 prescribed drug Idelalisib does not match the drug_list for the specific gene ‘b’. Similar scenario for and ID#7
Here are the datasets and the code I had started using.
data A;
informat id 3. date_a mmddyy10. Gene_type $3. Drug_list $200.;
input id date_a Gene_type Drug_list;
format id 3. date_a mmddyy10. Gene_Type$3. Drug_list $200.;
cards;
1 1/2/2018 a Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab
1 2/10/2019 b Cetuximab,Xanitumumab
2 3/3/2019 c Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib
3 1/2/2018 d Alpelisib,Everolimus,Temsirolimus
4 2/2/2019 f Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb,Trastuzumab-pkrb,Trastuzumab-qyyp
4 2/10/2019 e Bosutinib,Dasatinib,Everolimus,Temsirolimus
5 3/3/2019 a Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab
6 1/2/2018 c Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib
6 2/2/2019 b Cetuximab,Xanitumumab
7 2/10/2019 d Alpelisib,Everolimus,Temsirolimus
7 3/10/2019 b Cetuximab,Xanitumumab
;;;
proc sort; by id; run;
data B;
informat id 3. date_b mmddyy10. Drug $20.;
input id date_b Drug;
format id 3. date_b mmddyy10. Drug $200.;
cards;
1 2/01/2018 Panitumumab
1 3/02/2019 Binimetinib
1 3/28/2019 Neo-Cetuximab
2 3/23/2019 Trametinib
2 3/25/2019 Temsirolimus
3 2/01/2018 Alpelisib
5 3/21/2019 Dacomitinib
4 3/01/2019 Everolimus
4 4/01/2019 Cobimetinib
4 5/01/2019 Ado-trastuzumab
6 1/17/2018 Idelalisib
7 3/02/2019 Temsirolimus
7 4/01/2019 Binimetinib
;;;
Data Want:
ID | Date_a | Gene_type | Drug_list | Drug | Date_b |
1 | 1/2/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Panitumumab | 2/1/2018 |
1 | 1/2/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Neo-Cetuximab | 3/28/2019 |
1 | 2/10/2019 | b | Cetuximab,Xanitumumab | ||
2 | 3/3/2019 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Trametinib | 3/23/2019 |
3 | 1/2/2018 | d | Alpelisib,Everolimus,Temsirolimus | Alpelisib | 2/1/2018 |
4 | 2/2/2019 | f | Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb | Ado-trastuzumab | 5/1/2019 |
4 | 2/10/2019 | e | Bosutinib,Dasatinib,Everolimus,Temsirolimus | Everolimus | 3/1/2019 |
5 | 3/3/2019 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Dacomitinib | 3/21/2019 |
6 | 1/2/2018 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Idelalisib | 1/17/2018 |
6 | 2/2/2019 | b | Cetuximab,Xanitumumab | ||
7 | 2/10/2019 | d | Alpelisib,Everolimus,Temsirolimus | Temsirolimus | 3/2/2019 |
7 | 3/10/2019 | b | Cetuximab,Xanitumumab |
Proc sql;
Create table c2 as select a.*, b.Drug,b.Date_b
from a a join b b on a.id=b.id
and findw( a.drug_list,strip(b.drug),',') > 0 and (b.date_b);
Quit;
proc sql;
select GENES.*, DRUGS.DRUG, DRUGS.DATE_B
from GENES left join DRUGS
on GENES.ID=DRUGS.ID
and findw( GENES.DRUG_LIST, DRUGS.DRUG, ',- ', 't')
order by GENES.ID, GENE_TYPE ;
quit;
ID | DATE_A | GENE_TYPE | DRUG_LIST | DRUG | DATE_B |
---|---|---|---|---|---|
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Neo-Cetuximab | 03/28/2019 |
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Panitumumab | 02/01/2018 |
1 | 02/10/2019 | b | Cetuximab,Xanitumumab | . | |
2 | 03/03/2019 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Trametinib | 03/23/2019 |
3 | 01/02/2018 | d | Alpelisib,Everolimus,Temsirolimus | Alpelisib | 02/01/2018 |
4 | 02/10/2019 | e | Bosutinib,Dasatinib,Everolimus,Temsirolimus | Everolimus | 03/01/2019 |
4 | 02/02/2019 | f | Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb,Trastuzumab-pkrb,Trastuzumab-qyyp | Ado-trastuzumab | 05/01/2019 |
5 | 03/03/2019 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Dacomitinib | 03/21/2019 |
6 | 02/02/2019 | b | Cetuximab,Xanitumumab | . | |
6 | 01/02/2018 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Idelalisib | 01/17/2018 |
7 | 03/10/2019 | b | Cetuximab,Xanitumumab | . | |
7 | 02/10/2019 | d | Alpelisib,Everolimus,Temsirolimus | Temsirolimus | 03/02/2019 |
so Neo-Cetuximab matches with Cetuximab?
> They are unique and different.
OK just replace
findw( GENES.DRUG_LIST, prxchange('s/\A(Ado|Neo)-//',1,DRUGS.DRUG), ',- ', 't')
with
findw( GENES.DRUG_LIST, DRUGS.DRUG), ', ', 't')
then, and report.
1. Please format your code and your data
2. Something like this is what you need. Please alter to suit:
proc sql;
create table WANT as
select GENES.*, DRUGS.DRUG, DRUGS.DATE_B
from GENES join DRUGS
on GENES.ID=DRUGS.ID
and findw( GENES.DRUG_LIST, prxchange('s/\A(Ado|Neo)-//',1,DRUGS.DRUG), ',- ', 't')
order by GENES.ID, GENE_TYPE ;
quit;
ID | DATE_A | GENE_TYPE | DRUG_LIST | DRUG | DATE_B |
---|---|---|---|---|---|
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Neo-Cetuximab | 03/28/2019 |
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Panitumumab | 02/01/2018 |
1 | 02/10/2019 | b | Cetuximab,Xanitumumab | Neo-Cetuximab | 03/28/2019 |
2 | 03/03/2019 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Trametinib | 03/23/2019 |
3 | 01/02/2018 | d | Alpelisib,Everolimus,Temsirolimus | Alpelisib | 02/01/2018 |
4 | 02/10/2019 | e | Bosutinib,Dasatinib,Everolimus,Temsirolimus | Everolimus | 03/01/2019 |
4 | 02/02/2019 | f | Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb,Trastuzumab-pkrb,Trastuzumab-qyyp | Ado-trastuzumab | 05/01/2019 |
5 | 03/03/2019 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Dacomitinib | 03/21/2019 |
6 | 01/02/2018 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Idelalisib | 01/17/2018 |
7 | 02/10/2019 | d | Alpelisib,Everolimus,Temsirolimus | Temsirolimus | 03/02/2019 |
Thanks for this:
I used this code and got the output for ID#1 shows Neo-Cetuximab merging with Cetuximab. It should be a empty cell for drug and date_b because #ID1 was never given Cetuximab.
Basically, each drug name is unique. Neo-Cetuximab is different from Cetuximab and so does any drug with Ado- or any other prefix.I know it's probably a subtle change in the code. I am still trying..thanks again
proc sql;
create table WANT as
select GENES.*, DRUGS.DRUG, DRUGS.DATE_B
from GENES left join DRUGS
on GENES.ID=DRUGS.ID
and findw( GENES.DRUG_LIST, prxchange('s/\A(Ado|Neo)-//',1,DRUGS.DRUG), ',- ', 't')
order by GENES.ID, GENE_TYPE ;
quit;
Please format your code and data.
proc sql;
select GENES.*, DRUGS.DRUG, DRUGS.DATE_B
from GENES left join DRUGS
on GENES.ID=DRUGS.ID
and findw( GENES.DRUG_LIST, DRUGS.DRUG, ',- ', 't')
order by GENES.ID, GENE_TYPE ;
quit;
ID | DATE_A | GENE_TYPE | DRUG_LIST | DRUG | DATE_B |
---|---|---|---|---|---|
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Neo-Cetuximab | 03/28/2019 |
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Panitumumab | 02/01/2018 |
1 | 02/10/2019 | b | Cetuximab,Xanitumumab | . | |
2 | 03/03/2019 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Trametinib | 03/23/2019 |
3 | 01/02/2018 | d | Alpelisib,Everolimus,Temsirolimus | Alpelisib | 02/01/2018 |
4 | 02/10/2019 | e | Bosutinib,Dasatinib,Everolimus,Temsirolimus | Everolimus | 03/01/2019 |
4 | 02/02/2019 | f | Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb,Trastuzumab-pkrb,Trastuzumab-qyyp | Ado-trastuzumab | 05/01/2019 |
5 | 03/03/2019 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Dacomitinib | 03/21/2019 |
6 | 02/02/2019 | b | Cetuximab,Xanitumumab | . | |
6 | 01/02/2018 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Idelalisib | 01/17/2018 |
7 | 03/10/2019 | b | Cetuximab,Xanitumumab | . | |
7 | 02/10/2019 | d | Alpelisib,Everolimus,Temsirolimus | Temsirolimus | 03/02/2019 |
Maybe remove the dash as a separator in the findw() function.
And don't forget to format and reread your question if you want to be helped.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.