Hi all,
I had asked a similar question a while ago but I am running into some issues.
Dataset A contains information on the gene type and list of prescribed drugs associated with it. For example, the prescribed drugs for Gene type ‘a’ are as follows: Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab.
The dataset B contains information on what the patient got prescribed and the dates associated with it. For example, ID#1 got Panitumumab, Binimetinib, and Neo-Cetuximab on different dates.
Therefore, in my merged dataset, I would like to get a row for Panitumimab and a row for Neo-Cetuximab. Further, in case the patient got a drug that is not listed in the drug_list I want it to be missing. For example, ID #6 prescribed drug Idelalisib does not match the drug_list for the specific gene ‘b’. Similar scenario for and ID#7
Here are the datasets and the code I had started using.
data A;
informat id 3. date_a mmddyy10. Gene_type $3. Drug_list $200.;
input id date_a Gene_type Drug_list;
format id 3. date_a mmddyy10. Gene_Type$3. Drug_list $200.;
cards;
1 1/2/2018 a Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab
1 2/10/2019 b Cetuximab,Xanitumumab
2 3/3/2019 c Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib
3 1/2/2018 d Alpelisib,Everolimus,Temsirolimus
4 2/2/2019 f Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb,Trastuzumab-pkrb,Trastuzumab-qyyp
4 2/10/2019 e Bosutinib,Dasatinib,Everolimus,Temsirolimus
5 3/3/2019 a Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab
6 1/2/2018 c Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib
6 2/2/2019 b Cetuximab,Xanitumumab
7 2/10/2019 d Alpelisib,Everolimus,Temsirolimus
7 3/10/2019 b Cetuximab,Xanitumumab
;;;
proc sort; by id; run;
data B;
informat id 3. date_b mmddyy10. Drug $20.;
input id date_b Drug;
format id 3. date_b mmddyy10. Drug $200.;
cards;
1 2/01/2018 Panitumumab
1 3/02/2019 Binimetinib
1 3/28/2019 Neo-Cetuximab
2 3/23/2019 Trametinib
2 3/25/2019 Temsirolimus
3 2/01/2018 Alpelisib
5 3/21/2019 Dacomitinib
4 3/01/2019 Everolimus
4 4/01/2019 Cobimetinib
4 5/01/2019 Ado-trastuzumab
6 1/17/2018 Idelalisib
7 3/02/2019 Temsirolimus
7 4/01/2019 Binimetinib
;;;
Data Want:
ID | Date_a | Gene_type | Drug_list | Drug | Date_b |
1 | 1/2/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Panitumumab | 2/1/2018 |
1 | 1/2/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Neo-Cetuximab | 3/28/2019 |
1 | 2/10/2019 | b | Cetuximab,Xanitumumab | ||
2 | 3/3/2019 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Trametinib | 3/23/2019 |
3 | 1/2/2018 | d | Alpelisib,Everolimus,Temsirolimus | Alpelisib | 2/1/2018 |
4 | 2/2/2019 | f | Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb | Ado-trastuzumab | 5/1/2019 |
4 | 2/10/2019 | e | Bosutinib,Dasatinib,Everolimus,Temsirolimus | Everolimus | 3/1/2019 |
5 | 3/3/2019 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Dacomitinib | 3/21/2019 |
6 | 1/2/2018 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Idelalisib | 1/17/2018 |
6 | 2/2/2019 | b | Cetuximab,Xanitumumab | ||
7 | 2/10/2019 | d | Alpelisib,Everolimus,Temsirolimus | Temsirolimus | 3/2/2019 |
7 | 3/10/2019 | b | Cetuximab,Xanitumumab |
Proc sql;
Create table c2 as select a.*, b.Drug,b.Date_b
from a a join b b on a.id=b.id
and findw( a.drug_list,strip(b.drug),',') > 0 and (b.date_b);
Quit;
proc sql;
select GENES.*, DRUGS.DRUG, DRUGS.DATE_B
from GENES left join DRUGS
on GENES.ID=DRUGS.ID
and findw( GENES.DRUG_LIST, DRUGS.DRUG, ',- ', 't')
order by GENES.ID, GENE_TYPE ;
quit;
ID | DATE_A | GENE_TYPE | DRUG_LIST | DRUG | DATE_B |
---|---|---|---|---|---|
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Neo-Cetuximab | 03/28/2019 |
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Panitumumab | 02/01/2018 |
1 | 02/10/2019 | b | Cetuximab,Xanitumumab | . | |
2 | 03/03/2019 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Trametinib | 03/23/2019 |
3 | 01/02/2018 | d | Alpelisib,Everolimus,Temsirolimus | Alpelisib | 02/01/2018 |
4 | 02/10/2019 | e | Bosutinib,Dasatinib,Everolimus,Temsirolimus | Everolimus | 03/01/2019 |
4 | 02/02/2019 | f | Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb,Trastuzumab-pkrb,Trastuzumab-qyyp | Ado-trastuzumab | 05/01/2019 |
5 | 03/03/2019 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Dacomitinib | 03/21/2019 |
6 | 02/02/2019 | b | Cetuximab,Xanitumumab | . | |
6 | 01/02/2018 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Idelalisib | 01/17/2018 |
7 | 03/10/2019 | b | Cetuximab,Xanitumumab | . | |
7 | 02/10/2019 | d | Alpelisib,Everolimus,Temsirolimus | Temsirolimus | 03/02/2019 |
so Neo-Cetuximab matches with Cetuximab?
> They are unique and different.
OK just replace
findw( GENES.DRUG_LIST, prxchange('s/\A(Ado|Neo)-//',1,DRUGS.DRUG), ',- ', 't')
with
findw( GENES.DRUG_LIST, DRUGS.DRUG), ', ', 't')
then, and report.
1. Please format your code and your data
2. Something like this is what you need. Please alter to suit:
proc sql;
create table WANT as
select GENES.*, DRUGS.DRUG, DRUGS.DATE_B
from GENES join DRUGS
on GENES.ID=DRUGS.ID
and findw( GENES.DRUG_LIST, prxchange('s/\A(Ado|Neo)-//',1,DRUGS.DRUG), ',- ', 't')
order by GENES.ID, GENE_TYPE ;
quit;
ID | DATE_A | GENE_TYPE | DRUG_LIST | DRUG | DATE_B |
---|---|---|---|---|---|
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Neo-Cetuximab | 03/28/2019 |
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Panitumumab | 02/01/2018 |
1 | 02/10/2019 | b | Cetuximab,Xanitumumab | Neo-Cetuximab | 03/28/2019 |
2 | 03/03/2019 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Trametinib | 03/23/2019 |
3 | 01/02/2018 | d | Alpelisib,Everolimus,Temsirolimus | Alpelisib | 02/01/2018 |
4 | 02/10/2019 | e | Bosutinib,Dasatinib,Everolimus,Temsirolimus | Everolimus | 03/01/2019 |
4 | 02/02/2019 | f | Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb,Trastuzumab-pkrb,Trastuzumab-qyyp | Ado-trastuzumab | 05/01/2019 |
5 | 03/03/2019 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Dacomitinib | 03/21/2019 |
6 | 01/02/2018 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Idelalisib | 01/17/2018 |
7 | 02/10/2019 | d | Alpelisib,Everolimus,Temsirolimus | Temsirolimus | 03/02/2019 |
Thanks for this:
I used this code and got the output for ID#1 shows Neo-Cetuximab merging with Cetuximab. It should be a empty cell for drug and date_b because #ID1 was never given Cetuximab.
Basically, each drug name is unique. Neo-Cetuximab is different from Cetuximab and so does any drug with Ado- or any other prefix.I know it's probably a subtle change in the code. I am still trying..thanks again
proc sql;
create table WANT as
select GENES.*, DRUGS.DRUG, DRUGS.DATE_B
from GENES left join DRUGS
on GENES.ID=DRUGS.ID
and findw( GENES.DRUG_LIST, prxchange('s/\A(Ado|Neo)-//',1,DRUGS.DRUG), ',- ', 't')
order by GENES.ID, GENE_TYPE ;
quit;
Please format your code and data.
proc sql;
select GENES.*, DRUGS.DRUG, DRUGS.DATE_B
from GENES left join DRUGS
on GENES.ID=DRUGS.ID
and findw( GENES.DRUG_LIST, DRUGS.DRUG, ',- ', 't')
order by GENES.ID, GENE_TYPE ;
quit;
ID | DATE_A | GENE_TYPE | DRUG_LIST | DRUG | DATE_B |
---|---|---|---|---|---|
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Neo-Cetuximab | 03/28/2019 |
1 | 01/02/2018 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Panitumumab | 02/01/2018 |
1 | 02/10/2019 | b | Cetuximab,Xanitumumab | . | |
2 | 03/03/2019 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Trametinib | 03/23/2019 |
3 | 01/02/2018 | d | Alpelisib,Everolimus,Temsirolimus | Alpelisib | 02/01/2018 |
4 | 02/10/2019 | e | Bosutinib,Dasatinib,Everolimus,Temsirolimus | Everolimus | 03/01/2019 |
4 | 02/02/2019 | f | Ado-trastuzumab,Afatinib,Dacomitinib,Lapatinib,Neratinib,Pertuzumab,Trastuzumab,Trastuzumab-anns,Trastuzumab-dkst,Trastuzumab-dttb,Trastuzumab-pkrb,Trastuzumab-qyyp | Ado-trastuzumab | 05/01/2019 |
5 | 03/03/2019 | a | Afatinib,Neo-Cetuximab,Dacomitinib,Erlotinib,Gefitinib,Lapatinib,Necitumumab,Osimertinib,Panitumumab | Dacomitinib | 03/21/2019 |
6 | 02/02/2019 | b | Cetuximab,Xanitumumab | . | |
6 | 01/02/2018 | c | Binimetinib,Cetuximab,Cobimetinib,Idelalisib,Panitumumab,Trametinib | Idelalisib | 01/17/2018 |
7 | 03/10/2019 | b | Cetuximab,Xanitumumab | . | |
7 | 02/10/2019 | d | Alpelisib,Everolimus,Temsirolimus | Temsirolimus | 03/02/2019 |
Maybe remove the dash as a separator in the findw() function.
And don't forget to format and reread your question if you want to be helped.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.