Hello,
I am trying to parse out the drug name, strength, and unit variable for specific medications only (Amantadine, Carbidopa, Entacapone, Apomorphine, Trihexyphenidyl, Amantadine, Trihexyphenidyl, and levodopa). The final aim is to calculate the daily dose of these medications. I have tried using the scan, compress followed by the coalescec option to no avail.
1) Things to note a) Use of different delimeters "-" and "/" for the strength variable, (b) different position of drug of interest : position1/position2/position3.
I have included my data and table desired.
Thank you for your assistance.
data WORK.HAVE;
infile datalines dsd truncover;
input Patient_ID:32. Drug_Name:$37. Strength:$12.;
label Patient_ID="Patient ID" Drug_Name="Drug_Name" Strength="Strength";
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;;;;
Want
Patient ID | Drug_of_Interest | Strength1 | Unit1 | Drug of interest 2 | Strength2 | Unit2 |
1 | Amantadine | 100 | MG | . | . | |
1 | Carbidopa | 25 | MG | Levodopa | 100 | MG |
2 | Entacapone | 200 | MG | . | . | |
2 | Apomorphine | 2 | MG/ML | . | . | |
2 | Entacapone | 200 | MG | . | . | |
3 | Entacapone | 200 | MG | |||
3 | Entacapone | 200 | MG | Amantadine | 100 | MG |
3 | Trihexyphenidyl | 2 | MG | . | . | . |
4 | Amantadine | 100 | MG | . | . | . |
5 | Trihexyphenidyl | 2 | MG | Entacapone | 200 | MG |
data WORK.HAVE;
infile datalines truncover;
input Patient_ID:32. Drug_Name:$37. Strength $12.;
label Patient_ID="Patient ID" Drug_Name="Drug_Name" Strength="Strength";
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;
data temp;
set have;
n+1;
do i=1 to countw(Drug_Name,'/');
Drug_of_Interest_=scan(Drug_Name,i,'/');
Strength_=scan(Strength,i,'-','s');
Unit_=scan(Strength,-1,,'s');
if prxmatch('/Amantadine|Carbidopa|Entacapone|Apomorphine|Trihexyphenidyl|levodopa/i',Drug_of_Interest_) then output;
end;
keep n Patient_ID Drug_of_Interest_ Strength_ Unit_;
run;
data temp2;
set temp;
by n;
if first.n then id=0;
id+1;
run;
proc transpose data=temp2 out=temp3;
by n id Patient_ID;
var Drug_of_Interest_ Strength_ Unit_;
run;
proc transpose data=temp3 out=want(drop=_name_);
by n Patient_ID;
var col1;
id _name_ id;
run;
This parses the drug names (assuming a max length of 24 characters and that the separator is always a slash).
data want;
set have;
array drug $ 24 drug1-drug3;
do i=1 to countw(drug_name,'/');
drug(i)=scan(drug_name,i,'/');
end;
drop i drug_name;
run;
I leave it up to you to follow the above code and modify it to work on dosages.
Correction to the above code
data want;
set have;
array drug $ 24 drug1-drug3;
count=0;
do i=1 to countw(drug_name,'/');
this_drug=scan(drug_name,i,'/');
if upcase(this_drug) in
('AMANTADINE', 'CARBIDOPA', 'ENTACAPONE', 'APOMORPHINE', 'TRIHEXYPHENIDYL', 'AMANTADINE', 'TRIHEXYPHENIDYL', 'LEVODOPA') then do;
count=count+1;
drug(count)=this_drug;
end;
end;
drop i drug_name count this_drug;
run;
Thank you for your response and your time. This code did not parse out the strength variaable.
I would start by converting the dataset into one observation per drug.
First lets fix your example data step so it can run. I will rename the variables to indicate the are LISTS and not normal values. It is not clear to me what you are doing with the UNITS, so since it looks like a separate variable in your data lines I will make it a separate variable (even though it does not seem to have enough values).
data HAVE;
input Patient_ID drugs :$60. strengths :$20. Units :$20. ;
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;;;;
Now just loop over the paired lists and parse out the DRUG_NAME and STRENGTH from the list.
data want;
set have;
do index=1 to countw(drugs,'/');
length drug_name $30 strength $10 unit $10 ;
drug_name=scan(drugs,index,'/');
strength=scan(strengths,index,'-');
unit=scan(units,min(index,countw(units,'-')),'-');
output;
end;
run;
Now your test for whether the drug is a drug of interest is simpler since DRUG_NAME now only has one value per observation.
Thank you for your response. I had to make some teaks to the code. I apologize for my sample data not being clear. The strength variable included the unit as well (ie unit is not a separate column in my data)
@cheroij wrote:
... The strength variable included the unit as well (ie unit is not a separate column in my data)
You might need to revisit that decision. The units for all drugs taken by a patient are not always going to be the same. And even if your drug lists just represent single pill combination drugs then not all of those will have the strengths listed using the same units.
Don't saddle yourself with the burden of a wide dataset (Maxim 19).
data have;
infile datalines dsd dlm=" " truncover;
length patient_id 8 drugs $40 strengths $12 unit $10;
input patient_id drugs strengths unit;
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;
data interest;
length drug $20;
input drug;
drug = upcase(drug);
datalines;
Amantadine
Carbidopa
Entacapone
Apomorphine
Trihexyphenidyl
Amantadine
Trihexyphenidyl
levodopa
;
data want;
retain patient_id n n1 drug strength unit; * just for variable order;
set have;
length drug $20 strength 8 n 8 n1 8;
by patient_id;
if _n_ = 1
then do;
declare hash int (dataset:"interest");
int.definekey("drug");
int.definedone();
end;
if first.patient_id
then n = 1;
else n + 1;
n1 = 0;
do i = 1 to countw(drugs,"/");
drug = scan(drugs,i,"/");
strength = input(scan(strengths,i,"-"),32.);
if int.check(key:upcase(drug)) = 0
then do;
n1 + 1;
output;
end;
end;
keep patient_id drug strength unit n n1;
run;
proc report data=want;
column patient_id n n1,(drug unit strength);
define patient_id / group;
define n / group noprint;
define n1 / "" across;
define drug / display;
define unit / display;
define strength / analysis;
run;
I created additional variables to get a report like you showed.
Thank you for your response. Your code worked with minor tweaks. Sorry about the confusion in data sample provided.The nint column is not a separate column in my dataset.
data WORK.HAVE;
infile datalines truncover;
input Patient_ID:32. Drug_Name:$37. Strength $12.;
label Patient_ID="Patient ID" Drug_Name="Drug_Name" Strength="Strength";
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;
data temp;
set have;
n+1;
do i=1 to countw(Drug_Name,'/');
Drug_of_Interest_=scan(Drug_Name,i,'/');
Strength_=scan(Strength,i,'-','s');
Unit_=scan(Strength,-1,,'s');
if prxmatch('/Amantadine|Carbidopa|Entacapone|Apomorphine|Trihexyphenidyl|levodopa/i',Drug_of_Interest_) then output;
end;
keep n Patient_ID Drug_of_Interest_ Strength_ Unit_;
run;
data temp2;
set temp;
by n;
if first.n then id=0;
id+1;
run;
proc transpose data=temp2 out=temp3;
by n id Patient_ID;
var Drug_of_Interest_ Strength_ Unit_;
run;
proc transpose data=temp3 out=want(drop=_name_);
by n Patient_ID;
var col1;
id _name_ id;
run;
Thank you😊
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.