Solved: Parse Out Columns Based on Conditions

cheroij · Posted 12-29-2024 05:24 AM

Hello,

I am trying to parse out the drug name, strength, and unit variable for specific medications only (Amantadine, Carbidopa, Entacapone, Apomorphine, Trihexyphenidyl, Amantadine, Trihexyphenidyl, and levodopa). The final aim is to calculate the daily dose of these medications. I have tried using the scan, compress followed by the coalescec option to no avail.

1) Things to note a) Use of different delimeters "-" and "/" for the strength variable, (b) different position of drug of interest : position1/position2/position3.

I have included my data and table desired.

Thank you for your assistance.

data WORK.HAVE;
infile datalines dsd truncover;
input Patient_ID:32. Drug_Name:$37. Strength:$12.;
label Patient_ID="Patient ID" Drug_Name="Drug_Name" Strength="Strength";
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;;;;

Want

Patient ID	Drug_of_Interest	Strength1	Unit1	Drug of interest 2	Strength2	Unit2
1	Amantadine	100	MG		.	.
1	Carbidopa	25	MG	Levodopa	100	MG
2	Entacapone	200	MG		.	.
2	Apomorphine	2	MG/ML		.	.
2	Entacapone	200	MG		.	.
3	Entacapone	200	MG
3	Entacapone	200	MG	Amantadine	100	MG
3	Trihexyphenidyl	2	MG	.	.	.
4	Amantadine	100	MG	.	.	.
5	Trihexyphenidyl	2	MG	Entacapone	200	MG

Ksharp · Posted 12-29-2024 09:14 PM

data WORK.HAVE;
infile datalines  truncover;
input Patient_ID:32. Drug_Name:$37. Strength $12.;
label Patient_ID="Patient ID" Drug_Name="Drug_Name" Strength="Strength";
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;
data temp;
 set have;
 n+1;
 do i=1 to countw(Drug_Name,'/');
  Drug_of_Interest_=scan(Drug_Name,i,'/');
  Strength_=scan(Strength,i,'-','s');
  Unit_=scan(Strength,-1,,'s');
  if prxmatch('/Amantadine|Carbidopa|Entacapone|Apomorphine|Trihexyphenidyl|levodopa/i',Drug_of_Interest_) then output;
 end;
 keep n  Patient_ID Drug_of_Interest_ Strength_  Unit_;
 run;
 data temp2;
  set temp;
  by n;
  if first.n then id=0;
  id+1;
run;
 proc transpose data=temp2 out=temp3;
 by n id Patient_ID;
 var Drug_of_Interest_ Strength_  Unit_;
 run;
 proc transpose data=temp3 out=want(drop=_name_);
 by n Patient_ID;
 var col1;
 id _name_ id;
 run;

View solution in original post

PaigeMiller · Posted 12-29-2024 06:26 AM

This parses the drug names (assuming a max length of 24 characters and that the separator is always a slash).

data want;
	set have;
	array drug $ 24 drug1-drug3;
	do i=1 to countw(drug_name,'/');
		drug(i)=scan(drug_name,i,'/');
	end;
	drop i drug_name;
run;

I leave it up to you to follow the above code and modify it to work on dosages.

--
Paige Miller

PaigeMiller · Posted 12-29-2024 11:02 AM

Correction to the above code

data want;
	set have;
	array drug $ 24 drug1-drug3;
	count=0;
	do i=1 to countw(drug_name,'/');
		this_drug=scan(drug_name,i,'/');
		if upcase(this_drug) in 
			('AMANTADINE', 'CARBIDOPA', 'ENTACAPONE', 'APOMORPHINE', 'TRIHEXYPHENIDYL', 'AMANTADINE', 'TRIHEXYPHENIDYL', 'LEVODOPA') then do;
		    count=count+1;
		    drug(count)=this_drug;
	    end;
	end;
	drop i drug_name count this_drug;
run;

--
Paige Miller

cheroij · Posted 12-30-2024 10:56 AM

Thank you for your response and your time. This code did not parse out the strength variaable.

Tom · Posted 12-29-2024 12:01 PM

I would start by converting the dataset into one observation per drug.

First lets fix your example data step so it can run. I will rename the variables to indicate the are LISTS and not normal values. It is not clear to me what you are doing with the UNITS, so since it looks like a separate variable in your data lines I will make it a separate variable (even though it does not seem to have enough values).

data HAVE;
  input Patient_ID drugs :$60. strengths :$20. Units :$20. ;
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;;;;

Now just loop over the paired lists and parse out the DRUG_NAME and STRENGTH from the list.

data want;
  set have;
  do index=1 to countw(drugs,'/');
    length drug_name $30 strength $10 unit $10 ;
    drug_name=scan(drugs,index,'/');
    strength=scan(strengths,index,'-');
    unit=scan(units,min(index,countw(units,'-')),'-');
    output;
  end;
run;

Now your test for whether the drug is a drug of interest is simpler since DRUG_NAME now only has one value per observation.

cheroij · Posted 12-30-2024 10:59 AM

Thank you for your response. I had to make some teaks to the code. I apologize for my sample data not being clear. The strength variable included the unit as well (ie unit is not a separate column in my data)

Tom · Posted 12-30-2024 11:50 AM

@cheroij wrote:

... The strength variable included the unit as well (ie unit is not a separate column in my data)

You might need to revisit that decision. The units for all drugs taken by a patient are not always going to be the same. And even if your drug lists just represent single pill combination drugs then not all of those will have the strengths listed using the same units.

Kurt_Bremser · Posted 12-29-2024 02:39 PM

Don't saddle yourself with the burden of a wide dataset (Maxim 19).

data have;
infile datalines dsd dlm=" " truncover;
length patient_id 8 drugs $40 strengths $12 unit $10;
input patient_id drugs strengths unit;
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;

data interest;
length drug $20;
input drug;
drug = upcase(drug);
datalines;
Amantadine
Carbidopa
Entacapone
Apomorphine
Trihexyphenidyl
Amantadine
Trihexyphenidyl
levodopa
;

data want;
retain patient_id n n1 drug strength unit; * just for variable order;
set have;
length drug $20 strength 8 n 8 n1 8;
by patient_id;
if _n_ = 1
then do;
  declare hash int (dataset:"interest");
  int.definekey("drug");
  int.definedone();
end;
if first.patient_id
then n = 1;
else n + 1;
n1 = 0;
do i = 1 to countw(drugs,"/");
  drug = scan(drugs,i,"/");
  strength = input(scan(strengths,i,"-"),32.);
  if int.check(key:upcase(drug)) = 0
  then do;
    n1 + 1;
    output;
  end;
end;
keep patient_id drug strength unit n n1;
run;

proc report data=want;
column patient_id n n1,(drug unit strength);
define patient_id / group;
define n / group noprint;
define n1 / "" across;
define drug / display;
define unit / display;
define strength / analysis;
run;

I created additional variables to get a report like you showed.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

cheroij · Posted 12-30-2024 11:04 AM

Thank you for your response. Your code worked with minor tweaks. Sorry about the confusion in data sample provided.The nint column is not a separate column in my dataset.

Ksharp · Posted 12-29-2024 09:14 PM

data WORK.HAVE;
infile datalines  truncover;
input Patient_ID:32. Drug_Name:$37. Strength $12.;
label Patient_ID="Patient ID" Drug_Name="Drug_Name" Strength="Strength";
datalines;
1 Carbidopa 25 MG
1 Carbidopa/Levodopa 25-100 MG
2 Levodopa/Entacapone 100-200 MG
2 Apomorphine 2 MG/ML
2 Ropinirole/Entacapone 2-200 MG
3 Levodopa/Entacapone/Ropinirole 100-200-2 MG
3 Pramipexole/Entacapone/Amantadine 1-200-100 MG
3 Pramipexole/Trihexyphenidyl 1-2 MG
4 Amantadine/Benztropine 100-1 MG
5 Trihexyphenidyl/Rasagiline/Entacapone 2-0.5-200 MG
;
data temp;
 set have;
 n+1;
 do i=1 to countw(Drug_Name,'/');
  Drug_of_Interest_=scan(Drug_Name,i,'/');
  Strength_=scan(Strength,i,'-','s');
  Unit_=scan(Strength,-1,,'s');
  if prxmatch('/Amantadine|Carbidopa|Entacapone|Apomorphine|Trihexyphenidyl|levodopa/i',Drug_of_Interest_) then output;
 end;
 keep n  Patient_ID Drug_of_Interest_ Strength_  Unit_;
 run;
 data temp2;
  set temp;
  by n;
  if first.n then id=0;
  id+1;
run;
 proc transpose data=temp2 out=temp3;
 by n id Patient_ID;
 var Drug_of_Interest_ Strength_  Unit_;
 run;
 proc transpose data=temp3 out=want(drop=_name_);
 by n Patient_ID;
 var col1;
 id _name_ id;
 run;

cheroij · Posted 12-30-2024 10:59 AM

Thank you😊

Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Re: Parse Out Columns Based on Conditions

Registration is open

Registration is open

SAS Training: Just a Click Away