Hi,
I have the following dataset and I want to create a new variable when sas finds any mention of the first word, it will output but am not getting the right outcome.
For example,
data check1;
input gnn $ 1-50;
datalines;
ESOMEPRAZOLE
ESOMEPRAZOLE MAGNESIUM
ESOMEPRAZOLE/MAGNESIUM
ESOMEPRAZOLE_calcium
ESOMEPRAZOLE HCL
ESOMEPRAZOLE H ;
run;
data check2;
set check1;
if GNN= "ESOMEPRAZOLE: " then new_drug = anagesic;
keep gnn;
run;
I added a colon (:) to the drug name so that sas can find all mention but its not working.
This is what I want as output
ESOMEPRAZOLE | Analgesics |
ESOMEPRAZOLE MAGNESIUM | Analgesics |
ESOMEPRAZOLE/MAGNESIUM | Analgesics |
ESOMEPRAZOLE_calcium | Analgesics |
ESOMEPRAZOLE HCL | Analgesics |
ESOMEPRAZOLE H | Analgesics |
Try the FIND function 😊:
data check1;
input gnn $ 1-50;
datalines;
ESOMEPRAZOLE
ESOMEPRAZOLE MAGNESIUM
NOT AN ANALGESIC
ESOMEPRAZOLE/MAGNESIUM
ESOMEPRAZOLE_calcium
ESOMEPRAZOLE HCL
ESOMEPRAZOLE H
;
data check2;
set check1;
if find(GNN, "ESOMEPRAZOLE") then new_drug='Anagesic';
run;
Obs | gnn | new_drug |
---|---|---|
1 | ESOMEPRAZOLE | Anagesic |
2 | ESOMEPRAZOLE MAGNESIUM | Anagesic |
3 | BAD DRUG | |
4 | ESOMEPRAZOLE/MAGNESIUM | Anagesic |
5 | ESOMEPRAZOLE_calcium | Anagesic |
6 | ESOMEPRAZOLE HCL | Anagesic |
7 | ESOMEPRAZOLE H | Anagesic |
If your : addition was intended to match values that start with a specific word or string of characters then : would go on the equal sign:
if GNN=: "ESOMEPRAZOLE" then new_drug = anagesic;
If you include a space at the end of the string to match, as you did, then it would not match the value of ESOMEPRAZOLE/MAGNESIUM or ESOMEPRAZOLE_calcium due to the / or _ .
Also if there is any chance that your variable GNN might sometimes have a value that starts with Esomeprazole, note the different case, then it doesn't match. You could work around that by using the UPCASE function to match the value regardless of original case:
if upcase (GNN) =: "ESOMEPRAZOLE" then new_drug = anagesic;
Try the FIND function 😊:
data check1;
input gnn $ 1-50;
datalines;
ESOMEPRAZOLE
ESOMEPRAZOLE MAGNESIUM
NOT AN ANALGESIC
ESOMEPRAZOLE/MAGNESIUM
ESOMEPRAZOLE_calcium
ESOMEPRAZOLE HCL
ESOMEPRAZOLE H
;
data check2;
set check1;
if find(GNN, "ESOMEPRAZOLE") then new_drug='Anagesic';
run;
Obs | gnn | new_drug |
---|---|---|
1 | ESOMEPRAZOLE | Anagesic |
2 | ESOMEPRAZOLE MAGNESIUM | Anagesic |
3 | BAD DRUG | |
4 | ESOMEPRAZOLE/MAGNESIUM | Anagesic |
5 | ESOMEPRAZOLE_calcium | Anagesic |
6 | ESOMEPRAZOLE HCL | Anagesic |
7 | ESOMEPRAZOLE H | Anagesic |
Thank you both. I wish I could accept both solutions because they both provided me a better understand of how to address the problem.
@ballardw Thank you, you provide the solution and refined my initial thought process.
@SASJedi Thank you so much. I accepted your solution because when I looked into the entire data, I found some drug name with weird name. For example 0.9%ESOMEPRAZOLE or HCL ESOMEPRAZOLE. I was able to use the find function to identify these ones.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.