Hi all,
I'm trying to create a binary (dummy) variable from two categorical variables, but I'm having trouble. I need to use wildcards because var1 has multiple values that start with the same letters, each of which matches the same var2 value. If I need to, I will type them all out completely, but I feel sure there is a better way to do this (and the code for this one bit is already over 100 lines).
data sample;
format var1 $11. var2 $4.;
input var1 $ var2 $;
datalines;
aa1 aaa
aa2 aaa
aa bbb
abdfgdfh bbb
abdfttyrty bbb
abwww bbb
abc aaa
abd aaa
adrtrtrt dd
adyuyuii ddd
adxx bbb
adrr ccc
;
data sample;
set sample;
if (var1 = aa1 or aa2) and (var2 = aaa) then match=1;
else if (var1 = ab:) and (var2 = bbb) then match=1;
else if (var1 = ad:) and (var2 = ddd or dd) then match=1;
else match=0;
run;
which gives me
ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 76-322: Syntax error, statement will be ignored.
I've also tried it like this
if var1= 'aa1' or 'aa2' and var2 = 'aaa' then match=1;
else if var1 =: 'ab%' and var2 ='bbb' then match=1;
else if var1 =: 'ad%' and (var2 ='ddd' or 'dd') then match=1;
which gives me NOTE: Invalid numeric data, but creates the variable, with a single 1 (the first observation) and all the rest as zeros.
and like this
if (var1= aa1 or aa2) and var2 = aaa then match=1;
else if var1 =: ab% and var2 =bbb then match=1;
else if var1 =: ad% and (var2 =ddd or dd) then match=1;
which gives me
ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 76-322: Syntax error, statement will be ignored.
Anyone have a clue as to my mistake?
I'm using SAS 9.2 on Windows 7.
EDIT: after reading your comments (thank you!) I have working code:
if var1 in ("aa1" "aa2") and var2 = "aaa" then match=1;
else if var1 =: "ab" and var2 ="bbb" then match=1;
else if var1 =: "ad" and var2 in ("ddd" "dd") then match=1;
First issue:
if (var1 = aa1 or aa2) is using aa1 and aa2 as VARIABLES, which do not exist.
Second, you want to reference the values of concern. Text literals require quotes to tell SAS you are looking for specific strings.
If you want to see if var1 has a value of either aa1 or aa2 here are two ways:
if (var1 = 'aa1' or var1='aa2')
or
if var1 in ('aa1' 'aa2')
The value vs. variable distinction has to be addressed in all of your code.
When you use the ab: construct, it is looking for VARIABLES that start with ab, not values. You would use var1 =: 'ab' to look for strings starting with 'ab'.
In future posts with errors, please post the log including the procedure or DATA step. The log shows which line failed and often the specific cause, neither of which can be determined from the error message alone.
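A small sketch of the two forms above, in case it helps. The data set name demo and the flag names either_or and in_list are made up for illustration; the three rows are taken from the sample data in the question. A comparison in parentheses evaluates to 1 or 0, so both flags should agree on every row.

```sas
/* Sketch only: demo, either_or and in_list are hypothetical names */
data demo;
  length var1 $11 var2 $4;
  input var1 $ var2 $;
  /* Quoted text is a value; an unquoted aa1 would be read as a variable name */
  either_or = (var1 = 'aa1' or var1 = 'aa2');  /* var1 repeated in each comparison */
  in_list   = (var1 in ('aa1' 'aa2'));         /* same test written with IN */
  datalines;
aa1 aaa
aa2 aaa
abc aaa
;
run;

proc print data=demo;
run;
```

Note that writing var1 = 'aa1' or 'aa2' does not do the same thing: the bare 'aa2' is treated as a condition of its own, SAS tries to convert it to a number, and that is where the "Invalid numeric data" note comes from.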
if var1 in ("aa1" "aa2") and (var2 = "aaa") then match=1;
For starters, here's one correction. Similar ones need to be carried through the rest of the code as well.
If you're comparing to a text value you need to include quotes, and if you're checking for multiple values use IN ().
Another change: the colon goes with the = sign, not with the value being compared:
(var1 =: "ab") and (var2 = "bbb")
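To show what =: does with and without a trailing %, here is a minimal sketch. The names prefix_ab and with_pct are made up for illustration. The =: operator compares only as many characters as the shorter operand has, so no wildcard character is needed, and a % inside the quotes is just an ordinary character (the % wildcard belongs to the LIKE operator, which is available in WHERE expressions and PROC SQL rather than in an IF statement).

```sas
/* Sketch only: prefix_ab and with_pct are hypothetical names */
data _null_;
  length var1 $11;
  do var1 = 'abwww', 'abc', 'adxx';
    prefix_ab = (var1 =: 'ab');   /* compares only the first 2 characters */
    with_pct  = (var1 =: 'ab%');  /* % is a literal character here, not a wildcard */
    put var1= prefix_ab= with_pct=;
  end;
run;
```

The log should show prefix_ab as 1 for the two values starting with ab, and with_pct as 0 throughout, since none of the values literally begins with "ab%".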
Thank you, I did not realize that I did not need the % when using =:
Thank you, this was helpful. I am no longer getting an error code, but it is not catching all of the matches. Do you see a problem in the following code?
if var1 in ("aa1" "aa2") and var2 = "aaa" then match=1;
if var1 =: "ab" and var2 ="bbb" then match=1;
if var1 =: "ad" and var2 in ("ddd" "dd") then match=1;
There's no problem with that code. The problem is what you added after that code:
else match=0;
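That ELSE pairs only with the IF statement immediately before it, so any observation that fails the last test is reset to 0 even if an earlier IF had already set match to 1. If you want to link all the statements together, you have to add "else" a few more times: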
if var1 in ("aa1" "aa2") and var2 = "aaa" then match=1;
else if var1 =: "ab" and var2 ="bbb" then match=1;
else if var1 =: "ad" and var2 in ("ddd" "dd") then match=1;
else match=0;
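As an aside, one possible alternative that avoids the IF/ELSE chain altogether is to assign the logical expression directly, since a comparison in SAS evaluates to 1 (true) or 0 (false). This is only a sketch, and it assumes the sample data set from the question already exists:

```sas
/* Sketch only: assumes the data set sample from the question */
data sample;
  set sample;
  /* each parenthesised test is 1 or 0; OR combines them into one 1/0 flag */
  match = (var1 in ("aa1" "aa2") and var2 = "aaa")
       or (var1 =: "ab" and var2 = "bbb")
       or (var1 =: "ad" and var2 in ("ddd" "dd"));
run;
```

With this form there is no final ELSE to forget, because every observation gets either 1 or 0 from the single assignment.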
When you remember those who helped you, please mark the other poster's answer as correct. He did most of the work and gave you the right tools to use.
Ah, thank you, I overlooked that when re-typing it.