BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JackHP
Fluorite | Level 6

I would like to create a variable, spanish, with a value of 1 if participant is a Spanish speaker, and a 0 of they are not.

 

I would like to determine this by the presence of a numeric 1 in at least one of a range of variables: spanish2-spanish4 spanish5-spanish24 (note there is no spanish5 variable)

 

I've tried the below but this sets just about every observation incorrectly the initial missing categorization. Thank you for your help!

 

Data = test; set = mvrx19.master;

Spanish = .;

if Spanish2-Spanish4 or Spanish6-Spanish24 = 0 then Spanish = 0;

else if Spanish2-Spanish4 or Spanish6-Spanish24 = 1 then Spanish = 1;

proc print;

var spanish Spanish2-Spanish4 Spanish6-Spanish24;

run;

 

Example desired output:

spanish      spanish2   spanish3    spanish4    spanish6 .....

     1                  0                  .                   1                 0

     0                  0                  .                   0                  .

      .                  .                   .                   .                   .

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @JackHP,

 

Another option is the IN operator applied to an array:

data want;
set mvrx19.master;
array esp[*] spanish2-spanish4 spanish6-spanish24;
spanish=(1 in esp);
run;

If variables spanish2, spanish3, spanish4, spanish6, ..., spanish24 are the only variables in mvrx19.master whose names start with, say, "span", the variable list in the array definition can be written as span:. Together with other ways to shorten the code:

data want;
set mvrx19.master;
array e span:;
spanish=1 in e;
run;

(assuming that there's no variable e [or esp in the first example] in dataset mvrx19.master).

 

Note that your code doesn't work correctly for several reasons:

  1. The equals signs in the DATA and SET statements are syntax errors.
  2. The hyphens in your IF conditions are interpreted as minus signs, so you're looking at two differences involving only four variables. No variable lists are used there.*
  3. A numeric expression such as (the difference) Spanish2-Spanish4 without a comparison operator is evaluated as TRUE if the value of the expression is neither zero nor missing. The equals signs in your IF conditions only refer to the difference Spanish6-Spanish24.
  4. The OR operator in the first IF condition would be logically incorrect even if everything else worked as intended.

 

* The same happens if you forget the OF operator before a numbered range list in the argument of a SAS function, e.g.

whichn(1, of Spanish2-Spanish4, of Spanish6-Spanish24)

View solution in original post

6 REPLIES 6
novinosrin
Tourmaline | Level 20

data have;
input       spanish2   spanish3    spanish4    spanish6;
cards;
0                  .                   1                 0
0                  .                   0                  .
.                   .                   .                   .
;

data want;
set have;
spanish=^^sum(of spanish2--spanish6,0);
run;

or MAX

 

spanish=^^max(of spanish2--spanish6,0);

 

ballardw
Super User

To find a specific value in a group of variables then the function you want is WHICHN for numeric and WHICHC for character values.

The function actually returns the position number of the first variable with the value or 0 if not found.

 

Try

Data = test; 
   set = mvrx19.master;
   Spanish = (whichn(1,Spanish2-Spanish4,Spanish6-Spanish24)>0 );
run;

The parentheses around the "whichn(1,Spanish2-Spanish4,Spanish6-Spanish24)>0" return a logical true, or 1, if any of the values have a 1, or false (0) if none have a one.

 

Whichn is one of the functions that will use variable lists as you attempted.

JackHP
Fluorite | Level 6

Thank you for the quick response and for sharing the WhichN/WhichC function!

 

Unfortunately, this isn't quite getting the output I'm looking for - if at least one variable in a list = 1, I would like the new variable "Spanish" to equal 1 (even if other variables in the list are missing or set to 0). Current output has Spanish = 0 for all observations, even though several observations have a value of 1 for at least some of the Spanish2-Spanish24 variables queried. Any suggestions on how to revise the code?

 

Thank you!

JackHP
Fluorite | Level 6
Following up on my prior message:

The below code produced the output I was looking for - I added "of" before each variable range.

Data test; set mvrx19.master;
esp = .;
esp = (whichn(1, of Spanish2-Spanish4, of Spanish6-Spanish24)>0 );
run;
FreelanceReinh
Jade | Level 19

Hi @JackHP,

 

Another option is the IN operator applied to an array:

data want;
set mvrx19.master;
array esp[*] spanish2-spanish4 spanish6-spanish24;
spanish=(1 in esp);
run;

If variables spanish2, spanish3, spanish4, spanish6, ..., spanish24 are the only variables in mvrx19.master whose names start with, say, "span", the variable list in the array definition can be written as span:. Together with other ways to shorten the code:

data want;
set mvrx19.master;
array e span:;
spanish=1 in e;
run;

(assuming that there's no variable e [or esp in the first example] in dataset mvrx19.master).

 

Note that your code doesn't work correctly for several reasons:

  1. The equals signs in the DATA and SET statements are syntax errors.
  2. The hyphens in your IF conditions are interpreted as minus signs, so you're looking at two differences involving only four variables. No variable lists are used there.*
  3. A numeric expression such as (the difference) Spanish2-Spanish4 without a comparison operator is evaluated as TRUE if the value of the expression is neither zero nor missing. The equals signs in your IF conditions only refer to the difference Spanish6-Spanish24.
  4. The OR operator in the first IF condition would be logically incorrect even if everything else worked as intended.

 

* The same happens if you forget the OF operator before a numbered range list in the argument of a SAS function, e.g.

whichn(1, of Spanish2-Spanish4, of Spanish6-Spanish24)
JackHP
Fluorite | Level 6

Thank you for the detailed reply - this worked and output is exactly what I was looking for - thank you!

 

I appreciate the feedback on the initial code as well, it's helpful to have that detailed, as well as the alternative option using the WhichN approach.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 7027 views
  • 3 likes
  • 4 in conversation