Hi SAS people,
I'm wondering if there's an easy way (like a function?) to treat an expression as an actual variable/variable name in the creation of a variable in a DATA step. For example, if I have the following data,
data have;
input id $ first_fla_lang $ first_fla_grd first_flb_lang $ first_flb_grd first_flc_lang $ first_flc_grd
second_fla_lang $ second_fla_grd second_flb_lang $ second_flb_grd;
cards;
1234567 FRENCH 3.667 SPANISH 4 SPANISH 3 RUSSIAN 2.667 RUSSIAN 3
1234566 SPANISH 4 SPANISH 4 SPANISH 3 PORTUGUESE 3.667 PORTUGUESE 4
1234565 ITALIAN 2 ITALIAN 3 ITALIAN 3 ITALIAN 2.333 GREEK 3
;
run;
I would like to be able to have an expression (in this case, using the tranwrd function) act as an actual variable in the DATA step so that the actual value of the newly created variable is the value of the variable being referred to in the expression (which results to the variable name of the variable that I want the new variable to be equal to). So the code would be something like this:
data want; set have;
x=/*[insert function here?]*/(tranwrd("first_fla_lang","lang","grd"));
y=first_fla_grd;
run;
The result should be that the newly created variables x and y are equal to each other (i.e., equal to the value of the numerical variable first_fla_grd), with the resulting data looking like this:
instead of x being a string variable equal to whatever the expression results to, like this (don't want):
Thanks so much!
If you want to reference a variable indirectly you have three main choices.
The clearest is to use an ARRAY and an index number into the array.
Here is a clear example (not that I would actually use a data structure like this).
data want;
set have;
array months jan feb mar apr may jun jul aug sep oct nov dec;
new_var = months[ month(date_var) ] ;
run;
For your data you might use arrays like this:
data want;
set have;
array languages first_fla_lang first_flb_lang first_flc_lang second_fla_lang second_flb_lang ;
array grades first_fla_grd first_flb_grd first_flc_grd second_fla_grd second_flb_grd;
do order=1 to dim(grades);
language=languages[order];
grade=grades[order];
output;
end;
keep id order language grade;
run;
The next way is to use a macro variable to generate the name. Note this means that the SAME variable is referenced every time since the macro language just generates the code that becomes the data step. Once the data step is defined the code for the step cannot change.
data want;
set have;
x= %sysfunc(tranwrd(first_fla_lang,lang,grd));
y=first_fla_grd;
run;
Which will result in this data step being run
data want;
set have;
x= first_fla_grd;
y=first_fla_grd;
run;
The third way is to use the VVALUEX() function.
data want;
set have;
x=vvaluex(tranwrd("first_fla_lang","lang","grd"));
y=first_fla_grd;
run;
Note that this always returns the FORMATTED value of the named variable, so that means it always returns a character string. So if you want X to be a numeric variable you would need to add an INPUT() function call.
data want;
set have;
x=input(vvaluex(tranwrd("first_fla_lang","lang","grd")),32.);
y=first_fla_grd;
run;
If you want to reference a variable indirectly you have three main choices.
The clearest is to use an ARRAY and an index number into the array.
Here is a clear example (not that I would actually use a data structure like this).
data want;
set have;
array months jan feb mar apr may jun jul aug sep oct nov dec;
new_var = months[ month(date_var) ] ;
run;
For your data you might use arrays like this:
data want;
set have;
array languages first_fla_lang first_flb_lang first_flc_lang second_fla_lang second_flb_lang ;
array grades first_fla_grd first_flb_grd first_flc_grd second_fla_grd second_flb_grd;
do order=1 to dim(grades);
language=languages[order];
grade=grades[order];
output;
end;
keep id order language grade;
run;
The next way is to use a macro variable to generate the name. Note this means that the SAME variable is referenced every time since the macro language just generates the code that becomes the data step. Once the data step is defined the code for the step cannot change.
data want;
set have;
x= %sysfunc(tranwrd(first_fla_lang,lang,grd));
y=first_fla_grd;
run;
Which will result in this data step being run
data want;
set have;
x= first_fla_grd;
y=first_fla_grd;
run;
The third way is to use the VVALUEX() function.
data want;
set have;
x=vvaluex(tranwrd("first_fla_lang","lang","grd"));
y=first_fla_grd;
run;
Note that this always returns the FORMATTED value of the named variable, so that means it always returns a character string. So if you want X to be a numeric variable you would need to add an INPUT() function call.
data want;
set have;
x=input(vvaluex(tranwrd("first_fla_lang","lang","grd")),32.);
y=first_fla_grd;
run;
Thanks so much for this comprehensive response! Just curious, why do you have the informat equal to 32. for the second argument of the INPUT function? Why 32 and not some other number? I've seen "CHAR32" and "BEST32" being referred to in other contexts, but never figured out what the significance of that is.
32. is the maximum width of the normal numeric informat. The INPUT() function does not care if you use a width that is larger than the length of the string being read, so it is best to just default to using a width of 32. You might also want to include the LEFT() function to remove any leading spaces that might be in the formatted value of the variable.
input(left(vvaluex("name")),32.)
$CHAR is a character informat. It is different than the normal $ informat in that it does not remove leading spaces and it does not convert a single period into a missing value (which for a character variables is all blanks).
BEST is the name of a FORMAT. Its name derives from its attempt to find to best way to display a number in a limited number of characters. So it moves the decimal point or converts to scientific notation to be able to represent the number being formatted. If you use it as the name of a INFORMAT then SAS will just revert to the normal numeric informat (You can also use F as an alias for the normal numeric informat.). But the name "BEST" for an informat makes no sense. There is only one way to represent a number in SAS, as a 64bit binary floating point value. So there is no "best" way to do the conversion of string to a number.
Thanks again, Tom, for the detail in your response!
Try VVALUEX().
data want; set have; x=vvaluex(tranwrd("first_fla_lang","lang","grd")); y=first_fla_grd; run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.