The Taster variable is made up of a number (XX) and a gender (M/F), for example 32F (no blank in between) and 31M. Here is my code. It runs without error, but the new variables, subj_gender and subj_age, are empty. My aim is to create separate gender and age variables by splitting the Taster variable.
data wineA2;
set wineA;
subj_char = Taster;
subj_gender = subject;
subj_age = subject+0;
drop Taster;
run;
proc print data=wineA2;
run;
Actual values in the form of data step code are the best way to provide example data.
The following creates a small data set using a data step to create the two values you mention and then separate them.
data have;
input taster $;
datalines;
32F
31M
;
data want;
set have;
sub_age = input(compress(taster,,'DK'),f5.);
sub_gender = compress(taster,,'D');
run;
The COMPRESS function removes (or keeps) a list of characters in your value. Here no explicit list is provided, because the 'D' modifier adds the digit characters to the list. In the first assignment, K combined with D keeps only the digits, and the INPUT function then converts the result to a numeric value. The second assignment drops all digit characters, leaving the gender.
Note: although it sometimes works, using +0 to make a numeric value relies on implicit conversion. In this case it produces nothing useful, because you used the whole value, and a value containing F or M cannot be turned into a number by the simple text-to-number conversion rules.
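To make the difference concrete, here is a minimal sketch that puts the implicit +0 conversion next to the explicit COMPRESS and INPUT approach. The data set name demo and the variables age_implicit and age_explicit are made up for illustration, and the two sample values are the ones from the question.

```sas
data demo;
input taster $;
/* implicit conversion of the whole value (e.g. "32F"): not a valid number, so the result is missing */
age_implicit = taster + 0;
/* keep only the digits, then convert them explicitly */
age_explicit = input(compress(taster,,'DK'), 32.);
datalines;
32F
31M
;
run;

proc print data=demo;
run;
```

The log for this step should also show an "Invalid numeric data" note for the +0 line, which is a useful hint when a new variable comes out empty.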
I noticed that "subject" should be "Taster". I ran the new code: subj_age is still empty, and subj_gender now has data. However, subj_gender still contains both the number and the letter.
Are you saying that "32F" represents a 32-year-old female?
If so, then (untested due to absence of a working data set):
data wineA2;
set wineA;
length subj_gender $1;
subj_gender=char(taster,length(taster));
subj_age=input(translate(taster,'',subj_gender),best3.);
run;
The TRANSLATE function converts any character equal to SUBJ_GENDER to a blank. The INPUT function uses the BEST3. informat to allow for 100-year-old tasters. Otherwise BEST2. would do.
The INPUT() function does not care if you use a width on the INFORMAT that is longer than the length of the string you are reading. The maximum width for the normal numeric informat is 32. So just use an informat specification of: 32.
Note: BEST is the name of a FORMAT. There is no BEST informat. If you do use that name as an informat then SAS will assume you meant to use the normal numeric informat.
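Putting that advice together with the earlier suggestion, a sketch of the same step using the plain numeric informat might look like the following. It has not been run against the real wineA data set, so the small wineA created here (two values taken from the question) is only an assumed stand-in.

```sas
/* assumed stand-in for the real wineA data set */
data wineA;
input taster $;
datalines;
32F
31M
;
run;

data wineA2;
set wineA;
length subj_gender $1;
/* last non-blank character is the gender letter */
subj_gender = char(taster, length(taster));
/* blank out the gender letter, then read what is left with the 32. informat */
subj_age = input(translate(taster, ' ', subj_gender), 32.);
run;
```

Because the informat width may exceed the length of the string, 32. covers two-digit and three-digit ages alike.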