Hi SAS Coders:
I am trying to change numeric observations into character so I can do my analysis, but it is showing blank observations for these variables:
Obs | fast | purge | language | age_group | sex_category | ethnicity | USBorn | slimfast | DescribeWt | HomeEnv |
1 | . | . | E | <11 | Male | NonH | Life | No | VeryOver | PrettyT |
2 | . | . | E | <11 | Male | NonH | Life | No | SlghtOve | VeryTru |
3 | . | . | E | <11 | Male | Hisp | Miss | Mis | Right | Missing |
4 | . | . | E | <11 | Female | NonH | Life | No | Right | VeryTru |
5 | . | . | E | <11 | Male | Hisp | Life | No | SltUnder | PrettyT |
6 | . | . | E | <11 | Female | Hisp | Life | No | Right | VeryTru |
7 | . | . | E | <11 | Female | NonH | Life | No | Right | PrettyT |
8 | . | . | E | <11 | Female | NonH | Life | No | SlghtOve | PrettyT |
9 | . | . | E | <11 | Female | NonH | Life | No | SltUnder | VeryTru |
10 | . | . | E | <11 | Female | NonH | Life | No | Right | VeryTru |
I know the observations are in fact numeric as it says "1" and "2" in the dataset
My code is this:
DATA WORK.CLEANDATA;
SET YRRSIMPT.import;
/* Categorize age */
IF age < 11 THEN age_group = "<11";
ELSE IF age = 12 THEN age_group = "12";
ELSE IF age = 13 THEN age_group = "13";
ELSE IF age = 14 THEN age_group = "14";
ELSE IF age >= 15 THEN age_group = "14+";
ELSE age_group = "Missing";
/* Categorize sex */
IF sex = 1 THEN sex_category = "Female";
ELSE IF sex = 2 THEN sex_category = "Male";
ELSE sex_category = "Missing";
IF v4 = 1 THEN ethnicity = "Hisp";
ELSE IF v4 = 2 THEN ethnicity = "NonHisp";
ELSE ethnicity = "Missing";
/* Categorize v66 (USBorn) */
IF v66 = 1 THEN USBorn = "<1Yr";
ELSE IF v66 = 2 THEN USBorn = "1-3Yrs";
ELSE IF v66 = 3 THEN USBorn = "4-6Yrs";
ELSE IF v66 = 4 THEN USBorn = "6+Yr";
ELSE IF v66 = 5 THEN USBorn = "Life";
ELSE USBorn = "Missing";
/* Categorize v51 */
IF v51 = 1 THEN fast = "Yes";
ELSE IF v51 = 2 THEN fast = "No";
ELSE fast = "Missing";
/* Categorize v52 */
IF v52 = 1 THEN purge = "Yes";
ELSE IF v52 = 2 THEN purge = "No";
ELSE purge = "Missing";
/* Categorize v53 */
IF v53 = 1 THEN slimfast = "Yes";
ELSE IF v53 = 2 THEN slimfast = "No";
ELSE slimfast = "Missing";
IF v49 = 1 THEN DescribeWt = "VUnderWt";
ELSE IF v49 = 2 THEN DescribeWt = "SltUnder";
ELSE IF v49 = 3 THEN DescribeWt = "Right";
ELSE IF v49 = 4 THEN DescribeWt = "SlghtOver";
ELSE IF v49 = 5 THEN DescribeWt = "VeryOver";
ELSE DescribeWt = "Missing";
/* Categorize v71 (HomeEnv) */
IF v71 = 1 THEN HomeEnv = "NotTrue";
ELSE IF v71 = 2 THEN HomeEnv = "LilTrue";
ELSE IF v71 = 3 THEN HomeEnv = "PrettyTrue";
ELSE IF v71 = 4 THEN HomeEnv = "VeryTrue";
ELSE HomeEnv = "Missing";
KEEP age_group ethnicity sex_category HomeEnv purge fast USBorn DescribeWt language slimfast;
RUN;
If you are surprised that this block is resulting in the value "Missing":
IF v51 = 1 THEN fast = "Yes";
ELSE IF v51 = 2 THEN fast = "No";
ELSE fast = "Missing";
You could change it to e.g.:
IF v51 = 1 THEN fast = "Yes";
ELSE IF v51 = 2 THEN fast = "No";
ELSE IF missing(v51) then fast="Missing";
ELSE put "ERROR: Surprising value for v51 " v51= ;
That will write an error message to your log whenever the value of v51 is not one of the values you are expecting. Typically I tried to avoid using an ELSE statement that does anything other than write an ERROR message, because you never know what odd values may lurk in your data.
That said, as @PaigeMiller showed in SAS you do not need to create new character variables to create these sorts of labeled categories. A SAS format will do it for you, and is much more flexible.
@kcvaldez98 wrote:
Hi SAS Coders:
I am trying to change numeric observations into character so I can do my analysis, but it is showing blank observations for these variables:
Obs fast purge language age_group sex_category ethnicity USBorn slimfast DescribeWt HomeEnv 1 . . E <11 Male NonH Life No VeryOver PrettyT 2 . . E <11 Male NonH Life No SlghtOve VeryTru 3 . . E <11 Male Hisp Miss Mis Right Missing 4 . . E <11 Female NonH Life No Right VeryTru 5 . . E <11 Male Hisp Life No SltUnder PrettyT 6 . . E <11 Female Hisp Life No Right VeryTru 7 . . E <11 Female NonH Life No Right PrettyT 8 . . E <11 Female NonH Life No SlghtOve PrettyT 9 . . E <11 Female NonH Life No SltUnder VeryTru 10 . . E <11 Female NonH Life No Right VeryTru I know the observations are in fact numeric as it says "1" and "2" in the dataset
If the number has values 1 and 2, what is the benefit of changing them to character "1" and "2"? I see no benefit, it seems like unnecessary work.
Also, what part of your code fails if you the values are numeric? I don't see any part of your code that must have character values.
Anyway, once a variable is numeric, it cannot be changed to character. You can create a new character variable with values "1" and "2" if you'd like, but again, why?
A recommendation, don't create character variable categories like this:
IF age < 11 THEN age_group = "<11";
ELSE IF age = 12 THEN age_group = "12";
ELSE IF age = 13 THEN age_group = "13";
ELSE IF age = 14 THEN age_group = "14";
ELSE IF age >= 15 THEN age_group = "14+";
ELSE age_group = "Missing";
Instead use custom formats and leave the variable AGE as numeric.
proc format;
value agef 0-11='<11' 15-high='14+';
run;
/* Then in a DATA step or PROC use */
format age agef.;
If you are surprised that this block is resulting in the value "Missing":
IF v51 = 1 THEN fast = "Yes";
ELSE IF v51 = 2 THEN fast = "No";
ELSE fast = "Missing";
You could change it to e.g.:
IF v51 = 1 THEN fast = "Yes";
ELSE IF v51 = 2 THEN fast = "No";
ELSE IF missing(v51) then fast="Missing";
ELSE put "ERROR: Surprising value for v51 " v51= ;
That will write an error message to your log whenever the value of v51 is not one of the values you are expecting. Typically I tried to avoid using an ELSE statement that does anything other than write an ERROR message, because you never know what odd values may lurk in your data.
That said, as @PaigeMiller showed in SAS you do not need to create new character variables to create these sorts of labeled categories. A SAS format will do it for you, and is much more flexible.
@kcvaldez98 wrote:
I am trying to change numeric observations into character so I can do my analysis, but it is showing blank observations for these variables:
My code is this:
Obs fast purge language age_group sex_category ethnicity USBorn slimfast DescribeWt HomeEnv 1 . . E <11 Male NonH Life No VeryOver PrettyT ... DATA WORK.CLEANDATA; SET YRRSIMPT.import; ... IF v4 = 1 THEN ethnicity = "Hisp"; ELSE IF v4 = 2 THEN ethnicity = "NonHisp"; ELSE ethnicity = "Missing"; ... /* Categorize v51 */ IF v51 = 1 THEN fast = "Yes"; ELSE IF v51 = 2 THEN fast = "No"; ELSE fast = "Missing"; /* Categorize v52 */ IF v52 = 1 THEN purge = "Yes"; ELSE IF v52 = 2 THEN purge = "No"; ELSE purge = "Missing"; ... KEEP age_group ethnicity sex_category HomeEnv purge fast USBorn DescribeWt language slimfast; RUN;
Hi @kcvaldez98,
As you have probably noticed yourself, the "blank" values (displayed as periods ".") of variables fast and purge were due to existing numeric variables with these names in the input dataset YRRSIMPT.import.
How do I know this?
To avoid these problems (also the truncation of values of variable ethnicity and others), just adhere to maxims 2 (read the log), 3 (know your data), 25 (have a clean log) and 47 (set a length). And to maxim 8 (there is a format for it; here: "... rolling your own will usually beat complicated data step logic").
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.