BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kcvaldez98
Obsidian | Level 7

Hi SAS Coders: 

I am trying to change numeric observations into character so I can do my analysis, but it is showing blank observations for these variables: 

Obs fast purge language age_group sex_category ethnicity USBorn slimfast DescribeWt HomeEnv
1 . . E <11 Male NonH Life No VeryOver PrettyT
2 . . E <11 Male NonH Life No SlghtOve VeryTru
3 . . E <11 Male Hisp Miss Mis Right Missing
4 . . E <11 Female NonH Life No Right VeryTru
5 . . E <11 Male Hisp Life No SltUnder PrettyT
6 . . E <11 Female Hisp Life No Right VeryTru
7 . . E <11 Female NonH Life No Right PrettyT
8 . . E <11 Female NonH Life No SlghtOve PrettyT
9 . . E <11 Female NonH Life No SltUnder VeryTru
10 . . E <11 Female NonH Life No Right VeryTru

I know the observations are in fact numeric as it says "1" and "2" in the dataset

My code is this: 

DATA WORK.CLEANDATA;
    SET YRRSIMPT.import;

    /* Categorize age */
    IF age < 11 THEN age_group = "<11";
    ELSE IF age = 12 THEN age_group = "12";
    ELSE IF age = 13 THEN age_group = "13";
    ELSE IF age = 14 THEN age_group = "14";
    ELSE IF age >= 15 THEN age_group = "14+";
    ELSE age_group = "Missing";
    
     /* Categorize sex */
    IF sex = 1 THEN sex_category = "Female";
    ELSE IF sex = 2 THEN sex_category = "Male";
    ELSE sex_category = "Missing";
    
    IF v4 = 1 THEN ethnicity = "Hisp";
    ELSE IF v4 = 2 THEN ethnicity = "NonHisp";
    ELSE ethnicity = "Missing";
    
     /* Categorize v66 (USBorn) */
    IF v66 = 1 THEN USBorn = "<1Yr";
    ELSE IF v66 = 2 THEN USBorn = "1-3Yrs";
    ELSE IF v66 = 3 THEN USBorn = "4-6Yrs";
    ELSE IF v66 = 4 THEN USBorn = "6+Yr";
    ELSE IF v66 = 5 THEN USBorn = "Life";
    ELSE USBorn = "Missing";
    
     /* Categorize v51 */
    IF v51 = 1 THEN fast = "Yes";
    ELSE IF v51 = 2 THEN fast = "No";
    ELSE fast = "Missing";
    
    /* Categorize v52 */
    IF v52 = 1 THEN purge = "Yes";
    ELSE IF v52 = 2 THEN purge = "No";
    ELSE purge = "Missing";
    
      /* Categorize v53 */
    IF v53 = 1 THEN slimfast = "Yes";
    ELSE IF v53 = 2 THEN slimfast = "No";
    ELSE slimfast = "Missing";
    
    IF v49 = 1 THEN DescribeWt = "VUnderWt";
    ELSE IF v49 = 2 THEN DescribeWt = "SltUnder";
    ELSE IF v49 = 3 THEN DescribeWt = "Right";
    ELSE IF v49 = 4 THEN DescribeWt = "SlghtOver";
    ELSE IF v49 = 5 THEN DescribeWt = "VeryOver";
    ELSE DescribeWt = "Missing";
    
     /* Categorize v71 (HomeEnv) */
    IF v71 = 1 THEN HomeEnv = "NotTrue";
    ELSE IF v71 = 2 THEN HomeEnv = "LilTrue";
    ELSE IF v71 = 3 THEN HomeEnv = "PrettyTrue";
    ELSE IF v71 = 4 THEN HomeEnv = "VeryTrue";
    ELSE HomeEnv = "Missing";


    KEEP age_group ethnicity sex_category HomeEnv purge fast USBorn DescribeWt language slimfast;
RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

If you are surprised that this block is resulting in the value "Missing":

    IF v51 = 1 THEN fast = "Yes";
    ELSE IF v51 = 2 THEN fast = "No";
    ELSE fast = "Missing";

You could change it to e.g.:

    IF v51 = 1 THEN fast = "Yes";
    ELSE IF v51 = 2 THEN fast = "No";
    ELSE IF missing(v51) then fast="Missing";
    ELSE put "ERROR: Surprising value for v51 " v51= ;

That will write an error message to your log whenever the value of v51 is not one of the values you are expecting. Typically I tried to avoid using an ELSE statement that does anything other than write an ERROR message, because you never know what odd values may lurk in your data.

 

That said, as @PaigeMiller showed in SAS you do not need to create new character variables to create these sorts of labeled categories.  A SAS format will do it for you, and is much more flexible.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

@kcvaldez98 wrote:

Hi SAS Coders: 

I am trying to change numeric observations into character so I can do my analysis, but it is showing blank observations for these variables: 

Obs fast purge language age_group sex_category ethnicity USBorn slimfast DescribeWt HomeEnv
1 . . E <11 Male NonH Life No VeryOver PrettyT
2 . . E <11 Male NonH Life No SlghtOve VeryTru
3 . . E <11 Male Hisp Miss Mis Right Missing
4 . . E <11 Female NonH Life No Right VeryTru
5 . . E <11 Male Hisp Life No SltUnder PrettyT
6 . . E <11 Female Hisp Life No Right VeryTru
7 . . E <11 Female NonH Life No Right PrettyT
8 . . E <11 Female NonH Life No SlghtOve PrettyT
9 . . E <11 Female NonH Life No SltUnder VeryTru
10 . . E <11 Female NonH Life No Right VeryTru

I know the observations are in fact numeric as it says "1" and "2" in the dataset


If the number has values 1 and 2, what is the benefit of changing them to character "1" and "2"? I see no benefit, it seems like unnecessary work.

 

Also, what part of your code fails if you the values are numeric? I don't see any part of your code that must have character values.

 

Anyway, once a variable is numeric, it cannot be changed to character. You can create a new character variable with values "1" and "2" if you'd like, but again, why?

 

A recommendation, don't create character variable categories like this:

 

    IF age < 11 THEN age_group = "<11";
    ELSE IF age = 12 THEN age_group = "12";
    ELSE IF age = 13 THEN age_group = "13";
    ELSE IF age = 14 THEN age_group = "14";
    ELSE IF age >= 15 THEN age_group = "14+";
    ELSE age_group = "Missing";

 

Instead use custom formats and leave the variable AGE as numeric.

 

proc format;
     value agef 0-11='<11' 15-high='14+';
run;

/* Then in a DATA step or PROC use */
format age agef.;
--
Paige Miller
Quentin
Super User

If you are surprised that this block is resulting in the value "Missing":

    IF v51 = 1 THEN fast = "Yes";
    ELSE IF v51 = 2 THEN fast = "No";
    ELSE fast = "Missing";

You could change it to e.g.:

    IF v51 = 1 THEN fast = "Yes";
    ELSE IF v51 = 2 THEN fast = "No";
    ELSE IF missing(v51) then fast="Missing";
    ELSE put "ERROR: Surprising value for v51 " v51= ;

That will write an error message to your log whenever the value of v51 is not one of the values you are expecting. Typically I tried to avoid using an ELSE statement that does anything other than write an ERROR message, because you never know what odd values may lurk in your data.

 

That said, as @PaigeMiller showed in SAS you do not need to create new character variables to create these sorts of labeled categories.  A SAS format will do it for you, and is much more flexible.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
FreelanceReinh
Jade | Level 19

@kcvaldez98 wrote:

I am trying to change numeric observations into character so I can do my analysis, but it is showing blank observations for these variables: 

Obs fast purge language age_group sex_category ethnicity USBorn slimfast DescribeWt HomeEnv
1 . . E <11 Male NonH Life No VeryOver PrettyT
...
My code is this: 
DATA WORK.CLEANDATA;
    SET YRRSIMPT.import;
...    
    IF v4 = 1 THEN ethnicity = "Hisp";
    ELSE IF v4 = 2 THEN ethnicity = "NonHisp";
    ELSE ethnicity = "Missing";
...    
     /* Categorize v51 */
    IF v51 = 1 THEN fast = "Yes";
    ELSE IF v51 = 2 THEN fast = "No";
    ELSE fast = "Missing";
    
    /* Categorize v52 */
    IF v52 = 1 THEN purge = "Yes";
    ELSE IF v52 = 2 THEN purge = "No";
    ELSE purge = "Missing";
...
    KEEP age_group ethnicity sex_category HomeEnv purge fast USBorn DescribeWt language slimfast;
RUN;

Hi @kcvaldez98,

 

As you have probably noticed yourself, the "blank" values (displayed as periods ".") of variables fast and purge were due to existing numeric variables with these names in the input dataset YRRSIMPT.import.

 

How do I know this?

  1. If these variables had been newly created in the DATA step shown, they would have been created as character variables of length 3 bytes because of their first occurrence in an assignment statement with value "Yes". Since either  v51 = 1 or  v51 = 2 or  not (v51 = 1 or v51 = 2) must be true, in every observation variable fast would have been assigned one of the values "Yes", "No" or "Mis" (truncated because of the insufficient length), depending on the value of v51. And similarly purge, depending on v52. Note that this argument holds even if v51 was a character variable in YRRSIMPT.import (in which case an automatic conversion of character to numeric values would have occurred) or if it was not existing in YRRSIMPT.import (in which case v51 would have been created in the DATA step as an uninitialized numeric variable).
  2. If fast had been existing as a character variable of length k  in YRRSIMPT.import, it had been assigned either of the values "Yes", "No" or "Missing"truncated to length k, e.g., "N" for k=1 (if v51=2) or "Missi" for k=5 (if v51 not in (1,2)). And similarly for purge.
  3. So, existing numeric variables fast and purge from YRRSIMPT.import is the only remaining possibility. In this case, indeed, the assignment of a character value, be it "Yes", "No" or "Missing", leads to numeric missing values -- fast=. or purge=., respectively -- together with "Invalid numeric data ..." and character-to-numeric conversion notes in the log.

To avoid these problems (also the truncation of values of variable ethnicity and others), just adhere to maxims 2 (read the log), 3 (know your data), 25 (have a clean log) and 47 (set a length). And to maxim 8 (there is a format for it; here: "... rolling your own will usually beat complicated data step logic").

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 267 views
  • 0 likes
  • 4 in conversation