@kcvaldez98 wrote:
I am trying to change numeric observations into character so I can do my analysis, but it is showing blank observations for these variables:
Obs  fast  purge  language  age_group  sex_category  ethnicity  USBorn  slimfast  DescribeWt  HomeEnv
1    .     .      E         <11        Male          NonH       Life    No        VeryOver    PrettyT
...
My code is this:
DATA WORK.CLEANDATA;
SET YRRSIMPT.import;
...
IF v4 = 1 THEN ethnicity = "Hisp";
ELSE IF v4 = 2 THEN ethnicity = "NonHisp";
ELSE ethnicity = "Missing";
...
/* Categorize v51 */
IF v51 = 1 THEN fast = "Yes";
ELSE IF v51 = 2 THEN fast = "No";
ELSE fast = "Missing";
/* Categorize v52 */
IF v52 = 1 THEN purge = "Yes";
ELSE IF v52 = 2 THEN purge = "No";
ELSE purge = "Missing";
...
KEEP age_group ethnicity sex_category HomeEnv purge fast USBorn DescribeWt language slimfast;
RUN;
Hi @kcvaldez98,
As you have probably noticed yourself, the "blank" values (displayed as periods ".") of variables fast and purge were due to existing numeric variables with these names in the input dataset YRRSIMPT.import.
How do I know this?
If these variables had been newly created in the DATA step shown, they would have been character variables of length 3 bytes, because each first occurs in an assignment statement with the value "Yes". Since exactly one of v51 = 1, v51 = 2 or not (v51 = 1 or v51 = 2) must be true, variable fast would have been assigned one of the values "Yes", "No" or "Mis" (truncated because of the insufficient length) in every observation, depending on the value of v51. The same holds for purge, depending on v52. Note that this argument holds even if v51 was a character variable in YRRSIMPT.import (in which case an automatic conversion of character to numeric values would have occurred) or if it did not exist in YRRSIMPT.import (in which case v51 would have been created in the DATA step as an uninitialized numeric variable).
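To see this first case in isolation, here is a minimal sketch. The dataset names work.demo_in and work.demo_trunc and the three test values are made up for illustration; they are not taken from YRRSIMPT.import.

```sas
/* Hypothetical input: v51 takes the values 1, 2 and missing; */
/* there is no variable named FAST yet.                       */
data work.demo_in;
  input v51;
  datalines;
1
2
.
;
run;

data work.demo_trunc;
  set work.demo_in;
  /* FAST first appears in the assignment of "Yes", so it is */
  /* created as a character variable of length 3.            */
  if v51 = 1 then fast = "Yes";
  else if v51 = 2 then fast = "No";
  else fast = "Missing";   /* does not fit into 3 bytes, stored as "Mis" */
run;

proc print data=work.demo_trunc;
run;
```

PROC CONTENTS on work.demo_trunc should confirm the type and length of fast.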
If fast had existed as a character variable of length k in YRRSIMPT.import, it would have been assigned one of the values "Yes", "No" or "Missing", truncated to length k, e.g., "N" for k=1 (if v51=2) or "Missi" for k=5 (if v51 not in (1,2)). The same holds for purge.
So, the only remaining possibility is that fast and purge already exist as numeric variables in YRRSIMPT.import. In this case, indeed, the assignment of a character value, be it "Yes", "No" or "Missing", leads to numeric missing values -- fast=. or purge=., respectively -- together with "Invalid numeric data ..." and character-to-numeric conversion notes in the log.
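This remaining case can be reproduced with a small test dataset. The sketch below assumes a numeric variable fast in the input, which is what I suspect for YRRSIMPT.import; the dataset names work.demo_num and work.demo_clash are hypothetical.

```sas
/* Hypothetical input in which FAST already exists as a numeric variable */
data work.demo_num;
  input v51 fast;
  datalines;
1 .
2 .
. .
;
run;

data work.demo_clash;
  set work.demo_num;
  /* FAST keeps its numeric type from the input dataset, so each      */
  /* character constant goes through character-to-numeric conversion. */
  /* None of the strings is a valid number, so FAST ends up missing.  */
  if v51 = 1 then fast = "Yes";
  else if v51 = 2 then fast = "No";
  else fast = "Missing";
run;

proc print data=work.demo_clash;
run;
```

After running this, check the log of the second DATA step for the conversion note and the "Invalid numeric data" messages.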
To avoid these problems (also the truncation of values of variable ethnicity and others), just adhere to maxims 2 (read the log), 3 (know your data), 25 (have a clean log) and 47 (set a length). And to maxim 8 (there is a format for it; here: "... rolling your own will usually beat complicated data step logic").
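A possible way to put these maxims into practice is sketched below. It assumes that the existing numeric variables fast and purge are not needed and can be dropped on input; the format name yesno and the datasets work.demo_src and work.demo_clean are made up for illustration, so adapt them to your real data.

```sas
/* User-defined format: one definition replaces each IF/ELSE chain */
proc format;
  value yesno
    1     = "Yes"
    2     = "No"
    other = "Missing";
run;

/* Hypothetical stand-in for the input dataset, including the */
/* pre-existing numeric variables FAST and PURGE.             */
data work.demo_src;
  input v51 v52 fast purge;
  datalines;
1 2 . .
2 1 . .
. 3 . .
;
run;

data work.demo_clean;
  /* Drop the old numeric variables so the names can be reused */
  set work.demo_src(drop=fast purge);
  /* Set the length explicitly so that "Missing" is not truncated */
  length fast purge $7;
  fast  = put(v51, yesno.);
  purge = put(v52, yesno.);
  keep fast purge;
run;

proc print data=work.demo_clean;
run;
```

If the old numeric values are still needed, renaming them with the RENAME= dataset option instead of dropping them would be an alternative. Depending on the analysis, you might not need new variables at all and could simply apply the format to v51 and v52 in the procedure step.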