BookmarkSubscribeRSS Feed
desireatem
Pyrite | Level 9

I am trying to create a new variable “Languages” fro.m the variable “Demographics__hc_languages”

However the created variable does not come out well despite the code ran without error

 

 

 

data CFI1;

set CFI;

 

if Demographics__hc_languages in ("English Spanish" "Spanish English") then languages="EnglishSpanish";

else if Demographics__hc_languages in ("English") then languages="EnglishOnly";

else if Demographics__hc_languages in ("Spanish") then languages="SpanishOnly";

else if Demographics__hc_languages  in ("English Creole Other","English Creole","English Other",

"English Spanish Creole","English Spanish Other","Other English", "Other", "Spanish Other") then languages="Other";

 

run;

 

 

Original Variable

 

 

The FREQ Procedure

Demographics::hc_languages

Demographics__hc_languages

Frequency

Percent

Cumulative
Frequency

Cumulative
Percent

English Creole Other

2

0.37

2

0.37

English Creole

7

1.29

9

1.66

English Other

10

1.84

19

3.50

English Spanish Creole

2

0.37

21

3.87

English Spanish Other

8

1.47

29

5.34

English Spanish

139

25.60

168

30.94

English

206

37.94

374

68.88

Other English

1

0.18

375

69.06

Other

6

1.10

381

70.17

Spanish English

19

3.50

400

73.66

Spanish Other

1

0.18

401

73.85

Spanish

142

26.15

543

100.00

Frequency Missing = 273

 

 

 

New Variable Not COMING OUT WELL??????????????????????????????

The SAS System

 

The FREQ Procedure

languages

Frequency

Percent

Cumulative
Frequency

Cumulative
Percent

EnglishOnly

206

58.19

206

58.19

Other

6

1.69

212

59.89

SpanishOnly

142

40.11

354

100.00

Frequency Missing = 462

 

 

13 REPLIES 13
novinosrin
Tourmaline | Level 20

Any chance of leading or trailing blank that may need stripping?

 

if strip(Demographics__hc_languages ) 

 

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13
data CFI1;
set CFI;
 
if Demographics__hc_languages in ("English Spanish" ,"Spanish English") then languages="EnglishSpanish";
else if Demographics__hc_languages in ("English") then languages="EnglishOnly";
else if Demographics__hc_languages in ("Spanish") then languages="SpanishOnly";
else  languages="Other";
 
run;
novinosrin
Tourmaline | Level 20

Your eyes might be sharper 🙂

 

what's the correction?

desireatem
Pyrite | Level 9

I am confused because I do not see anything wrong with the code but the results if not coming as aspected;

 

if Demographics__hc_languages in ("English Spanish" ,"Spanish English") then languages="SpanishEnglish";
else if Demographics__hc_languages in ("English") then languages="EnglishOnly";
else if Demographics__hc_languages in ("Spanish") then languages="SpanishOnly";
else if Demographics__hc_languages^="" then languages="Other";

 

THE SPANISHENGLISH IS NOT COMING OUT;

 

Demographics::hc_languages
Demographics__hc_languages Frequency Percent Cumulative
Frequency
Cumulative
Percent
English Creole Other 2 0.37 2 0.37
English Creole 7 1.29 9 1.66
English Other 10 1.84 19 3.50
English Spanish Creole 2 0.37 21 3.87
English Spanish Other 8 1.47 29 5.34
English Spanish 139 25.60 168 30.94
English 206 37.94 374 68.88
Other English 1 0.18 375 69.06
Other 6 1.10 381 70.17
Spanish English 19 3.50 400 73.66
Spanish Other 1 0.18 401 73.85
Spanish 142 26.15 543 100.00
Frequency Missing = 273

 


The SAS System

The FREQ Procedure
languages Frequency Percent Cumulative
Frequency
Cumulative
Percent
EnglishOnly 206 37.94 206 37.94
Other 195 35.91 401 73.85
SpanishOnly 142 26.15 543 100.00
Frequency Missing = 273
novinosrin
Tourmaline | Level 20

is my suggestion failing too?

desireatem
Pyrite | Level 9

Nope, not working. Also your suggestion add missing data to the 'Other" group

novinosrin
Tourmaline | Level 20

Oh sorry no idea

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13
if Demographics__hc_languages in ("English Spanish" ,"Spanish English") then languages="SpanishEnglish";
else if Demographics__hc_languages in ("English") then languages="EnglishOnly";
else if Demographics__hc_languages in ("Spanish") then languages="SpanishOnly";
else if Demographics__hc_languages^="" then languages="Other";
else languages = "MISSING";

how about if you account for the missing values.

Astounding
PROC Star
You can help diagnose the problem by running:

proc freq data=cfi1;
tables languages *
demographics_hc_languages / missing list;
run;

That would point out where the expected matches failed to be made.
desireatem
Pyrite | Level 9

The mismatched is on "English Spanish" and "Spanish English" that are classified under "Others"

 

languages Demographics__hc_languages Frequency Percent Cumulative
Frequency
Cumulative
Percent
    273 33.46 273 33.46
EnglishOnly English 206 25.25 479 58.70
Other English Creole Other 2 0.25 481 58.95
Other English Creole 7 0.86 488 59.80
Other English Other 10 1.23 498 61.03
Other English Spanish Creole 2 0.25 500 61.27
Other English Spanish Other 8 0.98 508 62.25
Other English Spanish 139 17.03 647 79.29
Other Other English 1 0.12 648 79.41
Other Other 6 0.74 654 80.15
Other Spanish English 19 2.33 673 82.48
Other Spanish Other 1 0.12 674 82.60
SpanishOnly Spanish 142 17.40 816 100.00
Astounding
PROC Star

OK, a good start.  Now it's time to roll up your sleeves and dig into the mismatches.

 

Do the data values contain two blanks between "English" and "Spanish" ?

 

Do the data values contain a tab character instead of a space? 

 

Do the data values contain a leading blank?

 

I'm assuming here that the program is correct, and the program uses one blank between "English" and "Spanish".

Tom
Super User Tom
Super User

SAS normally removes leading spaces when printing values for values like  "         English Spanish"  might look in the print-out like  "English Spanish".

There are also other characters that could be in the variable that look like spaces (or do not print at all) but are not.

Or it could be the variable is coded and what you are seeing is the formatted value and not what is actually in the variable.

 

Languages=strip(compbl(vvalue(Demographics__hc_languages)));
if languages = "English" then languages="EnglishOnly";
else if languages = "Spanish" then languages="SpanishOnly";
else if languages in ("English Spanish" "Spanish English") then languages="EnglishSpanish";
else languages="Other";
FreelanceReinh
Jade | Level 19

In order to know your data you should scrutinize these character values with the $HEXw. format:

proc freq data=cfi1;
format _char_ $hex60.; 
tables Demographics__hc_languages;
run;

This should reveal the true contents of this variable, provided that its length is <=30. (A likely culprit is the word separator, so format length 60 should be more than sufficient.)

 

If you're not curious about what exactly causes the issue, you may want to try and modify your IF conditions like

if compress(Demographics__hc_languages,'','ka') in: ('EnglishSpanish' 'SpanishEnglish')

This excludes all non-alphabetic characters from the comparison, which is why you need to remove the embedded blanks from the value lists as well. (Edit: Inserted the colon after IN just in case there are truncated alphabetic characters. But my recommendation is to apply the $HEXw. format. Edit 2: With the in: operator you'd need to check for the longest, i.e. three-word values first, then continue with the two-word values etc.!)

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1278 views
  • 0 likes
  • 6 in conversation