- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to create a new variable “Languages” fro.m the variable “Demographics__hc_languages”
However the created variable does not come out well despite the code ran without error
data CFI1;
set CFI;
if Demographics__hc_languages in ("English Spanish" "Spanish English") then languages="EnglishSpanish";
else if Demographics__hc_languages in ("English") then languages="EnglishOnly";
else if Demographics__hc_languages in ("Spanish") then languages="SpanishOnly";
else if Demographics__hc_languages in ("English Creole Other","English Creole","English Other",
"English Spanish Creole","English Spanish Other","Other English", "Other", "Spanish Other") then languages="Other";
run;
Original Variable
The FREQ Procedure
|
Demographics::hc_languages |
||||
|
Demographics__hc_languages |
Frequency |
Percent |
Cumulative |
Cumulative |
|
English Creole Other |
2 |
0.37 |
2 |
0.37 |
|
English Creole |
7 |
1.29 |
9 |
1.66 |
|
English Other |
10 |
1.84 |
19 |
3.50 |
|
English Spanish Creole |
2 |
0.37 |
21 |
3.87 |
|
English Spanish Other |
8 |
1.47 |
29 |
5.34 |
|
English Spanish |
139 |
25.60 |
168 |
30.94 |
|
English |
206 |
37.94 |
374 |
68.88 |
|
Other English |
1 |
0.18 |
375 |
69.06 |
|
Other |
6 |
1.10 |
381 |
70.17 |
|
Spanish English |
19 |
3.50 |
400 |
73.66 |
|
Spanish Other |
1 |
0.18 |
401 |
73.85 |
|
Spanish |
142 |
26.15 |
543 |
100.00 |
|
Frequency Missing = 273 |
||||
New Variable Not COMING OUT WELL??????????????????????????????
|
The SAS System |
The FREQ Procedure
|
languages |
Frequency |
Percent |
Cumulative |
Cumulative |
|
EnglishOnly |
206 |
58.19 |
206 |
58.19 |
|
Other |
6 |
1.69 |
212 |
59.89 |
|
SpanishOnly |
142 |
40.11 |
354 |
100.00 |
|
Frequency Missing = 462 |
||||
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Any chance of leading or trailing blank that may need stripping?
if strip(Demographics__hc_languages )
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data CFI1;
set CFI;
if Demographics__hc_languages in ("English Spanish" ,"Spanish English") then languages="EnglishSpanish";
else if Demographics__hc_languages in ("English") then languages="EnglishOnly";
else if Demographics__hc_languages in ("Spanish") then languages="SpanishOnly";
else languages="Other";
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your eyes might be sharper 🙂
what's the correction?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am confused because I do not see anything wrong with the code but the results if not coming as aspected;
if Demographics__hc_languages in ("English Spanish" ,"Spanish English") then languages="SpanishEnglish";
else if Demographics__hc_languages in ("English") then languages="EnglishOnly";
else if Demographics__hc_languages in ("Spanish") then languages="SpanishOnly";
else if Demographics__hc_languages^="" then languages="Other";
THE SPANISHENGLISH IS NOT COMING OUT;
| Demographics::hc_languages | ||||
|---|---|---|---|---|
| Demographics__hc_languages | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
| English Creole Other | 2 | 0.37 | 2 | 0.37 |
| English Creole | 7 | 1.29 | 9 | 1.66 |
| English Other | 10 | 1.84 | 19 | 3.50 |
| English Spanish Creole | 2 | 0.37 | 21 | 3.87 |
| English Spanish Other | 8 | 1.47 | 29 | 5.34 |
| English Spanish | 139 | 25.60 | 168 | 30.94 |
| English | 206 | 37.94 | 374 | 68.88 |
| Other English | 1 | 0.18 | 375 | 69.06 |
| Other | 6 | 1.10 | 381 | 70.17 |
| Spanish English | 19 | 3.50 | 400 | 73.66 |
| Spanish Other | 1 | 0.18 | 401 | 73.85 |
| Spanish | 142 | 26.15 | 543 | 100.00 |
| Frequency Missing = 273 | ||||
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
is my suggestion failing too?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Nope, not working. Also your suggestion add missing data to the 'Other" group
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Oh sorry no idea
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if Demographics__hc_languages in ("English Spanish" ,"Spanish English") then languages="SpanishEnglish";
else if Demographics__hc_languages in ("English") then languages="EnglishOnly";
else if Demographics__hc_languages in ("Spanish") then languages="SpanishOnly";
else if Demographics__hc_languages^="" then languages="Other";
else languages = "MISSING";
how about if you account for the missing values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc freq data=cfi1;
tables languages *
demographics_hc_languages / missing list;
run;
That would point out where the expected matches failed to be made.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The mismatched is on "English Spanish" and "Spanish English" that are classified under "Others"
| languages | Demographics__hc_languages | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
|---|---|---|---|---|---|
| 273 | 33.46 | 273 | 33.46 | ||
| EnglishOnly | English | 206 | 25.25 | 479 | 58.70 |
| Other | English Creole Other | 2 | 0.25 | 481 | 58.95 |
| Other | English Creole | 7 | 0.86 | 488 | 59.80 |
| Other | English Other | 10 | 1.23 | 498 | 61.03 |
| Other | English Spanish Creole | 2 | 0.25 | 500 | 61.27 |
| Other | English Spanish Other | 8 | 0.98 | 508 | 62.25 |
| Other | English Spanish | 139 | 17.03 | 647 | 79.29 |
| Other | Other English | 1 | 0.12 | 648 | 79.41 |
| Other | Other | 6 | 0.74 | 654 | 80.15 |
| Other | Spanish English | 19 | 2.33 | 673 | 82.48 |
| Other | Spanish Other | 1 | 0.12 | 674 | 82.60 |
| SpanishOnly | Spanish | 142 | 17.40 | 816 | 100.00 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK, a good start. Now it's time to roll up your sleeves and dig into the mismatches.
Do the data values contain two blanks between "English" and "Spanish" ?
Do the data values contain a tab character instead of a space?
Do the data values contain a leading blank?
I'm assuming here that the program is correct, and the program uses one blank between "English" and "Spanish".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
SAS normally removes leading spaces when printing values for values like " English Spanish" might look in the print-out like "English Spanish".
There are also other characters that could be in the variable that look like spaces (or do not print at all) but are not.
Or it could be the variable is coded and what you are seeing is the formatted value and not what is actually in the variable.
Languages=strip(compbl(vvalue(Demographics__hc_languages)));
if languages = "English" then languages="EnglishOnly";
else if languages = "Spanish" then languages="SpanishOnly";
else if languages in ("English Spanish" "Spanish English") then languages="EnglishSpanish";
else languages="Other";
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In order to know your data you should scrutinize these character values with the $HEXw. format:
proc freq data=cfi1;
format _char_ $hex60.;
tables Demographics__hc_languages;
run;
This should reveal the true contents of this variable, provided that its length is <=30. (A likely culprit is the word separator, so format length 60 should be more than sufficient.)
If you're not curious about what exactly causes the issue, you may want to try and modify your IF conditions like
if compress(Demographics__hc_languages,'','ka') in: ('EnglishSpanish' 'SpanishEnglish')
This excludes all non-alphabetic characters from the comparison, which is why you need to remove the embedded blanks from the value lists as well. (Edit: Inserted the colon after IN just in case there are truncated alphabetic characters. But my recommendation is to apply the $HEXw. format. Edit 2: With the in: operator you'd need to check for the longest, i.e. three-word values first, then continue with the two-word values etc.!)