I am cleaning some extraneous strings from a text field:
Data SASCDC_2.Arias_County_RC_ETHNICITY_FILL;
  Set SASCDC_2.Arias_County_RC_ETHNIC_CLEANUP;
  If Identify_Race_or_Ethnicity in
    ("American_Indian", "American_Indian_(Cheeroke)", "American_Indian_or_Alaskan Native", "Apache",
     "Caucasian/_NAtive_American_Cow_Creek", "Caucasian/_Native_American",
     "Native_American", "Native_American_-_Cherokee", "Native_American_-_Turtle_Mountain_Reservation",
     "Native_American_German", "Native_American_Irish", "Native_American_and_White",
     "Native_American,_Hispanic,_Caucasian", "Native_American/Hispanic", "Native_America/Mexican",
     "Native_American/White", "Native/Indigenous", "Rosebud_Sioux_Tribe",
     "native_American", "native_american",
     "American_Indian_+_Other_White", "American_Indian_and_Caucasion",
     "Alaska_Native", "Alaskan_Native", "American_Indian/Hispanic")
  then Identify_Race_or_Ethnicity = 'NATIVE_AMERICAN';
run;
In the survey, the developers allowed individuals to provide free-form descriptions of their race/ethnicity (R/E). These free-form responses now have to be put into the standardized R/E buckets used by the Census and others. Here is the log:
     Set SASCDC_2.Arias_County_RC_ETHNIC_CLEANUP;
577  If Identify_Race_or_Ethnicity in
577! ("American_Indian","American_Indian_(Cheeroke)","American_Indian_or_Alaskan
     --- --- --- --- ---
     49 49 49 49 49
577! Native","Apache","Caucasian/_NAtive_American_Cow_Creek","Caucasian/_Native_American",
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS release. Inserting white space between a quoted string and the succeeding identifier is recommended.
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS release. Inserting white space between a quoted string and the succeeding identifier is recommended.
581  "American_Indian/Hispanic") then Identify_Race_or_Ethnicity = 'NATIVE_AMERICAN'
581! ;
582  run;
581  "American_Indian/Hispanic") then Identify_Race_or_Ethnicity = 'NATIVE_AMERICAN'
     ------------------------------------------------------
     49
581! ;
NOTE 49-169: The meaning of an identifier after a quoted string might change in a future SAS release. Inserting white space between a quoted string and the succeeding identifier is recommended.
583  Data SASCDC_2.Arias_County_RC_ETHNIC_CLEANUP;
584  Set SASCDC_2.Arias_County_RC_ETHNICITY_A;
585
586  If Identify_Race_or_Ethnicity in ("Mexican-Indian (these were their own words)", "US
     ---- ---- ---- ----
     49 49 49 49
I am not sure what the log is trying to tell me. Technically it is a note, not an error, but when I check the field that is supposed to be changed, many of the free-form texts are still there. I have been searching for answers or remedies (that is why there are underscores between letters and other text expressions in the code above).
In other languages that are sensitive to whitespace, such as Python, there is at least an explanation of how to correct the problem, and then the code runs as expected. Here I think SAS is ambiguous, though maybe it is not. That is why I am asking how to correct this: even though it is just a note, the code does not produce the expected result.
Thank you for your help.
wlierman
The message just marks the place where the SAS compiler first noticed something strange. It has flagged this piece of code as strange:
","A
So it is saying that you shouldn't put A right after the quoted comma, because SAS might decide to use such a suffix to indicate some type of special constant, the way it already uses D, T and DT for date, time and datetime constants.
Look higher up somewhere in your code for unbalanced quotes.
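As a minimal sketch of what the note refers to (the variable names d and re are made up for illustration), a letter placed directly after a closing quote is how SAS writes special constants. When an earlier quote is left unbalanced, the quote pairing shifts, a piece of code such as "," is read as a string, and the text that follows it looks like one of these suffixes:

```sas
data _null_;
  /* A letter directly after the closing quote is a literal suffix: */
  d = '01JAN2024'd;          /* date constant, not the text 01JAN2024 */
  put d= date9.;

  /* Balanced quotes, with a space after each comma, avoid NOTE 49-169: */
  length re $ 20;
  re = "Apache";
  if re in ("Apache", "Alaska_Native") then put "matched: " re;
run;
```

If this step runs cleanly in a fresh session but the original step does not, that points to an unbalanced quote earlier in the submitted code rather than to the IN list itself.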
Of course it depends on exactly which race/ethnicity coding you are emulating. There is more than one "census" encoding scheme and you have several of those that would fall into "more than one race" in at least one of the schemes.
I've often used custom informats to address such issues, mapping relatively open text to specific values, with an Other= _error_ option so that the log tells me when there are values I didn't expect.
One small advantage of this is that you can use the UPCASE option with the invalue, so that the text is converted to all uppercase before the comparison is made. These are then all treated as the same value:
"Native_American" "native_American" "native_american"
as well as any other mix of capitalization.
Yes, it can lead to longish Proc Format code, but sometimes having all the values in one place makes keeping track of such things easier.
And the Proc Format invalue code currently doesn't generate such notes:
13   proc format;
14   invalue $re (upcase)
15   "AMERICAN_INDIAN", "AMERICAN_INDIAN_(CHEEROKE)",
15 ! "AMERICAN_INDIAN_OR_ALASKAN NATIVE", "APACHE",
15 ! "CAUCASIAN/_NATIVE_AMERICAN_COW_CREEK",
15 ! "CAUCASIAN/_NATIVE_AMERICAN",
16   "NATIVE_AMERICAN", "NATIVE_AMERICAN_-_CHEROKEE",
16 ! "NATIVE_AMERICAN_-_TURTLE_MOUNTAIN_RESERVATION",
16 ! "NATIVE_AMERICAN_GERMAN", "NATIVE_AMERICAN_IRISH",
17   "NATIVE_AMERICAN_AND_WHITE",
17 ! "NATIVE_AMERICAN,_HISPANIC,_CAUCASIAN",
17 ! "NATIVE_AMERICAN/HISPANIC", "NATIVE_AMERICA/MEXICAN",
17 ! "NATIVE_AMERICAN/WHITE", "NATIVE/INDIGENOUS",
18   "ROSEBUD_SIOUX_TRIBE", "AMERICAN_INDIAN_+_OTHER_WHITE",
18 ! "AMERICAN_INDIAN_AND_CAUCASION", "ALASKA_NATIVE",
18 ! "ALASKAN_NATIVE",
19   "AMERICAN_INDIAN/HISPANIC" = 'NATIVE_AMERICAN'
20   ;
NOTE: Informat $RE has been output.
21   run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds
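For illustration, here is a rough sketch of how an informat like the one above might be applied with the INPUT function. The value list is abbreviated, and the data set work.re_demo, the variable Race_Bucket and the test rows are all made up for this example; the other=_error_ clause is the option mentioned earlier for flagging unexpected values.

```sas
/* Abbreviated value list for illustration; extend it with the full list. */
proc format;
  invalue $re (upcase)
    "AMERICAN_INDIAN", "NATIVE_AMERICAN", "APACHE",
    "ALASKA_NATIVE", "ROSEBUD_SIOUX_TRIBE" = 'NATIVE_AMERICAN'
    other = _error_;   /* unexpected values are reported in the log */
run;

/* Hypothetical test data; in practice read the real data set with SET. */
data work.re_demo;
  length Identify_Race_or_Ethnicity $ 60 Race_Bucket $ 15;
  input Identify_Race_or_Ethnicity $;
  /* UPCASE makes the lookup case-insensitive */
  Race_Bucket = input(Identify_Race_or_Ethnicity, $re.);
  datalines;
native_American
Apache
Martian
;
run;
```

Writing the result to a new variable rather than overwriting the original keeps the free-form text available for review, which helps when deciding how to bucket the values the log reports as unexpected.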
Hi @wlierman ,
Further evidence for what @Tom has said can be seen in your log: one data step has ended with a run statement and another has begun with a data statement, but no notes have appeared in the log about how many observations and variables are in your data set.
If all of your quotes do appear to be balanced then try closing and restarting your session.
If you still have problems after that then, after another session restart, try running one data step at a time and make sure you get a data set created at each step. As soon as a data step does not create a data set then your problem is likely in that data step.
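As a sketch of the one-step-at-a-time check, assuming a fresh session (the output name work.step_check and the shortened value list are placeholders), submit a single data step on its own and then look at the log before moving on:

```sas
/* Submit only this step, then read the log before running anything else. */
data work.step_check;                          /* hypothetical output name */
  set SASCDC_2.Arias_County_RC_ETHNIC_CLEANUP;
  if Identify_Race_or_Ethnicity in ("Apache", "Alaska_Native")
    then Identify_Race_or_Ethnicity = 'NATIVE_AMERICAN';
run;

/* SYSERR is an automatic macro variable holding the last step's return code. */
%put SYSERR=&syserr;
```

A step that completed normally is followed in the log by notes reporting how many observations were read and how many observations and variables the new data set has. If those notes are missing, the step most likely never finished compiling.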
HTH.
Kind regards,
Amir.