Hi everyone,
I am trying to create a new variable but am running into a problem where the new variable returns "." for all observations, even though I can see from my data that these should not be missing. I am using SAS9. Please see my code below to illustrate the problem I am having:
data mergedtest; set work.merged_all; if completed="Y" then test1=1; if experience= "1 to 5 years" then test2=1; if stream= "General Labour" then test3=1; run;
The "completed", "experience", and "stream" variables are all string variables. SAS does not tell me that there are any errors when I run this data step. However, when I run the tables command on test1, test2, and test3 (to check to see if variables were created properly), test1 and test2 are fine, but test3 returns all missing values (even though i can see from the raw data that several observations have a value of "General Labour" for the "stream" variable).
Has anyone had this issue before? Any thoughts on what might be causing this problem? Any suggestions would be very appreciated.
Thank you for taking the time to help!
Take care,
Yuna
So if DEBUG2 is Y, it suggests there is something else in the data besides blanks.
The most common suspects: a carriage return, or hex null characters instead of blanks.
The way to find this out is to print out the values in hex form. For example, if STREAM is defined as $ 25, try:
hex_format = put(stream, $hex50.);
Then run the PROC FREQ on HEX_FORMAT.
If you want a quick fix for the program without addressing stray characters in the data, you can always use:
if stream =: 'General Labour' then test3 = 1;
@Yuna1 wrote:
Hi everyone,
I am trying to create a new variable but am running into a problem where the new variable returns "." for all observations, even though I can see from my data that these should not be missing. I am using SAS9. Please see my code below to illustrate the problem I am having:
data mergedtest; set work.merged_all; if completed="Y" then test1=1; if experience= "1 to 5 years" then test2=1; if stream= "General Labour" then test3=1; run;
The "completed", "experience", and "stream" variables are all string variables. SAS does not tell me that there are any errors when I run this data step. However, when I run the tables command on test1, test2, and test3 (to check to see if variables were created properly), test1 and test2 are fine, but test3 returns all missing values (even though i can see from the raw data that several observations have a value of "General Labour" for the "stream" variable).
When stream is not "General Labour", test3 gets assign a missing value. Similarly, if completed is not "Y", test1 gets assigned missing. And I'm sure you can see the pattern and so test2 also gets assigned missing.
Hi PaigeMiller,
Thank you for your response! I see what you're saying for sure; that makes complete sense.
I ran a table command on the "Stream" variable to see how many "General Labour" responses I should expect and I was hoping that test3 would equal 1 for 11,908 observations (and the rest would be missing). So I'm at a loss as to what might be going on here.
Thanks again - any additional thoughts you may have are, of course, very welcome!
Yuna
It's time to dig into the values of STREAM, to see what is really contained there. For example:
data want;
set have;
if stream ne 'General Labour' then do;
if left(stream) = 'General Labour' then debug1='Y';
if stream =: 'General Labour' then debug2='Y';
end;
run;
DEBUG1: find out whether the values of STREAM contain leading blanks
DEBUG2: find out whether some stray character such as a carriage return appears after the characters "General Labour"
Anything else you could imagine can be tested. But SAS is telling you that the actual value is actually a little bit different than "General Labour"
Some more options to test beyond given by @Astounding:
- can it be that "General Labour" has two spaces between the words or some unvisible characters at end?
Check by
- if compbl(stream) = 'General Labour' then debug3='Y';
- if length(strip(stream)) ne length('General Labour') then debug4='Y';
So if DEBUG2 is Y, it suggests there is something else in the data besides blanks.
The most common suspects: a carriage return, or hex null characters instead of blanks.
The way to find this out is to print out the values in hex form. For example, if STREAM is defined as $ 25, try:
hex_format = put(stream, $hex50.);
Then run the PROC FREQ on HEX_FORMAT.
If you want a quick fix for the program without addressing stray characters in the data, you can always use:
if stream =: 'General Labour' then test3 = 1;
Perhaps instead of
if completed="Y" then test1=1;
You actually want
Test1 = (completed="Y");
This will assign a value of 1 ("true", "yes" similar boolean interpretations apply) when completed equals 'Y' and 0 when completed is not equal to 'Y'.
And similar for you other test variables.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.