Hi,
I want to categorize Alcohol Consumption into 1, 2, and 3. When I used the below code I get missing values, But there should not be any missing values as I set the conditions in my code. I was wondering If I did something wrong in my code?
data work.temp ; set work.LLCP2014 ; where alcday5 in ( 101<=alcday5<=299, 888);
if 101<=alcday5<=199 then alcohol = 3;
if 201<=alcday5<=299 then alcohol = 2;
if alcday5 = 888 then alcohol = 1;
run;
P.S. Below is the original raw variable picture.
The code you used does not address either of the Alcday5 =777, 999 or "blank". Which from that summary you show would have roughly 25,400 missing values of your Alcohol variable if using the same data.
IMO you cannot ignore the missing and should understand it better and/or fix it first.
Also, it looks like your original variable is character? Are you getting a bunch of type conversion notes in your log?
A clean log is a good practice - your code shouldn't generate any warnings or errors. Otherwise you can't tell the difference between an actual issue or one that you thought wasn't an issue and left in your code.
After a recode I always check my results via PROC FREQ.
proc freq data=have;
table original_variable * new_variable / missprint ;
run;
@zahidhasandipu wrote:
Does this mean I can go ahead with my analysis and ignore the missing value?
Since I am going to continue assuming this is BRFSS data you first have to understand what a "blank" in a value means. It usually means that the question was not asked, often because of skip patterns.
Example (not quite from the survey but demonstrates the point);
Q5.1 Have you ever smoked cigarettes?
1 Yes
2 No
7 Don't know
9 Refused
If the response to the above is 2 (no) the survey very likely skips to the next section and the following questions "Q5.2 How frequently do you smoke", "Q5.3 How many cigarettes do you smoke on the days you smoke." would not be asked at all. So these would have blanks.
The BRFSS also has modules that are optional and some states will ask some modules and others don't. So you have entire blocks of questions that were not even presented to the respondent. These you definitely would not include in any way as there was never any chance of having data for the questions.
The skip pattern and optional modules are one reason that analysis code using the SAS survey procedures should include the option NOMCAR. The is actually an acronym for "NOt Missing Completely at Random". The option is used when there are known systematic missing values :ie skips.
How you interpret "Don't Know" or "Refused" in an analysis is part of the art.
From experience we typically look at the raw value frequencies to see about the "Don't Know" or "Refused" responses. Mostly because, in the terms of the BRFSS, high numbers of those responses are more likely to indicate a problem with the question wording or possibly an incorrect skip pattern (Example would be say "No" to the "ever smoked" example and then get asked "how many cigarettes do you smoke" question; very likely to get an answer other than the number of cigarettes".
Other topics are sensitive to different people and you get refusals: illicit drug use, sexual behaviors, weight, age for instance.
Typically most of the BRSS analysis does exclude the don't know/ refused and the analysis is phrased with something like "Among respondents that answered .... ".
You do need to be aware of the missing if using a regression procedure because by default any of the records that have a missing value for any of the dependent or independent variables on a model statement are excluded from analysis. So you may need to use options to include the missing.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.