I'm trying to use the 2017 BRFSS data to look at healthcare coverage (HLTHPLN1).
The CDC has the HLTHPLN1 coded by:
1=Yes
2= no
7=don't know
9=refused
I want to only look at YES and NO, so I'm trying to get rid of/delete those observations where HLTHPLN1=7 or 9. I've tried using the WHERE statement to just select those I'm interested in. I've tried the IF/Delete statement. I've tried with and without quotation marks.
Here is my code:
Libname BRFSS '/folders/myfolders/'; Run; Data BRFSS.BRF17HP; Set BRFSS.BRF17; If=(HLTHPLN1=1 or HLTHPLN1=2); If=(SEX=1 or SEX=2); Run; Proc sort data=BRFSS.BRF17HP; By SEX; Proc surveyfreq data=BRFSS.BRF17; table HLTHPLN1; Strata _STSTR; Cluster _PSU; Weight _LLCPWT; Run;
Here is log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 72 73 74 Libname BRFSS '/folders/myfolders/'; NOTE: Libref BRFSS was successfully assigned as follows: Engine: V9 Physical Name: /folders/myfolders 75 Run; 76 Data BRFSS.BRF17HP; 77 Set BRFSS.BRF17; 78 If=(HLTHPLN1=1 or HLTHPLN1=2); 79 If=(SEX=1 or SEX=2); 80 Run; NOTE: There were 5843 observations read from the data set BRFSS.BRF17. NOTE: The data set BRFSS.BRF17HP has 5843 observations and 359 variables. NOTE: DATA statement used (Total process time): real time 1.79 seconds cpu time 1.79 seconds 81 Proc sort data=BRFSS.BRF17HP; 82 By SEX; NOTE: There were 5843 observations read from the data set BRFSS.BRF17HP. NOTE: The data set BRFSS.BRF17HP has 5843 observations and 359 variables. NOTE: PROCEDURE SORT used (Total process time): real time 2.47 seconds cpu time 2.78 seconds 83 Proc surveyfreq data=BRFSS.BRF17; 84 table HLTHPLN1; 85 Strata _STSTR; 86 Cluster _PSU; 87 Weight _LLCPWT; 88 Run; NOTE: PROCEDURE SURVEYFREQ used (Total process time): real time 0.21 seconds cpu time 0.00 seconds 89 90 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 102 User: sasdemo DATA Step Statements
And here's the final output:
Clearly, my approach isn't working, because the categories for 7 and 9 are still there.
What am I doing wrong, and how can I fix it?
In addition to what @SASKiwi said, I think the problem is here:
Proc surveyfreq data=BRFSS.BRF17
I think you want
Proc surveyfreq data=BRFSS.BRF17HP;
Please consider one of your statements:
If=(HLTHPLN1=1 or HLTHPLN1=2);
What this actually does is create a variable called IF that is set to 1 if HLTHPLN1=1 or HLTHPLN1=2, and 0 if not. Is doesn't filter your data at all. What I suspect you want is:
If (HLTHPLN1=1 or HLTHPLN1=2); * Subsetting IF;
where (HLTHPLN1=1 or HLTHPLN1=2); * Subsetting WHERE;
BTW, the brackets aren't necessary. You will get the same answer by removing them.
In addition to what @SASKiwi said, I think the problem is here:
Proc surveyfreq data=BRFSS.BRF17
I think you want
Proc surveyfreq data=BRFSS.BRF17HP;
Gosh darn it, the devil is always in the details! Thank you! I just needed a new pair of eyes looking at it apparently.
New code (that produces a matching output to the official one that I'm trying to check myself against):
Libname BRFSS '/folders/myfolders/'; Run; Data BRFSS.BRF17HP; Set BRFSS.BRF17; If HLTHPLN1=1 then PLN='yes'; If HLTHPLN1=2 then PLN='no'; If SEX=1 then GEN='male'; If SEX=2 then GEN='female'; Run; Proc sort data=BRFSS.BRF17HP; By GEN; Proc surveyfreq data=BRFSS.BRF17HP; table PLN; Strata _STSTR; Cluster _PSU; Weight _LLCPWT; Run;
To also mention, the IF statement vs WHERE statement is a really good tip--I either forgot that concept or was never taught the difference. Thank you!
@SAS93 wrote:
I'm trying to use the 2017 BRFSS data to look at healthcare coverage (HLTHPLN1).
The CDC has the HLTHPLN1 coded by:
1=Yes
2= no
7=don't know
9=refused
I want to only look at YES and NO, so I'm trying to get rid of/delete those observations where HLTHPLN1=7 or 9. I've tried using the WHERE statement to just select those I'm interested in. I've tried the IF/Delete statement. I've tried with and without quotation marks.
Here is my code:
Libname BRFSS '/folders/myfolders/'; Run; Data BRFSS.BRF17HP; Set BRFSS.BRF17; If=(HLTHPLN1=1 or HLTHPLN1=2); If=(SEX=1 or SEX=2); Run; Proc sort data=BRFSS.BRF17HP; By SEX; Proc surveyfreq data=BRFSS.BRF17; table HLTHPLN1; Strata _STSTR; Cluster _PSU; Weight _LLCPWT; Run;Here is log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 72 73 74 Libname BRFSS '/folders/myfolders/'; NOTE: Libref BRFSS was successfully assigned as follows: Engine: V9 Physical Name: /folders/myfolders 75 Run; 76 Data BRFSS.BRF17HP; 77 Set BRFSS.BRF17; 78 If=(HLTHPLN1=1 or HLTHPLN1=2); 79 If=(SEX=1 or SEX=2); 80 Run; NOTE: There were 5843 observations read from the data set BRFSS.BRF17. NOTE: The data set BRFSS.BRF17HP has 5843 observations and 359 variables. NOTE: DATA statement used (Total process time): real time 1.79 seconds cpu time 1.79 seconds 81 Proc sort data=BRFSS.BRF17HP; 82 By SEX; NOTE: There were 5843 observations read from the data set BRFSS.BRF17HP. NOTE: The data set BRFSS.BRF17HP has 5843 observations and 359 variables. NOTE: PROCEDURE SORT used (Total process time): real time 2.47 seconds cpu time 2.78 seconds 83 Proc surveyfreq data=BRFSS.BRF17; 84 table HLTHPLN1; 85 Strata _STSTR; 86 Cluster _PSU; 87 Weight _LLCPWT; 88 Run; NOTE: PROCEDURE SURVEYFREQ used (Total process time): real time 0.21 seconds cpu time 0.00 seconds 89 90 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 102 User: sasdemo DATA Step StatementsAnd here's the final output:
Clearly, my approach isn't working, because the categories for 7 and 9 are still there.
What am I doing wrong, and how can I fix it?
Something else that you want to consider are the CDC added variables they use for analysis with the BRFSS data. These will have names that start with an _ . Depending on the analysis you want to perform you may find some things, such as age groups and some of the values that use 2 or more variables to determine such as BMI categories, already coded in the data. Labels are your friend.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.