BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS93
Quartz | Level 8

I'm trying to use the 2017 BRFSS data to look at healthcare coverage (HLTHPLN1).

 

The CDC has the HLTHPLN1 coded by:

1=Yes

2= no

7=don't know

9=refused

 

I want to only look at YES and NO, so I'm trying to get rid of/delete those observations where HLTHPLN1=7 or 9. I've tried using the WHERE statement to just select those I'm interested in. I've tried the IF/Delete statement. I've tried with and without quotation marks.

 

Here is my code:

Libname BRFSS '/folders/myfolders/';
Run;
Data BRFSS.BRF17HP;
	Set BRFSS.BRF17;
	If=(HLTHPLN1=1 or HLTHPLN1=2);
	If=(SEX=1 or SEX=2);
Run;
Proc sort data=BRFSS.BRF17HP;
	By SEX;
Proc surveyfreq data=BRFSS.BRF17;
	table HLTHPLN1;
	Strata _STSTR;
	Cluster _PSU;
	Weight _LLCPWT;
Run;

Here is log:

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         
 74         Libname BRFSS '/folders/myfolders/';
 NOTE: Libref BRFSS was successfully assigned as follows: 
       Engine:        V9 
       Physical Name: /folders/myfolders
 75         Run;
 76         Data BRFSS.BRF17HP;
 77         Set BRFSS.BRF17;
 78         If=(HLTHPLN1=1 or HLTHPLN1=2);
 79         If=(SEX=1 or SEX=2);
 80         Run;
 
 NOTE: There were 5843 observations read from the data set BRFSS.BRF17.
 NOTE: The data set BRFSS.BRF17HP has 5843 observations and 359 variables.
 NOTE: DATA statement used (Total process time):
       real time           1.79 seconds
       cpu time            1.79 seconds
       
 
 81         Proc sort data=BRFSS.BRF17HP;
 82         By SEX;
 
 NOTE: There were 5843 observations read from the data set BRFSS.BRF17HP.
 NOTE: The data set BRFSS.BRF17HP has 5843 observations and 359 variables.
 NOTE: PROCEDURE SORT used (Total process time):
       real time           2.47 seconds
       cpu time            2.78 seconds
       
 
 83         Proc surveyfreq data=BRFSS.BRF17;
 84         table HLTHPLN1;
 85         Strata _STSTR;
 86         Cluster _PSU;
 87         Weight _LLCPWT;
 88         Run;
 
 NOTE: PROCEDURE SURVEYFREQ used (Total process time):
       real time           0.21 seconds
       cpu time            0.00 seconds
       
 
 89         
 90         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 102        

 
User: sasdemo
DATA Step Statements

And here's the final output:

 

Untitled.jpg

Clearly, my approach isn't working, because the categories for 7 and 9 are still there.

 

What am I doing wrong, and how can I fix it?

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

In addition to what @SASKiwi said, I think the problem is here:

 

Proc surveyfreq data=BRFSS.BRF17

 

I think you want

 

Proc surveyfreq data=BRFSS.BRF17HP;

 

--
Paige Miller

View solution in original post

5 REPLIES 5
SASKiwi
PROC Star

Please consider one of your statements:

If=(HLTHPLN1=1 or HLTHPLN1=2);

What this actually does is create a variable called IF that is set to 1 if HLTHPLN1=1 or HLTHPLN1=2, and 0 if not. Is doesn't filter your data at all. What I suspect you want is:

If (HLTHPLN1=1 or HLTHPLN1=2); * Subsetting IF;

where (HLTHPLN1=1 or HLTHPLN1=2); * Subsetting WHERE;

BTW, the brackets aren't necessary. You will get the same answer by removing them. 

PaigeMiller
Diamond | Level 26

In addition to what @SASKiwi said, I think the problem is here:

 

Proc surveyfreq data=BRFSS.BRF17

 

I think you want

 

Proc surveyfreq data=BRFSS.BRF17HP;

 

--
Paige Miller
SAS93
Quartz | Level 8

Gosh darn it, the devil is always in the details! Thank you! I just needed a new pair of eyes looking at it apparently.

 

New code (that produces a matching output to the official one that I'm trying to check myself against):

 

Libname BRFSS '/folders/myfolders/';
Run;
Data BRFSS.BRF17HP;
	Set BRFSS.BRF17;
	If HLTHPLN1=1 then PLN='yes';
	If HLTHPLN1=2 then PLN='no';
	If SEX=1 then GEN='male';
	If SEX=2 then GEN='female';
Run;
Proc sort data=BRFSS.BRF17HP;
	By GEN;
Proc surveyfreq data=BRFSS.BRF17HP;
	table PLN;
	Strata _STSTR;
	Cluster _PSU;
	Weight _LLCPWT;
Run;


To also mention, the IF statement vs WHERE statement is a really good tip--I either forgot that concept or was never taught the difference. Thank you!

ballardw
Super User

@SAS93 wrote:

I'm trying to use the 2017 BRFSS data to look at healthcare coverage (HLTHPLN1).

 

The CDC has the HLTHPLN1 coded by:

1=Yes

2= no

7=don't know

9=refused

 

I want to only look at YES and NO, so I'm trying to get rid of/delete those observations where HLTHPLN1=7 or 9. I've tried using the WHERE statement to just select those I'm interested in. I've tried the IF/Delete statement. I've tried with and without quotation marks.

 

Here is my code:

Libname BRFSS '/folders/myfolders/';
Run;
Data BRFSS.BRF17HP;
	Set BRFSS.BRF17;
	If=(HLTHPLN1=1 or HLTHPLN1=2);
	If=(SEX=1 or SEX=2);
Run;
Proc sort data=BRFSS.BRF17HP;
	By SEX;
Proc surveyfreq data=BRFSS.BRF17;
	table HLTHPLN1;
	Strata _STSTR;
	Cluster _PSU;
	Weight _LLCPWT;
Run;

Here is log:

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         
 74         Libname BRFSS '/folders/myfolders/';
 NOTE: Libref BRFSS was successfully assigned as follows: 
       Engine:        V9 
       Physical Name: /folders/myfolders
 75         Run;
 76         Data BRFSS.BRF17HP;
 77         Set BRFSS.BRF17;
 78         If=(HLTHPLN1=1 or HLTHPLN1=2);
 79         If=(SEX=1 or SEX=2);
 80         Run;
 
 NOTE: There were 5843 observations read from the data set BRFSS.BRF17.
 NOTE: The data set BRFSS.BRF17HP has 5843 observations and 359 variables.
 NOTE: DATA statement used (Total process time):
       real time           1.79 seconds
       cpu time            1.79 seconds
       
 
 81         Proc sort data=BRFSS.BRF17HP;
 82         By SEX;
 
 NOTE: There were 5843 observations read from the data set BRFSS.BRF17HP.
 NOTE: The data set BRFSS.BRF17HP has 5843 observations and 359 variables.
 NOTE: PROCEDURE SORT used (Total process time):
       real time           2.47 seconds
       cpu time            2.78 seconds
       
 
 83         Proc surveyfreq data=BRFSS.BRF17;
 84         table HLTHPLN1;
 85         Strata _STSTR;
 86         Cluster _PSU;
 87         Weight _LLCPWT;
 88         Run;
 
 NOTE: PROCEDURE SURVEYFREQ used (Total process time):
       real time           0.21 seconds
       cpu time            0.00 seconds
       
 
 89         
 90         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 102        

 
User: sasdemo
DATA Step Statements

And here's the final output:

 

Untitled.jpg

Clearly, my approach isn't working, because the categories for 7 and 9 are still there.

 

What am I doing wrong, and how can I fix it?


Something else that you want to consider are the CDC added variables they use for analysis with the BRFSS data. These will have names that start with an _ . Depending on the analysis you want to perform you may find some things, such as age groups and some of the values that use 2 or more variables to determine such as BMI categories, already coded in the data. Labels are your friend.

Cynthia_sas
Diamond | Level 26
Hi:
The syntax of your IF statement is incorrect.

Well, is it an IF statement? Or are you trying to create a new variable called IF? Or are you trying to subset the original data? And with a subsetting IF, it's not clear what you want to do.

Do you want to ONLY get something like this:
If HLTHPLN1 in (1,2) and
SEX in (1,2);
or you could code it as
If (HLTHPLN1 =1 or HLTHPLN1=2) and
(SEX =1 or SEX=2);

But your current syntax is probably creating a new variable called IF in the output data (which is why there's no error) and you can confirm that by running a PROC CONTENTS.

Hope this helps,
Cynthia