Hello,
I am doing a project about risk factors for developing cervical cancer among women. My outcome variable is Pap Smear Status which has 5 categories, negative, reactive, LGSIL, ASCUS, and HGSIL.
The professor recommended that we dichotomize the outcome variable to include "1" for any type of positive test, and "0" for a negative test. I am struggling to figure out how to accomplish this.
I tried using IF/THEN statements, but I kept getting a column of all zeros.
Can anyone help?
Since you did not share any data (or code, for that matter), let's just assume you have a dataset named HAVE with a character variable named STATUS that can have one of 5 values:
negative
reactive
LGSIL
ASCUS
HGSIL
And you want to make a new numeric variable named NSTATUS with possible values of 1 or 0.
Since SAS evaluates a boolean expression as either 1 (TRUE) or 0 (FALSE), there should not be any need for IF/THEN logic. A simple assignment statement is enough.
So assuming you want everything except "negative" to be coded as 1 you could do something like this:
data want;
set have;
nstatus = not (status = 'negative');
run;
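One thing to watch with the comparison above: it is case-sensitive, so a value stored as 'Negative' or 'NEGATIVE' would be coded as 1. A minimal sketch that normalizes the value first, still assuming the hypothetical HAVE dataset and STATUS variable:

```sas
data want;
  set have;
  /* UPCASE makes the comparison case-insensitive;       */
  /* STRIP removes leading and trailing blanks.          */
  /* Note: a missing STATUS is still coded as 1 here.    */
  nstatus = not (upcase(strip(status)) = 'NEGATIVE');
run;
```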
Sorry, I am so new to SAS. I know the bare minimum. The data is 267 values. I am looking to do multilinear regression to compare pap results to multiple variables including race, age, smoking status, etc. I was thinking that I had to code a dummy variable for either a positive or negative test. I can't understand how to make 4 categorical variables into 1 dummy variable to do regression.
Sorry, if there's more info you need, please let me know.
Do you have ONE variable with the status that can have one of five different values?
Or do you have FIVE variables? If you have FIVE variables how is each one coded?
Show an example of your data. It does not need to be the real data, just something that makes it clearer what you have.
Example:
data have;
input id age smoke $ status $ ;
datalines;
1 23 no negative
2 34 yes reactive
3 41 no LGSIL
4 36 yes ASCUS
5 18 no HGSIL
;
Data PapResults;
input id papresults;
datalines;
1 negative
2 reactive
3 LGSIL
4 ASCUS
5 HGSIL
6 reactive
7 LGSIL
;
set Work.papdummy;
if pap = 'reactive' then Reactive = 1;
else Reactive = 0;
if pap = 'LGSIL' then LGSIL = 1;
else LGSIL = 0;
if pap = 'ASCUS' then ASCUS = 1;
else ASCUS = 0;
if pap = 'HGSIL' then HGSIL = 1;
else HGSIL = 0;
I was able to do this to get the dummy variables individually. But what I guess I'm trying to do now is combine them into one single "positive" category.
The way I read what you said the professor asked you to do was to convert your 5 level variable into a 2 level variable so you could use it as the target variable in a logistic type regression. You do not need to make those DUMMY variables first to do that.
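As a rough sketch of going straight from the 5-level variable to a single 0/1 variable, the IN operator can list the values that count as positive. The dataset Work.papdummy and the variable pap are taken from the code posted above; the output dataset name papdummy2 and the variable name positive are made up for illustration:

```sas
data papdummy2;            /* hypothetical output dataset name */
  set work.papdummy;
  /* 1 if pap is any of the four positive categories, otherwise 0. */
  /* Missing or unexpected values of pap are coded 0 here.         */
  positive = (pap in ('reactive', 'LGSIL', 'ASCUS', 'HGSIL'));
run;
```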
Note that you should not need to make DUMMY variables yourself anyway. SAS will do that for you if you just specify that papresults is a CLASS variable in your analysis.
What SAS procedure are you planning to run to do your analysis?
What is your model?
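For example, if the analysis turns out to be a logistic regression on a 0/1 outcome, a sketch might look like the following. The dataset name analysis, the outcome positive, and the predictors age, race, and smoke are all assumptions for illustration, not names from your data:

```sas
proc logistic data=analysis;          /* hypothetical dataset */
  /* CLASS tells SAS to build the dummy variables for        */
  /* categorical predictors; PARAM=REF uses reference coding. */
  class race smoke / param=ref;
  /* EVENT='1' models the probability of a positive result.  */
  model positive(event='1') = age race smoke;
run;
```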
So that is what I expected and what the code I posted before should work with. Your dataset is named: PapResults and your variable is named papresults. You need to add a $ in the INPUT statement for that code to run since papresults needs to be character to have those types of values.
Data PapResults;
input id papresults $;
datalines;
1 negative
2 reactive
3 LGSIL
4 ASCUS
5 HGSIL
6 reactive
7 LGSIL
;
Here is code to make PapResults2 that adds a new variable named papresults2.
data papresults2;
set papresults;
papresults2 = not (papresults = 'negative');
run;
If you run PROC PRINT on that new dataset you will get:
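A minimal version of that check, using the dataset created in the step above:

```sas
/* List every row of the new dataset, including papresults2 */
proc print data=papresults2;
run;
```

With the seven sample rows above, papresults2 should be 0 for id 1 (negative) and 1 for ids 2 through 7.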
Hi @kklotz, if you specifically wanted to use IF/THEN logic, you could try:
data want;
set have;
if status = 'negative' then nstatus = 0;
else nstatus=1;
run;
This assumes that:
1) there are no missing values for status and
2) all other values of status are valid and considered "positive"
You can check this with a call to PROC FREQ:
proc freq data=have;
table status;
run;
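If the PROC FREQ output does show missing values, one possible adjustment (a sketch, using the same assumed HAVE and STATUS names) is to leave the new variable missing for those rows rather than counting them as positive:

```sas
/* The MISSING option makes PROC FREQ show missing values as their own row */
proc freq data=have;
  tables status / missing;
run;

data want;
  set have;
  if missing(status) then nstatus = .;        /* keep unknowns as missing */
  else if status = 'negative' then nstatus = 0;
  else nstatus = 1;
run;
```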
Sorry, I am so new to SAS. I know the bare minimum. The data is 267 values. I am looking to do multilinear regression to compare pap results to multiple variables including race, age, smoking status, etc. I was thinking that I had to code a dummy variable for either a positive or negative test. I can't understand how to make 4 categorical variables into 1 dummy variable to do regression.
So that is how I was attempting to do it, but I would end up getting only 0's in the column that was made. I really do not know what I am doing, unfortunately.
Please share your code and a sample dataset we can work with and we'll be able to help.
@kklotz wrote:
Sorry, I am so new to SAS. I know the bare minimum. The data is 267 values. I am looking to do multilinear regression to compare pap results to multiple variables including race, age, smoking status, etc. I was thinking that I had to code a dummy variable for either a positive or negative test. I can't understand how to make 4 categorical variables into 1 dummy variable to do regression.
So that is how I was attempting to do it, but I would end up getting only 0's in the column that was made. I really do not know what I am doing, unfortunately.
If you mean creating a SINGLE variable that holds the information from RACE, AGE, smoking status, and something else, then do NOT do it. Those should be 4 separate variables (or 5 or 12 or whatever). Combining levels of a single variable, on the other hand, is pretty common.
The time to worry about combining results from different variables is when they measure the same thing, such as different tests that use different methods but report on a similar condition. Then you might have some records using one test and others using a different one. But the question of interest is the result. So you would look for the key values that indicate positive on each test, then report positive if either test showed a positive result and negative if neither did. (What to do with a designed experiment that runs both tests at the same time is likely to involve other questions than "was the patient positive for condition XXXX".)
This would be analogous to reporting standardized height and weight when some patients' records have metric measurements and others use a different measurement system, recorded in two different variables such as WeightKG and WeightLB.
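As a rough illustration of that weight example, the two variables could be folded into one standardized variable like this. The dataset names patients and standardized and the new variable weight_kg are hypothetical; WeightKG and WeightLB are the names mentioned above:

```sas
data standardized;          /* hypothetical output dataset */
  set patients;             /* hypothetical input dataset  */
  /* Prefer the metric value when it is present;           */
  /* otherwise convert pounds to kilograms.                */
  if not missing(WeightKG) then weight_kg = WeightKG;
  else if not missing(WeightLB) then weight_kg = WeightLB * 0.45359237;
  else weight_kg = .;       /* neither measurement recorded */
run;
```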