New SAS User

kklotz
Fluorite | Level 6

Hello,

I am doing a project about risk factors for developing cervical cancer among women. My outcome variable is Pap Smear Status, which has 5 categories: negative, reactive, LGSIL, ASCUS, and HGSIL.

The professor recommended that we dichotomize the outcome variable to include "1" for any type of positive test, and "0" for a negative test. I am struggling to figure out how to accomplish this. 

I was attempting IF/THEN statements, but I kept getting a column of all zeros.

 

Can anyone help? 

12 REPLIES
Tom
Super User

Since you did not share any data (or code, for that matter), let's just assume you have a dataset named HAVE with a character variable named STATUS that can have one of 5 values:

negative
reactive
LGSIL
ASCUS
HGSIL

And you want to make a new numeric variable named NSTATUS with possible values of 1 or 0.

 

Since SAS evaluates a boolean expression as either 1 (TRUE) or 0 (FALSE), there should not be any need for IF/THEN logic. A simple assignment statement is enough.

So assuming you want everything except "negative" to be coded as 1 you could do something like this:

data want;
  set have;
  nstatus = not (status = 'negative');
run;
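One possible reason for a column of all zeros is a comparison that never matches, for example because the stored values differ in case or carry leading blanks. That is only a guess, since the original code was not posted. A minimal sketch that normalizes the value before comparing, using the same hypothetical HAVE and STATUS names as above:

```sas
data want;
  set have;
  /* Assumption: STATUS is character and may be stored as      */
  /* 'Negative', 'NEGATIVE', or with leading blanks.            */
  /* STRIP removes leading/trailing blanks; UPCASE removes case */
  /* differences, so the literal must be written in upper case. */
  nstatus = (upcase(strip(status)) ne 'NEGATIVE');
run;
```

Note that character comparisons in SAS are case sensitive, so 'Negative' and 'negative' are different values unless they are normalized like this.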
kklotz
Fluorite | Level 6

Sorry, I am so new to SAS. I know the bare minimum. The data is 267 values. I am looking to do multilinear regression to compare pap results to multiple variables including race, age, smoking status, etc. I was thinking that I had to code a dummy variable for either a positive or negative test. I can't understand how to make 4 categorical variables into 1 dummy variable to do regression.

 

Sorry, if there's more info you need, please let me know.

Tom
Super User

Do you have ONE variable with the status that can have one of five different values? 

 

Or do you have FIVE variables? If you have FIVE variables how is each one coded?  

 

Show an example of your data. It does not need to be the real data, just something that makes it clearer what you have.

 

Example:

data have;
  input id age smoke $ status $ ;
datalines;
1 23 no negative
2 34 yes reactive
3 41 no LGSIL
4 36 yes ASCUS
5 18 no HGSIL
;

 

kklotz
Fluorite | Level 6
Data PapResults;
     input id papresults;
datalines;
1 negative
2 reactive
3 LGSIL
4 ASCUS
5 HGSIL
6 reactive
7 LGSIL
;
kklotz
Fluorite | Level 6
data papdummy1;
  set Work.papdummy;
  if pap = 'reactive' then Reactive = 1;
  else Reactive = 0;
  if pap = 'LGSIL' then LGSIL = 1;
  else LGSIL = 0;
  if pap = 'ASCUS' then ASCUS = 1;
  else ASCUS = 0;
  if pap = 'HGSIL' then HGSIL = 1;
  else HGSIL = 0;
run;

I was able to do this to get the dummy variables individually. What I guess I'm trying to do now is combine them into one single "positive" category.
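If the four 0/1 indicators already exist, one way to collapse them into a single flag is to take their maximum, which is 1 whenever any of them is 1. This is only a sketch: the output dataset name papdummy2 and the variable name positive are made up for illustration, and it assumes the step above ran without errors.

```sas
data papdummy2;          /* hypothetical output dataset name */
  set papdummy1;
  /* MAX returns 1 if any indicator is 1, and 0 if all are 0 */
  positive = max(Reactive, LGSIL, ASCUS, HGSIL);
run;
```

The same flag can be built directly from the original variable, without the intermediate dummies, by comparing pap against 'negative'.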
Tom
Super User

As I read it, your professor asked you to convert your 5-level variable into a 2-level variable so you can use it as the target variable in a logistic-type regression. You do not need to make those DUMMY variables first to do that.

 

Note that you should not need to make DUMMY variables yourself anyway. In procedures that support a CLASS statement (PROC LOGISTIC and PROC GLM, for example), SAS will do that for you if you specify that the variable is a CLASS variable in your analysis.
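As a rough illustration of letting SAS build the dummy variables, here is a sketch of a logistic regression with a dichotomized outcome. The choice of PROC LOGISTIC is an assumption, since the procedure had not been settled in the thread, and the dataset WANT and the variables NSTATUS, AGE, RACE, and SMOKE are hypothetical names.

```sas
proc logistic data=want;
  /* CLASS tells SAS to create the design (dummy) columns itself; */
  /* PARAM=REF requests reference-cell coding.                    */
  class race smoke / param=ref;
  /* EVENT='1' models the probability of a positive result */
  model nstatus(event='1') = age race smoke;
run;
```

Continuous predictors such as AGE stay out of the CLASS statement; only the categorical ones are listed there.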

 

What SAS procedure are you planning to run to do your analysis?

What is your model?

 

Tom
Super User

That is what I expected, and the code I posted before should work with it. Your dataset is named PapResults and your variable is named papresults. You need to add a $ to the INPUT statement for that code to run, since papresults must be a character variable to hold those kinds of values.

Data PapResults;
     input id papresults $;
datalines;
1 negative
2 reactive
3 LGSIL
4 ASCUS
5 HGSIL
6 reactive
7 LGSIL
;

 

Here is code to make a dataset named PapResults2 that adds a new variable named papresults2.

data papresults2;
  set papresults;
  papresults2 = not (papresults = 'negative');
run;

If you run PROC PRINT on that new dataset you will get:

[Screenshot: PROC PRINT listing of papresults2 showing id, papresults, and papresults2]
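The listing can be reproduced with a short sketch like the one below, assuming the two DATA steps above have been run. The PROC FREQ step is an optional extra check that is not part of the original reply.

```sas
/* Print every row of the new dataset */
proc print data=papresults2;
run;

/* Optional check: cross the original and recoded values */
proc freq data=papresults2;
  tables papresults*papresults2 / list missing;
run;
```

With the seven sample rows, only id 1 (negative) should have papresults2 = 0; every other row should have papresults2 = 1.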

 

 

kklotz
Fluorite | Level 6
Yes, I have one variable with the status that can have one of five values.
antonbcristina
SAS Super FREQ

Hi @kklotz, if you specifically wanted to use IF/THEN logic, you could try:

data want;
  set have;
  if status = 'negative' then nstatus = 0;  
  else nstatus=1;
run;

 

This assumes that:

1) there are no missing values for status and

2) all other values of status are valid and considered "positive"

 

You can check this with a call to PROC FREQ:

proc freq data=have;
  table status; 
run;
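If the PROC FREQ output shows missing or unexpected values, the two assumptions above no longer hold. A hedged sketch that handles those cases explicitly, using the same hypothetical HAVE and STATUS names and the five category spellings listed in this thread:

```sas
data want;
  set have;
  if missing(status) then nstatus = .;            /* keep missing as missing */
  else if status = 'negative' then nstatus = 0;
  else if status in ('reactive', 'LGSIL', 'ASCUS', 'HGSIL') then nstatus = 1;
  else nstatus = .;                               /* unexpected value: review it */
run;
```

Leaving unexpected values as missing, rather than silently coding them as positive, makes them easy to spot in a follow-up PROC FREQ on nstatus with the MISSING option.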

 

kklotz
Fluorite | Level 6

Sorry, I am so new to SAS. I know the bare minimum. The data is 267 values. I am looking to do multilinear regression to compare pap results to multiple variables including race, age, smoking status, etc. I was thinking that I had to code a dummy variable for either a positive or negative test. I can't understand how to make 4 categorical variables into 1 dummy variable to do regression.

 

So that is how I was attempting to do it, but I would end up getting only 0's in the column that was made. I really do not know what I am doing, unfortunately.

antonbcristina
SAS Super FREQ

Please share your code and a sample dataset we can work with and we'll be able to help.

ballardw
Super User

@kklotz wrote:

Sorry, I am so new to SAS. I know the bare minimum. The data is 267 values. I am looking to do multilinear regression to compare pap results to multiple variables including race, age, smoking status, etc. I was thinking that I had to code a dummy variable for either a positive or negative test. I can't understand how to make 4 categorical variables into 1 dummy variable to do regression.

 

So that is how I was attempting to do it, but I would end up getting only 0's in the column that was made. I really do not know what I am doing, unfortunately.


If you mean creating a SINGLE variable that combines RACE, AGE, smoking status, and something else, then do NOT do it. Those should be 4 separate variables (or 5 or 12 or whatever). Combining levels of a single variable, on the other hand, is pretty common.

 

The time to worry about combining results from different variables is when they measure the same thing, such as different tests that use different methods but report on a similar condition. Then you might have some records using one test and others using a different one. But the question of interest is the result. So you would look for the key values that indicate positive in each test, and report positive if either test showed a positive result and negative if neither did. (What to do with a designed experiment that runs both tests at the same time is likely to involve other questions than "was the patient positive for condition XXXX".)
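A minimal sketch of that "positive if either test is positive" rule is shown below. The dataset TESTS, the variables RESULT_A and RESULT_B, and the value 'positive' are all hypothetical, and treating "both results missing" as missing is an assumption, not something stated above.

```sas
data combined;
  set tests;   /* hypothetical dataset with one variable per test */
  /* If neither test was recorded, leave the combined flag missing */
  if missing(result_a) and missing(result_b) then positive = .;
  /* Otherwise 1 if either test is positive, 0 if neither is */
  else positive = (result_a = 'positive' or result_b = 'positive');
run;
```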

 

This would be analogous to reporting standardized height and weight when some patient records have metric measurements and others use a different measurement system, recorded in two different variables such as WeightKG and WeightLB.
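For the weight example, the standardization could look something like this. It is only a sketch: the dataset PATIENTS and the new variable WEIGHT_KG are hypothetical names, and preferring the metric value when both are present is an assumption.

```sas
data standardized;
  set patients;   /* hypothetical dataset holding WeightKG and WeightLB */
  /* Use the metric value when present; otherwise convert pounds */
  /* to kilograms (1 lb = 0.45359237 kg).                        */
  if not missing(WeightKG) then weight_kg = WeightKG;
  else if not missing(WeightLB) then weight_kg = WeightLB * 0.45359237;
  else weight_kg = .;
run;
```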
