BookmarkSubscribeRSS Feed
trt3s
Calcite | Level 5

Hey guys, so I thought I had this all figured out unitl I went on to make a final table and it was off on counts. Basically, I have five columns for a variable tx1-tx5 and I want to say that if something is possible, probable, or defiinite then it is a 'Y' but if something is unrelated or unlikely then it is an 'N'. My approach was to use my beginning data set and then make mutiple new data sets and finally merge them in the end.

Example:

 

data tx1;
set org;

if tx1='Definite' then tx='Y';

else if tx1='Probable' then tx='Y';

else if tx1='Possible' then tx='Y';

else if tx1='Unlikely' then tx='N';

else if tx1='Unrelated' then tx='N';

if tx eq "" then delete;

run;

 

proc sort data=tx1;
by id;
run;

 

and I did this for tx2, tx3, tx4, and tx5. My issue is that once I merge them I am getting false negative and false positives... 'Y' for Unrelated and 'N' for Possible and so on. I cannot sort them by another variable becasue I have another data step where I have created a new variable as well and cannot sort by this new variable tx in that data set. Please help...I am still trying to figure out SAS.

9 REPLIES 9
Astounding
PROC Star

I think the key phrase here is "once I merge them".  Each data set has a single variable TX representing categories for a variable.  When you merge them together, you still only have one variable TX.  So what does it represent at that point?

 

While there are a few ways to go about this, picking a good way depends on knowing what the variables represent.  This is only one possibility:

 

proc format;

value $yn 'Definite', 'Probable', 'Possible' = 'Y'

'Unlikely', 'Unrelated' = 'N'

' ' = ' '

other='?';

run;

 

proc freq data=org;

tables tx1-tx5 / missing;

format tx1-tx5 $yn.;

run;

 

But this may be a bad solution if the various tx variables are interchangeable with one another.  You could start there and see if the result needs to change.

trt3s
Calcite | Level 5
This solution did not seem to work... I was left with many ? marks, and I actually need the a new variable TX to be defined for all TX1-TX5.. Maybe I am not looking at this correctly
Astounding
PROC Star

If you end up with many "?" in the report, it means that there are other values for your variables that are not accounted for.  Maybe they are all uppercase or all llowercase.  The easiest way to find out is to re-run the PROC FREQ without the FORMAT statement.  That will reveal what your variables contain.

trt3s
Calcite | Level 5
Okay, I ran a proc freq without the format statement, and you are correct there are blank values. What should I do with those, delete them?
Astounding
PROC Star

Only you can answer that.  What is your program supposed to accomplish?

 

Deleting them is probably wrong, because deleting them removes the entire observation.  A blank for TX5 doesn't mean you should remove TX1.

 

However, if you got "?" it is not because of blank values (at least, not if you used the program I supplied which defines a separate category for blanks).  Check the spelling to see if it matches exactly the spelling that you expect.

trt3s
Calcite | Level 5
You're right, I do not need to delete because even if tx5 is "blank" tx1 may not be... I still am not sure how to make tx1-tx5 one variable. I checked all the spelling when I did proc freq and everything was spelled correctly. I feel extremely lost on this... SAS is not doing what I need it to do, but it's because I don't know how to ask it to do it.
Astounding
PROC Star

Well, I can't tell if this would be correct or not but here is one way:

 

data want;

set have;

if tx1 > ' ' then do;

   tx = tx1;

   output;

end;

if tx2 > ' ' then do;

   tx = tx2;

   output;

end;

if tx3 > ' ' then do;

   tx = tx3;

   output;

end;

if tx4 > ' ' then do;

  tx = tx4;

  output;

end;

if tx5 > ' ' then do;

   tx = tx5;

   output;

end;

drop tx1-tx5;

run;

 

It creates anywhere from 0 to 5 observations where you used to have one.  Is that a good result?  It depends on what  you need to do with the data.

 

There are ways to abbreviate the above code, using arrays.  For now, I used the simpler version.

ballardw
Super User

You should provide some example input data and the desired result for that data.

 

When you have multiple variables coding into one single then you have to address ALL of the cases of combinations in your rules.

5 variables with 6 possible values ( the five mentioned plus missing) for each is 6*6*6*6*6 possible cases to consider. If there is a heirarchy such that if ANY of the variables has a single value then assign you need to be explicit about defining the heirarchy.

Kurt_Bremser
Super User

Before we go into coding, we have to check the design first.

 

You have one final tx variable, but you have five input variables. What will take precedence?

Does any "Unlikely" or "Unrelated" mean that tx has to be 'N', or should tx be 'Y' if any of the other values is encountered in one of tx1-tx5?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2012 views
  • 4 likes
  • 4 in conversation