DATA Step, Macro, Functions and more

Multiple Columns into one binary variable

Reply
Occasional Contributor
Posts: 19

Multiple Columns into one binary variable

Hey guys, so I thought I had this all figured out unitl I went on to make a final table and it was off on counts. Basically, I have five columns for a variable tx1-tx5 and I want to say that if something is possible, probable, or defiinite then it is a 'Y' but if something is unrelated or unlikely then it is an 'N'. My approach was to use my beginning data set and then make mutiple new data sets and finally merge them in the end.

Example:

 

data tx1;
set org;

if tx1='Definite' then tx='Y';

else if tx1='Probable' then tx='Y';

else if tx1='Possible' then tx='Y';

else if tx1='Unlikely' then tx='N';

else if tx1='Unrelated' then tx='N';

if tx eq "" then delete;

run;

 

proc sort data=tx1;
by id;
run;

 

and I did this for tx2, tx3, tx4, and tx5. My issue is that once I merge them I am getting false negative and false positives... 'Y' for Unrelated and 'N' for Possible and so on. I cannot sort them by another variable becasue I have another data step where I have created a new variable as well and cannot sort by this new variable tx in that data set. Please help...I am still trying to figure out SAS.

Super User
Posts: 5,085

Re: Multiple Columns into one binary variable

I think the key phrase here is "once I merge them".  Each data set has a single variable TX representing categories for a variable.  When you merge them together, you still only have one variable TX.  So what does it represent at that point?

 

While there are a few ways to go about this, picking a good way depends on knowing what the variables represent.  This is only one possibility:

 

proc format;

value $yn 'Definite', 'Probable', 'Possible' = 'Y'

'Unlikely', 'Unrelated' = 'N'

' ' = ' '

other='?';

run;

 

proc freq data=org;

tables tx1-tx5 / missing;

format tx1-tx5 $yn.;

run;

 

But this may be a bad solution if the various tx variables are interchangeable with one another.  You could start there and see if the result needs to change.

Occasional Contributor
Posts: 19

Re: Multiple Columns into one binary variable

This solution did not seem to work... I was left with many ? marks, and I actually need the a new variable TX to be defined for all TX1-TX5.. Maybe I am not looking at this correctly
Super User
Posts: 5,085

Re: Multiple Columns into one binary variable

If you end up with many "?" in the report, it means that there are other values for your variables that are not accounted for.  Maybe they are all uppercase or all llowercase.  The easiest way to find out is to re-run the PROC FREQ without the FORMAT statement.  That will reveal what your variables contain.

Occasional Contributor
Posts: 19

Re: Multiple Columns into one binary variable

Okay, I ran a proc freq without the format statement, and you are correct there are blank values. What should I do with those, delete them?
Super User
Posts: 5,085

Re: Multiple Columns into one binary variable

Only you can answer that.  What is your program supposed to accomplish?

 

Deleting them is probably wrong, because deleting them removes the entire observation.  A blank for TX5 doesn't mean you should remove TX1.

 

However, if you got "?" it is not because of blank values (at least, not if you used the program I supplied which defines a separate category for blanks).  Check the spelling to see if it matches exactly the spelling that you expect.

Occasional Contributor
Posts: 19

Re: Multiple Columns into one binary variable

You're right, I do not need to delete because even if tx5 is "blank" tx1 may not be... I still am not sure how to make tx1-tx5 one variable. I checked all the spelling when I did proc freq and everything was spelled correctly. I feel extremely lost on this... SAS is not doing what I need it to do, but it's because I don't know how to ask it to do it.
Super User
Posts: 5,085

Re: Multiple Columns into one binary variable

Well, I can't tell if this would be correct or not but here is one way:

 

data want;

set have;

if tx1 > ' ' then do;

   tx = tx1;

   output;

end;

if tx2 > ' ' then do;

   tx = tx2;

   output;

end;

if tx3 > ' ' then do;

   tx = tx3;

   output;

end;

if tx4 > ' ' then do;

  tx = tx4;

  output;

end;

if tx5 > ' ' then do;

   tx = tx5;

   output;

end;

drop tx1-tx5;

run;

 

It creates anywhere from 0 to 5 observations where you used to have one.  Is that a good result?  It depends on what  you need to do with the data.

 

There are ways to abbreviate the above code, using arrays.  For now, I used the simpler version.

Super User
Posts: 10,516

Re: Multiple Columns into one binary variable

You should provide some example input data and the desired result for that data.

 

When you have multiple variables coding into one single then you have to address ALL of the cases of combinations in your rules.

5 variables with 6 possible values ( the five mentioned plus missing) for each is 6*6*6*6*6 possible cases to consider. If there is a heirarchy such that if ANY of the variables has a single value then assign you need to be explicit about defining the heirarchy.

Super User
Posts: 6,948

Re: Multiple Columns into one binary variable

Before we go into coding, we have to check the design first.

 

You have one final tx variable, but you have five input variables. What will take precedence?

Does any "Unlikely" or "Unrelated" mean that tx has to be 'N', or should tx be 'Y' if any of the other values is encountered in one of tx1-tx5?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 9 replies
  • 409 views
  • 4 likes
  • 4 in conversation