10-18-2016 12:54 PM
Hey guys, so I thought I had this all figured out unitl I went on to make a final table and it was off on counts. Basically, I have five columns for a variable tx1-tx5 and I want to say that if something is possible, probable, or defiinite then it is a 'Y' but if something is unrelated or unlikely then it is an 'N'. My approach was to use my beginning data set and then make mutiple new data sets and finally merge them in the end.
if tx1='Definite' then tx='Y';
else if tx1='Probable' then tx='Y';
else if tx1='Possible' then tx='Y';
else if tx1='Unlikely' then tx='N';
else if tx1='Unrelated' then tx='N';
if tx eq "" then delete;
proc sort data=tx1;
and I did this for tx2, tx3, tx4, and tx5. My issue is that once I merge them I am getting false negative and false positives... 'Y' for Unrelated and 'N' for Possible and so on. I cannot sort them by another variable becasue I have another data step where I have created a new variable as well and cannot sort by this new variable tx in that data set. Please help...I am still trying to figure out SAS.
10-18-2016 01:14 PM
I think the key phrase here is "once I merge them". Each data set has a single variable TX representing categories for a variable. When you merge them together, you still only have one variable TX. So what does it represent at that point?
While there are a few ways to go about this, picking a good way depends on knowing what the variables represent. This is only one possibility:
value $yn 'Definite', 'Probable', 'Possible' = 'Y'
'Unlikely', 'Unrelated' = 'N'
' ' = ' '
proc freq data=org;
tables tx1-tx5 / missing;
format tx1-tx5 $yn.;
But this may be a bad solution if the various tx variables are interchangeable with one another. You could start there and see if the result needs to change.
10-18-2016 01:23 PM
10-18-2016 01:30 PM
If you end up with many "?" in the report, it means that there are other values for your variables that are not accounted for. Maybe they are all uppercase or all llowercase. The easiest way to find out is to re-run the PROC FREQ without the FORMAT statement. That will reveal what your variables contain.
10-18-2016 01:37 PM
10-18-2016 01:47 PM
Only you can answer that. What is your program supposed to accomplish?
Deleting them is probably wrong, because deleting them removes the entire observation. A blank for TX5 doesn't mean you should remove TX1.
However, if you got "?" it is not because of blank values (at least, not if you used the program I supplied which defines a separate category for blanks). Check the spelling to see if it matches exactly the spelling that you expect.
10-18-2016 01:53 PM
10-18-2016 01:59 PM
Well, I can't tell if this would be correct or not but here is one way:
if tx1 > ' ' then do;
tx = tx1;
if tx2 > ' ' then do;
tx = tx2;
if tx3 > ' ' then do;
tx = tx3;
if tx4 > ' ' then do;
tx = tx4;
if tx5 > ' ' then do;
tx = tx5;
It creates anywhere from 0 to 5 observations where you used to have one. Is that a good result? It depends on what you need to do with the data.
There are ways to abbreviate the above code, using arrays. For now, I used the simpler version.
10-18-2016 05:20 PM
You should provide some example input data and the desired result for that data.
When you have multiple variables coding into one single then you have to address ALL of the cases of combinations in your rules.
5 variables with 6 possible values ( the five mentioned plus missing) for each is 6*6*6*6*6 possible cases to consider. If there is a heirarchy such that if ANY of the variables has a single value then assign you need to be explicit about defining the heirarchy.
10-19-2016 02:43 AM
Before we go into coding, we have to check the design first.
You have one final tx variable, but you have five input variables. What will take precedence?
Does any "Unlikely" or "Unrelated" mean that tx has to be 'N', or should tx be 'Y' if any of the other values is encountered in one of tx1-tx5?