Thanks so much for your detailed response. I suspect that you are right: this is much harder to solve than I originally thought.

Your PDF shows 3 columns, H C and R. How does that relate to data like HA, H#, H6, H*, R4, R5, & scoring sheet? That is what you have said is in your data. Help us help you. Do you mean that H1, H5, H6, H10 & Scoring sheet are the names of your variables? What is the entire range of possible letter-number combinations in the data?

- Risk_Factors is the name of the variable containing several different combinations. H1, H5, H6, H10, Scoring Sheet, etc., are all possible risk factors for whatever outcome is indicated on the observation. The range is H1 through H10, C1 through C5, and R1 through R5.

If you show an actual result there is likely to be no need for separate steps to "reformat" and then "add flags". If the goal is to create flags, then show what the flag results should be.

- The flag should indicate, using a 0 or 1, whether the observation had a risk factor of H1, H2, H3, or any other value in the above range.

Incomplete rules: Consider this line: H1-2,5-6,9-10;C1-2,4;R3-5. Does a - mean that a continuous sequence is involved? Such as R3-5 means that at the end there should be an R3 R4 R5? If there is to be a "flag" then what is the value of the flag?

- To my understanding, you are correct. If there is a flag, then a separate column for each value would show 0 if the observation did not indicate its presence (H8, for example) and 1 if it did (H1, H2, H5, H6, H9, and H10 in that line).

Also, did you retype the example data? The line H1,2,3,5,6,10; C1,3,R3,5 does not show a ; before the R that appears in other lines (very helpful if the ; is actually consistent).

- It would be helpful, but the punctuation is incredibly inconsistent. I did not show all rows of the dataset, but some values have a ;, some have a /, some have a &, and on and on and on...

How many H, C and R flags are to be created for each?
Someone needs to go to a class on data entry. It appears, which you should state in your rules, that some of the values are entered as LetterNumber combinations and some are entered as ranges in different formats: H1-2 H:5,6,8,10. But we need some rule of what the heck to do with H# HA H* (you said letter and number; #, A and * are not numbers).

- You are correct, I do not know what to do with the #, *, etc.! I assume that, once the complete LetterNumber values are reformatted, something can be done with those (potentially just delete them, because they're seemingly useless).

Here is one way that creates flags with values of 1 when the H, C and R are delimited by semicolons. That is only one case of your data. I create the flags here because the steps to make columns, especially with H, C and R values on the same "row" as your PDF shows, are actually harder. Note: If there are only a "few" records this may be easier done by hand. "Few" depends on complexity, but I suspect that for up to 100 or so, reentering the data, or making the version you use, may be done by hand. 1000's not so much.

- I think you're right - this might be easier done in Excel or something. There are only a few hundred observations, so we'll see what the rest of my team thinks.
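For anyone who would rather script it than retype a few hundred rows, here is a rough Python sketch of the rules as I understand them from this thread. It is not the approach posted above. It assumes that a category letter (H, C or R) applies to every number that follows it until the next letter, that a - between two numbers means an inclusive range, and that any other punctuation (; , / & :) is only a separator. Values such as HA, H#, H* and "Scoring Sheet" are simply dropped, which is my assumption about what to do with them. The names `parse_risk_factors` and `make_flags` are made up for this illustration.

```python
import re

# Valid codes as stated in the thread: H1-H10, C1-C5, R1-R5.
VALID = {"H": 10, "C": 5, "R": 5}
ALL_CODES = [f"{letter}{n}" for letter, top in VALID.items()
             for n in range(1, top + 1)]

# A token is either a run of letters, or a number optionally followed
# by "-number" (a range). All other characters are treated as separators.
TOKEN = re.compile(r"([A-Za-z]+)|(\d+)(?:\s*-\s*(\d+))?")


def parse_risk_factors(text):
    """Return the set of valid codes found in one Risk_Factors value."""
    found = set()
    letter = None  # category currently in effect, if any
    for match in TOKEN.finditer(text):
        word, low, high = match.groups()
        if word is not None:
            # Only a lone H, C or R starts a category; "HA", "Scoring",
            # "sheet" and similar reset it (assumption: ignore them).
            letter = word if word in VALID else None
        elif letter is not None:
            start = int(low)
            end = int(high) if high is not None else start
            for n in range(start, end + 1):
                if 1 <= n <= VALID[letter]:
                    found.add(f"{letter}{n}")
    return found


def make_flags(text):
    """Return a dict of 0/1 flags, one per possible code."""
    present = parse_risk_factors(text)
    return {code: int(code in present) for code in ALL_CODES}


if __name__ == "__main__":
    flags = make_flags("H1-2,5-6,9-10;C1-2,4;R3-5")
    print([code for code, value in flags.items() if value == 1])
```

This produces 20 flags per observation (10 for H, 5 for C, 5 for R). It will not catch every oddity in hand-entered data, so spot-checking the output against the original values would still be wise.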