I have data and need to code 0/1 (no/yes) to having a medical condition for multiple conditions derived from specific codes that contain a letter followed by numbers (format: LETTERnumbers; no other punctuation or spaces in this code format). This question is a matter of efficiency.
Some of the medical conditions have 1 specific letter/number code, while others have up to 200 specific letter/number codes. The potential avenue for efficiency is that each "family" of medical conditions starts with a unique letter and number combination. All of the specific conditions that make up that larger grouping then have additional characters after that prefix. For example, the larger grouping of "Depression" may be G40, and recurrent depression may be G4001, major depressive disorder may be G4089, etc. As of now, I am entering each letter/number code individually, as in the example for 1 medical condition below:
length Code10 $1379; 
Code10=catx(" ", "G40001", "G40009", "G40011", "G40019", "G40101", "G40109", "G40111", "G40119", 
"G40201", "G40209", "G40211", "G40219", "G40301", "G40309", "G40311", "G40319", "G40A01", 
"G40A09", "G40A11", "G40A19", "G40B01", "G40B09", "G40B11", "G40B19", "G40401", "G40409",
"G40411", "G40419", "G40501", "G40509", "G40801", "G40802", "G40803", "G40804", "G40811",
"G40812", "G40813", "G40814", "G40821", "G40822", "G40823", "G40824", "G4089", "G40901",
"G40909","G40911","G40919"); 
VARIABLE=0; 
retain VARIABLE; 
array groupallocation{11945} $ Diag1-Diag11945; 
do i=1 to dim(groupallocation);
if indexw(Code10, groupallocation(i)) then VARIABLE=1; 
end;
drop i;
Is there a way to flag a condition based on the starting characters only? It would save tremendous time if I could flag for "G40" and capture all of the individual codes in the above example.
Thank you in advance for any input!!
if code10=:'G40' then ... ;
The =: indicates that the test is to see if the value of CODE10 begins with 'G40'.
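To make the behavior of =: concrete, here is a minimal sketch on made-up data. The dataset name demo and the variables diag and starts_g40 are hypothetical, not taken from the original program. In the program above, Code10 holds the lookup list, so the comparison presumably belongs on whichever variable holds a single diagnosis code.

```sas
/* Hypothetical sample data: one diagnosis code per row */
data demo;
   length diag $7;
   input diag $;
   /* =: compares only as many characters as the shorter operand has, */
   /* so this is 1 when diag begins with G40 and 0 otherwise          */
   starts_g40 = (diag =: 'G40');
   datalines;
G40001
G4089
F3289
G41001
;
run;

proc print data=demo;
run;
```

The first two rows are flagged 1 and the last two 0, without listing any of the individual G40 codes.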
Hi Paige,
Thank you for taking a look at this! I don't fully understand how to incorporate your code into what I have. I have tried a few ways, but so far, I am not getting correct group allocation as everything is just showing up as 0.
Would you mind copying and pasting my code and providing your edits so I can see exactly how to incorporate?
Thank you!!
Dan
Hint: almost any approach that creates or uses nearly 12,000 variables (array groupallocation{11945}, or Diag1-Diag11945) suggests the process needs serious reconsideration, starting with whether all of those variables are actually needed. If your system creates 12,000 diagnosis codes for every patient, how many are typically populated? It seems likely that a great deal of disk space is being used unnecessarily, and searching every single variable when you only care whether any one of multiple conditions is present is very inefficient.
Consider the following code which stops the loop as soon as any condition is found that meets criteria:
do i=1 to dim(groupallocation);
   if indexw(Code10, groupallocation(i)) then do;
      VARIABLE=1; 
      leave;
   end;
end;
Coupled with @PaigeMiller's "starts with" comparison, this may speed up the code considerably.
But I still think the number of variables, and why they are all there, needs some thought.
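For reference, the early exit and the starts-with comparison from this thread could be combined roughly as follows. This is a sketch under assumptions: the dataset names have and want are hypothetical, and it assumes Diag1-Diag11945 already exist as character variables on the input dataset.

```sas
/* Sketch only: dataset names HAVE and WANT are hypothetical */
data want;
   set have;
   array groupallocation{*} $ Diag1-Diag11945;
   VARIABLE = 0;                      /* reset the flag for every observation */
   do i = 1 to dim(groupallocation);
      /* prefix test replaces the long list of individual codes */
      if groupallocation(i) =: 'G40' then do;
         VARIABLE = 1;
         leave;                       /* stop scanning at the first match */
      end;
   end;
   drop i;
run;
```

If a condition spans several prefixes, the IN operator with the colon modifier, for example groupallocation(i) in: ('G40', 'G41'), should allow the same prefix test against a short list. The second prefix here is made up for illustration.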
Thank you very much for this! You are correct, my current code takes up a lot of disk space and usually takes up to a day to run. This will be extremely helpful!
