I would like to match a maximum of 10 pharmaceuticals to each patient based on the symptoms addressed by the drugs and the symptoms recorded for each patient. However, there is not a common variable to which to merge on and merging all pharmaceuticals to all patients and using the index function to find suitable drugs is not practical as it would result in a dataset with the number of records equivalent to 200 million drugs x 25 million patients (or 5,000 trillion records). * THE DATASET PHARMACEUTICALS IN REALITY HAS 200 MILLION RECORDS; DATA PHARMACEUTICALS;INPUT DRUG_ID SYMPTOM_WORD1 $ SYMPTOM_WORD2 $ SYMPTOM_WORD3 $; CARDS; 1022 BP ARRYTHMIA 1023 ANGINA 1024 BP ANGINA PALP ; RUN; * THE DATASET PATIENTS IN REALITY HAS 25 MILLION RECORDS; * SYMPTOM_WORDS ARE CONCATENATED TOGETHER AND SEPARTED BY COMMAS; DATA PATIENTS;INPUT PATIENT_ID AILMENT_WORDS $; CARDS; 245762 DIAHREA,NASEAU,ABDOMINAL CRAMPS 238761 MIGRAINE,ANGINA 239978 BP,HEADACHE,COUGH,THROAT PAIN Ideally, if there was a common variable I would merge the two datasets and execute an INDEX function to find drugs that match patient symptoms and set a counter to limit these matches to 10 per patient, like this: IF INDEX(AILMENT_WORDS, SEARCH_WORD1) GE 1 THEN EVENTMATCH+1; IF INDEX(AILMENT_WORDS, SEARCH_WORD2) GE 1 THEN EVENTMATCH+1; IF INDEX(AILMENT_WORDS, SEARCH_WORD3) GE 1 THEN EVENTMATCH+1; I would set a loop counter using first.patient_ID to ensure no more than 10 drug matches per patient. However, as I cannot perform the merge “BY PATIENT_ID” as it is not common to both datasets I therefore cannot use first.patient_ID. Any ideas?
... View more