DATA Step, Macro, Functions and more

Conditionally match merging records from two datasets without a common variable?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Conditionally match merging records from two datasets without a common variable?

I would like to match a maximum of 10 pharmaceuticals to each patient based on the symptoms addressed by the drugs and the symptoms recorded for each patient.

However, there is not a common variable to which to merge on and merging all pharmaceuticals to all patients and using the index function to find suitable drugs is not practical as it would result in a dataset with the number of records equivalent to 200 million drugs x 25 million patients (or 5,000 trillion records).

* THE DATASET PHARMACEUTICALS IN REALITY HAS 200 MILLION RECORDS;

DATA PHARMACEUTICALS;INPUT DRUG_ID SYMPTOM_WORD1 $ SYMPTOM_WORD2 $ SYMPTOM_WORD3 $;

CARDS;

1022 BP ARRYTHMIA

1023 ANGINA

1024 BP ANGINA PALP

;

RUN;

* THE DATASET PATIENTS IN REALITY HAS 25 MILLION RECORDS;

* SYMPTOM_WORDS ARE CONCATENATED TOGETHER AND SEPARTED BY COMMAS;

DATA PATIENTS;INPUT PATIENT_ID AILMENT_WORDS $;

CARDS;

245762 DIAHREA,NASEAU,ABDOMINAL CRAMPS

238761 MIGRAINE,ANGINA

239978 BP,HEADACHE,COUGH,THROAT PAIN

Ideally, if there was a common variable I would merge the two datasets and execute an INDEX function to find drugs that match patient symptoms and set a counter to limit these matches to 10 per patient, like this:

IF INDEX(AILMENT_WORDS, SEARCH_WORD1) GE 1 THEN EVENTMATCH+1;

IF INDEX(AILMENT_WORDS, SEARCH_WORD2) GE 1 THEN EVENTMATCH+1;

IF INDEX(AILMENT_WORDS, SEARCH_WORD3) GE 1 THEN EVENTMATCH+1;

I would set a loop counter using first.patient_ID to ensure no more than 10 drug matches per patient.  However, as I cannot perform the merge “BY PATIENT_ID” as it is not common to both datasets I therefore cannot use first.patient_ID.

Any ideas?


Accepted Solutions
Solution
‎06-30-2015 02:05 PM
Respected Advisor
Posts: 4,927

Re: Conditionally match merging records from two datasets without a common variable?

A conventional approach seems the most scalable:

DATA PHARMACEUTICALS;

infile datalines missover;

INPUT DRUG_ID (SYMPTOM_WORD1 SYMPTOM_WORD2 SYMPTOM_WORD3) (:$16.);

datalines;

1022 BP ARRYTHMIA

1023 ANGINA

1024 BP ANGINA PALP

;

data pharm;

set pharmaceuticals;

array symptom_{*} SYMPTOM_WORD:;

do i = 1 to dim(symptom_);

    symptom = symptom_{i};

    if not missing(symptom) then output;

    end;

keep DRUG_ID symptom;

run;

DATA PATIENTS;

INPUT PATIENT_ID AILMENT_WORDS &:$40.;

datalines;

245762 DIAHREA,NASEAU,ABDOMINAL CRAMPS

238761 MIGRAINE,ANGINA

239978 BP,HEADACHE,COUGH,THROAT PAIN

;

data pat;

set PATIENTS;

length ailment $16;

do i = 1 to countw(AILMENT_WORDS, ",");

    ailment = scan(AILMENT_WORDS, i, ",");

    output;

    end;

keep PATIENT_ID ailment;

run;

proc sql;

create table drugs as

select PATIENT_ID, DRUG_ID

from pat as a inner join

    pharm as b on b.symptom = a.ailment

order by PATIENT_ID;

quit;

data drugs10;

set drugs; by PATIENT_ID;

if first.PATIENT_ID then count = 0;

count + 1;

if count <= 10;

drop count;

run;

PG

PG

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Conditionally match merging records from two datasets without a common variable?

Do you want different lists of 10 drugs for each instance of BP or the same for all BP (and so on for each ailment)?

Occasional Contributor
Posts: 9

Re: Conditionally match merging records from two datasets without a common variable?

Hi Ballardw,

I'm just looking for 10 match drugs for any given ailment.  Therefore, it is possible that some ailments will not be addressed for given patients.

So the end result would be something along the lines of:

User1 drug1

User1 drug2

user1 drug3

...

user1 drug10

I know that sounds strange, but it is a theoretical rather than a practical exercise.

Thank you for replying!

Occasional Contributor
Posts: 9

Re: Conditionally match merging records from two datasets without a common variable?

I'm looking for a list of a maximum of 10 drugs per patient.  Accordingly some ailments will not be addressed, but all patients will be addressed.

Super Contributor
Posts: 298

Re: Conditionally match merging records from two datasets without a common variable?

Ian,

You have ailment-words from a Patient and have symptom lists by drug_id. How would the OUTPUT look-like? Should there be the Drug_Id or Symptom_word for the matching Ailment_word? A simple OUTPUT data set with the related PHARMA and PATIENT data sets will easily explain what you want, instead of thousand words of explanation.

Super User
Posts: 10,041

Re: Conditionally match merging records from two datasets without a common variable?

It sounds like you need Hash Table. Hope you have enough Memory to handle it .

Solution
‎06-30-2015 02:05 PM
Respected Advisor
Posts: 4,927

Re: Conditionally match merging records from two datasets without a common variable?

A conventional approach seems the most scalable:

DATA PHARMACEUTICALS;

infile datalines missover;

INPUT DRUG_ID (SYMPTOM_WORD1 SYMPTOM_WORD2 SYMPTOM_WORD3) (:$16.);

datalines;

1022 BP ARRYTHMIA

1023 ANGINA

1024 BP ANGINA PALP

;

data pharm;

set pharmaceuticals;

array symptom_{*} SYMPTOM_WORD:;

do i = 1 to dim(symptom_);

    symptom = symptom_{i};

    if not missing(symptom) then output;

    end;

keep DRUG_ID symptom;

run;

DATA PATIENTS;

INPUT PATIENT_ID AILMENT_WORDS &:$40.;

datalines;

245762 DIAHREA,NASEAU,ABDOMINAL CRAMPS

238761 MIGRAINE,ANGINA

239978 BP,HEADACHE,COUGH,THROAT PAIN

;

data pat;

set PATIENTS;

length ailment $16;

do i = 1 to countw(AILMENT_WORDS, ",");

    ailment = scan(AILMENT_WORDS, i, ",");

    output;

    end;

keep PATIENT_ID ailment;

run;

proc sql;

create table drugs as

select PATIENT_ID, DRUG_ID

from pat as a inner join

    pharm as b on b.symptom = a.ailment

order by PATIENT_ID;

quit;

data drugs10;

set drugs; by PATIENT_ID;

if first.PATIENT_ID then count = 0;

count + 1;

if count <= 10;

drop count;

run;

PG

PG
Occasional Contributor
Posts: 9

Re: Conditionally match merging records from two datasets without a common variable?

Thank you PG Stats for your partial solution - much appreciated!!

Ian

Each step works fine until we get to the PROC SQLs.  You can see below that at line 44 we end up with zero obs. 

Here is the log file from the SQL steps:

39   proc sql;

40   create table drugs as

41   select PATIENT_ID, DRUG_ID

42   from pat as a inner join

43       pharm as b on b.symptom = a.ailment

44   order by PATIENT_ID;

NOTE: Table WORK.DRUGS created, with 0 rows and 2 columns.

45   quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.31 seconds

      cpu time            0.06 seconds

46

47   data drugs10;

48   set drugs; by PATIENT_ID;

49   if first.PATIENT_ID then count = 0;

50   count + 1;

51   if count <= 10;

52   drop count;

53   run;

NOTE: There were 0 observations read from the data set WORK.DRUGS.

NOTE: The data set WORK.DRUGS10 has 0 observations and 2 variables.

NOTE: DATA statement used (Total process time):

      real time           0.03 seconds

      cpu time            0.01 seconds

Respected Advisor
Posts: 4,927

Re: Conditionally match merging records from two datasets without a common variable?

When I run the code that I posted, I get 4 rows from the SQL step. Note that the join in the SQL step is case-sensitive, it requires exact matches between symptom and ailment words.

PG

PG
Respected Advisor
Posts: 4,927

Re: Conditionally match merging records from two datasets without a common variable?

Note also that it might be a good idea to add the DISTINCT option to the query for patients with many ailments related to the same drug :

proc sql;

create table drugs as

select DISTINCT PATIENT_ID, DRUG_ID

from pat as a inner join

    pharm as b on b.symptom = a.ailment

order by PATIENT_ID;

quit;


PG

PG
Occasional Contributor
Posts: 9

Re: Conditionally match merging records from two datasets without a common variable?

Hello PG Stats!

You are amazing!!  Thank you once again.

I entered your code "exactly as is" (I copy and pasted it into SAS) at part 6 of this conversation and I still get 0 obs.

Sorry to bother you, but do you have any further ideas?

All the best to you,

Ian

Occasional Contributor
Posts: 9

Re: Conditionally match merging records from two datasets without a common variable?

PG STATS had some good suggestions, but they didn't completely work out.

Anybody else have any great ideas to solve this one?

Thank you to all!

Ian

Super Contributor
Posts: 298

Re: Conditionally match merging records from two datasets without a common variable?

Ian,

PGStats solution gives the following OUTPUT. Do you want a different output? Please tell what you now get and what you want. It will help others to help you.

       PATIENT_

Obs       ID       DRUG_ID

1      238761       1023

2      238761       1024

3      239978       1022

4      239978       1024

Occasional Contributor
Posts: 9

Re: Conditionally match merging records from two datasets without a common variable?

Hello datasp,

My comment (#7 above shows my LOG) says it all.  When I copy and paste the code as is and run it in SAS I end up with Zero OBS in the SQL steps.  I'm not sure why.... the output you show is exactly what I need, but it just isn't happening for me.

Thank you for replying,

Ian

Super Contributor
Posts: 298

Re: Conditionally match merging records from two datasets without a common variable?

Ian,

I suspect your reading the input data sets is not OK. Better use Proc Print to see the values. Proc SQL is OK.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 741 views
  • 12 likes
  • 5 in conversation