Hello again,
I am back for another round with interleaving. First both @ChrisNZ and smantha provided great insight to my problem earlier - thank you again. In fact, I thought the code (shown below) gave me what I needed. Unfortunately, I spoke too soon.
Briefly to summarize the datasets I am working with, DATASET 1 The variables are"
zipcode,
county,
CCO_1 to CCO_15 (I can't have just one CCO field since more than one CCO overlap in several parts of the state across several zipcodes)
PCPCH_clinic_ID,
clinic_address,
recognition tier of the clinic,
no. of physicians in clinic
no. of other care providers in clinic
DATASET 2
Provider_name,
Specialty,
zipcode,
address of practice,
. . . (other variables)
I interleaved the two sets using zipcode. So the CCO, PCPCH_clinic address, and provider location address are all aligned. So a schematic of the interleaved dataset is (with some illustrative values)
these variables are from DATASET 1 and the important variables from DS2
zipcode county CCO_1 .... CCO_15 PCPCH_clinic_address Provider provider_location_add
97056 Multnomah HealthShare 3008 Sw Salmon . . ...
Dr. Smith 128 S Burns St
Dr. Brown 15595 Sw Teton
Dr. Batra 3008 Sw Salmon
Dr. Glenn 2223 Sw Salmon
Dr. Williams 3008 Sw Salmon
and so on
I want to connect clinic by address with only the physicians at the same address. This is the problem that the code below addresses.
Data CCO_Area.CCO_PCPCH_clinic_interleaved_1;
Set CCO_Area.CCO_PCPCH_clinic_interleaved;
Address = lower(address);
If Address ne ' ' the Add_fac = Address;
Retain Add_fac;
distance = compged(Add_fac,Address);
If distance < 50;
run;
However, I am not getting the result. The result would look like
zipcode county CCO_1 .... CCO_15 PCPCH_clinic_address Provider provider_location_add
97056 Multnomah HealthShare 3008 Sw Salmon . . ...
Dr. Batra 3008 Sw Salmon
and Williams et al.
Oregon has 36 counties and, after using zipcodes to align county, CCO, and clinic there are 647 obs in DATASET 1 and over 57K (roughly) in the provider file or DATASET 2.
So the final dataset will have 647 separate county/CCO/clinic_addresses matched against a varying number of providers by addresses depending on clinic size, county (urban, rural, frontier) and so on.
I tried using Soundex
Data CCO_Area.CCO_PCPCH_Prov_Interleaved_C; Set CCO_Area.CCO_PCPCH_Prov_Interleaved_; Where Address_1 =* '15950 Sw Millikan Way'; run;
That did identify every provider at the address specified which is also the PCPCH-clinic address for the zipcode. But it also pulled variants of the desired address (like 15950 Sw Millikan Way Ste 5, etc).
I have one interleaved dataset with the two address fields that I want to match together but can't seem to
get the result needed.
Again your help and expertise is much appreciated.
wlierman
... View more