BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I am working to combine two SAS datasets. One has zipcodes, coordinated_care_organizations, and the PCPCH-clinic associated with that zipcode.  The second is a provider data set with provider, specialty, address, zipcode and several other fields.  I have connected the PCPCH-clinic and providers by zipcode - this was to ensure that the clinic and providers lined up for each CCO.

The next goal (and what I am writing about) is to match the providers that work in the PCPCH-clinic by address of the clinic.  I ended up interleaving the two datasets and this following schematic is what the

output dataset looks like after the interleave operation. The numbers are illustrative.

 

zipcode        county                CCO                   address                       provider                address

96007           Lincoln      Healthcare_CCO       234 SW Green St            .                            .                   ....

                                                                                                             R Smith             12345 Brown Rd

                                                                                                             E Collins               234 SW Green St

                                                                                                             R Batra                4355  Plaza Dr

                                                                                                             J Frank                  235 SW Green St

                                                                                                             B Jones                 234 SW Green St

                                                                                                             R Budden             6775 Evergreen Cir

                                                                                                             P Smith                 234 SW Green St

 

                                                                                                            . . .

While the PCPCH-clinic is identified by zipcode there are several clinics or practices within that zipcode

that are not a PCPCH-clinic.   So what I would like to do is match only the providers at the same address

as the PCPCH-clinic.  So the desired result would look like:

 

zipcode        county                CCO                       address                       provider                address

96007           Lincoln      Healthcare_CCO       234 SW Green St                  .                               .        ...

                                                                                                                 E Collins            234 SW Green St

                                                                                                                 B Jones              234 SW Green St

                                                                                                                 P Smith              234 SW Green St

 

An so on for each PCPCH-clinic identified by zipcode for the state of OR.

 

My question is what approach should I take to link the PCPCH-clinic to the providers that practice in the

clinic?

 

Thank you in advance for your help.

 

Walt Lierman

1 ACCEPTED SOLUTION

Accepted Solutions
smantha
Lapis Lazuli | Level 10

I put in datalines code as I did not have data and had to generate it. In your case  you have to modify the code below

Data CCO_Area.CCO_PCPCH_clinic_interleaved_1;

   Set CCO_Area.CCO_PCPCH_clinic_interleaved;
   Address = lower(address);
   If Address ne '  ' the Add_fac = Address;

     Retain Add_fac;
   distance = compged(Add_fac,Address);
  If distance < 50;

run;

View solution in original post

9 REPLIES 9
ChrisNZ
Tourmaline | Level 20

Try adding these lines?

if ADDRESS_CLINIC ne ' ' then A = ADDRESS_CLINIC;
retain A;
if A = ADDRESS_PROVIDER;

 

 

wlierman
Lapis Lazuli | Level 10

Hello,

Thank you for responding. I added those lines but the data set is empty

 

NOTE: There were 51749 observations read from the data set CCO_AREA.CCO_PCPCH_PROV_INTERLEAVED.
NOTE: The data set CCO_AREA.CCO_PCPCH_PROV_INTERLEAVED_1 has 0 observations and 42 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.04 seconds

 

This is how I used the code:

 

Data CCO_Area.CCO_PCPCH_clinic_interleaved_1;

   Set CCO_Area.CCO_PCPCH_clinic_interleaved;

   If Address ne '  ' the Add_fac = Address;

     Retain Add_fac;

  If Add_fac = Address_1;

run;

 

Address is the field name in the clinic dataset and Address_1 the field name in the provider dataset.

 

One thing that I didn't mention in my original post is that the address fields like many text string data are

prone to slight anomalies like  one set may have   345 SW Pine St  Ste 2 and the other may have 345 Sw Pine Stste .  These types of coding problems occur in very few records actually.

 

But this result (0 obs) occurs over the entire dataset.

 

wlierman

 

smantha
Lapis Lazuli | Level 10

1. Look at the usual suspects

  • Case of strings on both sides (change case to either upper or lower)
  • leading trailing spaces (use strip when comparing)
  • spaces within the string (remove all spaces within a string (may be useful)

2. If there are slight variations in your string use edit distance such as compged, complev and spedis. set a threshold to filter. 

smantha
Lapis Lazuli | Level 10

informat zipcode 5. cco $32. county $32. address $64.;
input zipcode county $ cco $ address $;
infile cards dlm='|';
cards;
96007|Lincoln|Healthcare_CCO|234 SW Green St
96008|Lincoln|Healthcare_CCO|6775 Evergreen Cir ste 200
;;;
run;

proc sort; by zipcode address; run;

data cc2;
informat zipcode 5. provider $32. address $64.;
input zipcode provider $ address $;
infile cards dlm='|';
cards;
96007|R Smith|12345 Brown Rd
96007|E Collins|234 SW Green St
96007|R Batra|4355 Plaza Dr
96007|J Frank|235 SW Green St
96007|B Jones|234 SW Green St ste2
96008|R Budden|6775 Evergreen Cir
96007|P Smith|234 SW Green St
;;;
run;

proc sort; by zipcode address; run;

data cc3;
length prv_address $32.;
set cc1(in=cc1) cc2(in=cc2);
by zipcode;
retain prv_address ' ';
if cc1 then prv_address = address;
distance=spedis(strip(tranwrd(address,' ','')),strip(tranwrd(prv_address,' ','')));
put _all_;
if distance < 50;
run;

wlierman
Lapis Lazuli | Level 10

Thank you both for your help.

 

I like the coding provided and the approach.  But in order to implement the coding approach I need to ask another question.

The PCPCH_clinic data set has 649 obs whereas the provider dataset has over 51K obs.

 

How do I implement the delimiter step using the pipe | over two data sets like this? 

 

Thank you.

 

wlierman

smantha
Lapis Lazuli | Level 10

I put in datalines code as I did not have data and had to generate it. In your case  you have to modify the code below

Data CCO_Area.CCO_PCPCH_clinic_interleaved_1;

   Set CCO_Area.CCO_PCPCH_clinic_interleaved;
   Address = lower(address);
   If Address ne '  ' the Add_fac = Address;

     Retain Add_fac;
   distance = compged(Add_fac,Address);
  If distance < 50;

run;
wlierman
Lapis Lazuli | Level 10

smantha,

 

Thank you for your help and insight.

I really appreciate it as this overcomes a hurdle to completion.

Take care.

 

wlierman

 

smantha
Lapis Lazuli | Level 10

You are welcome! We should thank @ChrisNZ as he started the original solution and I built on top of it.

wlierman
Lapis Lazuli | Level 10

Also did not mean to overlook the work from @ChrisNZ 

Thank you for your help.  I appreciate it.

 

wlierman

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 872 views
  • 8 likes
  • 3 in conversation