I have approx 12-20 words in the Address field. My Location field in Country dataset has 3-5 words. I cannot use find as the spellings in both the datasets are incorrect. So i use functions like comged and comlev to match the words.
Below is the code:
""""""
data work.namedata; set work.namedata; n+1; do i=1 to countw(Address," "); Address_Split=compress(scan(Address,i," ","m"),", / : @ # & ( ) ; \ . ! 0 1 2 3 4 5 6 7 8 9");output; end;
drop n i; run;
data work.namedata; set work.namedata; Add_split_lag = lag (Address_Split); Location_Count = _N_; run;
proc sort data = work.namedata ; by descending Location_Count ; run;
data work.namedata; set work.namedata; Add_split_lead = lag (Address_split); run;
proc sql; create table Name_Country_Cart as select * from Namedata as a left join Countrydata as b on a.Country=b.Country; quit;
data Name_Country_Cart _01; set Name_Country_Cart ; m+1; do j=1 to countw(Location," "); n+1; do i=1 to countw(Sub_Location," "); Sub_Split=compress(scan(Sub_Location,i," ","m"),", / - : @ # & ( ) ; \ . ! 0 1 2 3 4 5 6 7 8 9");output; Location_Split=compress(scan(Location,j," ","m"),", / - : @ # & ( ) ; \ . ! 0 1 2 3 4 5 6 7 8 9");output; end; end; drop j m i n; run;
proc sort data = Name_Country_Cart _01 nodupkey; by Name Address Address_split Add_split_lag Add_split_lead Location Location_Split Sub_Split; where Sub_split is not missing; run;
data Name_Country_Cart_02;
set Name_Country_Cart_01;
where Location_Split is not missing;
Ged_Score_SDT = compged (Address_split , Location_Split); Lev_Score_SDT = complev (Address_split , Location_Split);
run;
''''''
The code runs for some observations but gets stuck after a certain point while running Name_Country_Cart_01. Which is why I decided to spilt the Country dataset and run the name dataset one name at a time.
... View more