Hi Folks:
I have about 20 survey tables that needed to be aligned to a REFERENCE data. ID1NAME and IDNAME are keys to merge them. They are province(ID1NAME) and city(IDNAME) names. City names are not unique but combination of ID1NAME and IDNAME are. ID1NAME is cleanable since at least there is a logic. However, IDNAME in survey tables are messy and no logic, to me, at least. Province name is included in the city name at front OR at the end of the variable varying across rows. Explanatory but non-standardized texts appear like: integrated, combined or altogether just to name few. There can be 1-4 open spaces as a delimiter varying across rows. I see the way to standardize IDNAME in survey table probably is to create separate each string separated by space into different variables and use array. Then find if IDNAME of the REFERENCE data in the array of these newly created variables. For example, the IDNAME of third row would lead to three variables such as: var1 for Integrated, var2 for Cheongju and var3 for City. Which is the array of var1-var3.
If IDNAME of REFERENCE data is found in this array of var1-var3 then output as a clean desired IDNAME as a new variable in survey datasets.
Can you please help execute this idea in SAS if this makes sense? Below is mock datasets. But I couldn't figure how to input strings separated by space(s) as one variable. I apologize!
DATA SURVEY;
INPUT ID1NAME $25 IDNAME $25;
CARDS;
Chung sub Total
Chung Chungju Chung
Chung Jecheon
Chung Integrated Cheongju City
Chung Boeun-gun
Chung Okcheon-gun
Chang Yeongdong-gun
Chang Chang Jincheon-gun
Chang Chang Goesan-gun
Chang Combined Eumseong City
Chang Danyang-gun
Chang Jeungpyeong
;
DATA REFERENCE;
INPUT ID1NAME $25 IDNAME $25;
CARDS;
Chung Chungju
Chung Jecheon
Chung Cheongju
Chung Boeun
Chung Okcheon
Chang Yeongdong
Chang Jincheon
Chang Goesan
Chang Eumseong
Chang Danyang
Chang Jeungpyeong
;
... View more