2 weeks ago - last edited 2 weeks ago
/* In the list of cards, the first value in each group is the correct word; the ones that follow contain typos. */
data CATEGORY;
  length MODE $20 Column2 $20;
  input MODE :$20.;
  call missing(Column2); /* empty column, filled in by the updates */
  cards;
A.VELOCITY
A.VALOSITY
A.VELOCIY
B.HYDROCHLORIC
B.HYDRRACLORIK
B.HYDROCLOKIK
C.GEOMETRY
C.GEMENTRY
;
run;
/* The above dataset is just an example. */
/* I have a SQL table with the same kind of problem; for now, just look at the cards data. */
/* =* is the PROC SQL "sounds like" operator */
proc sql;
  update CATEGORY set Column2 = 'physics' where MODE =* 'A.VELOCITY';
  update CATEGORY set Column2 = 'chemistry' where MODE =* 'B.HYDROCHLORIC';
  update CATEGORY set Column2 = 'maths' where MODE =* 'C.GEOMETRY';
quit;
/* Is there a more effective way to resolve this using other methods in SAS? */
/* I don't care about the prefix (A. B. C.); I need to focus on the words themselves (velocity, geometry, hydrochloric).
   My major concern is categorizing the values that contain typos. */
2 weeks ago
You need to standardize your data using some mechanism that groups/clusters values with similar strings.
There are quite a few discussions and solution approaches around such "typo" problems in this forum.
To start with, use search terms like SPEDIS, COMPGED, and FUZZY. I'm sure the posts you find with these terms will give you ideas for other search terms you could use as well.
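As a rough illustration of the COMPGED idea, here is a minimal sketch rather than a definitive solution. It assumes you can build a small lookup table of the correct spellings; the names REFERENCE, WORD, SUBJECT, PAIRS, MATCHED and DISTANCE are made up for this example. Every value is compared against every reference word after dropping the "A." style prefix, and the reference word with the lowest generalized edit distance is kept.

```sas
/* Hypothetical lookup table: correct spellings and their categories */
data REFERENCE;
  length WORD $20 SUBJECT $20;
  input WORD :$20. SUBJECT :$20.;
  cards;
VELOCITY physics
HYDROCHLORIC chemistry
GEOMETRY maths
;
run;

/* Sample data from the question */
data CATEGORY;
  length MODE $20;
  input MODE :$20.;
  cards;
A.VELOCITY
A.VALOSITY
A.VELOCIY
B.HYDROCHLORIC
B.HYDRRACLORIK
B.HYDROCLOKIK
C.GEOMETRY
C.GEMENTRY
;
run;

proc sql;
  /* Score every value against every reference word.            */
  /* scan(MODE, 2, '.') keeps the part after the first period.  */
  /* A lower COMPGED score means the two strings are closer.    */
  create table PAIRS as
  select c.MODE,
         r.SUBJECT,
         compged(strip(scan(c.MODE, 2, '.')), strip(r.WORD)) as DISTANCE
  from CATEGORY as c, REFERENCE as r;

  /* Keep the closest reference word for each value */
  create table MATCHED as
  select MODE, SUBJECT as Column2, DISTANCE
  from PAIRS
  group by MODE
  having DISTANCE = min(DISTANCE);
quit;
```

Two caveats: if two reference words tie for the lowest score, a value will appear twice in MATCHED, and on real data you would probably also want an upper limit on DISTANCE so that unrelated strings are left uncategorized instead of being forced into the nearest group. The cross join also grows quickly, so this only suits a short reference list.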