I need to group similar occupations / places of employment together in a contact tracing dataset.
I have 124,803 rows, and the field I want to "scrape" is Place of Employment. I am trying to get a picture of the impact of COVID-19 on contacts from an "economic" POV. I want to group like work / occupations together and then label them as one might see in Bureau of Labor Statistics data, e.g., finance, trade, manufacturing, services, and so on. Then hopefully the development team can incorporate these occupational titles into subsequent surveys and make work / employment analyses easier in the future.
As usual, not every obs has a response. A small sample of the responses:
row       Place of Employment
10000     unemployed
10210     XYZ elementary school
11800     Seven - 11
23453     retired
86754     Tri-state aviation
100256    student
111876    City of Richland
123245    St. Randall's Hospital
and so on. As I mentioned, there are a lot of missing / no responses. Of the 124,803 obs, around 35,000 have something entered. So it will be somewhat tedious to crawl through even 35,000-plus obs, but it is a needed data cleaning exercise. Hopefully it will provide a little more information / knowledge for stakeholders and interested parties.
My question is: Are there any SAS tips, techniques, code, or data tricks that could make the "scraping / aggregation" less tedious and more accurate in the end?
Thank you in advance for any ideas, techniques, or processes.
wklierman
Have a look at OpenRefine (formerly Google Refine), a free tool for dealing with messy data.
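To give a feel for what that kind of cleanup involves, here is a minimal sketch in Python (not SAS) of two steps: a key-collision "fingerprint", roughly in the spirit of one of OpenRefine's clustering methods, that collapses spelling variants of the same employer, followed by keyword rules that assign a sector. The function names (`normalize`, `fingerprint`, `cluster`, `assign_sector`) and the `SECTOR_RULES` keyword list are assumptions made up for illustration, and the sector labels are placeholders, not official Bureau of Labor Statistics categories.

```python
import re
import string
import unicodedata
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase, drop accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def fingerprint(value: str) -> str:
    """Key that ignores case, punctuation, word order, and repeated words."""
    return " ".join(sorted(set(normalize(value).split())))


def cluster(values):
    """Group non-blank raw responses that share the same fingerprint."""
    groups = defaultdict(list)
    for v in values:
        if v and v.strip():
            groups[fingerprint(v)].append(v)
    return dict(groups)


# Hypothetical keyword rules; the first matching rule wins.
SECTOR_RULES = [
    ("Not employed / student", r"\b(unemployed|retired|student)\b"),
    ("Education", r"\b(school|university|college)\b"),
    ("Health care", r"\b(hospital|clinic)\b"),
    ("Government", r"\b(city of|county of|state of)\b"),
    ("Transportation", r"\b(aviation|airline|trucking)\b"),
]


def assign_sector(value):
    """Return a sector label, 'No response', or 'Unclassified'."""
    if value is None or not value.strip():
        return "No response"
    text = normalize(value)
    for label, pattern in SECTOR_RULES:
        if re.search(pattern, text):
            return label
    return "Unclassified"  # left over for manual review


if __name__ == "__main__":
    sample = ["unemployed", "XYZ elementary school", "Seven - 11",
              "Tri-state aviation", "City of Richland", ""]
    for response in sample:
        print(repr(response), "->", assign_sector(response))
```

Anything that lands in "Unclassified" (such as "Seven - 11" here) still needs a manual look, but clustering first means each distinct employer only has to be reviewed once, and the keyword list can grow as you go.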