<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Techniques to use to aggregated names and terms into common classifications in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Techniques-to-use-to-aggregated-names-and-terms-into-common/m-p/749946#M235794</link>
    <description>&lt;P&gt;Have a look at OpenRefine (formerly Google Refine), a free tool for dealing with messy data.&lt;/P&gt;</description>
    <pubDate>Wed, 23 Jun 2021 18:47:49 GMT</pubDate>
    <dc:creator>PGStats</dc:creator>
    <dc:date>2021-06-23T18:47:49Z</dc:date>
    <item>
      <title>Techniques to use to aggregated names and terms into common classifications</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Techniques-to-use-to-aggregated-names-and-terms-into-common/m-p/749935#M235785</link>
      <description>&lt;P&gt;What I need to do is group similar occupations / places of employment together in a contact tracing dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have 124,803 rows and the field I want to "scrape" is Place of Employment.&amp;nbsp; What I am trying to do is get a picture of the impact of COVID-19 on contacts from an "economic" POV.&amp;nbsp; I want to group similar work / occupations together and then label them with categories like those used by the Bureau of Labor Statistics, e.g., finance, trade, manufacturing, services and so on.&amp;nbsp; Then, hopefully, the development team can incorporate these occupational titles into subsequent surveys, making work / employment analyses easier in the future.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As usual, not every obs has a response.&amp;nbsp; A small sample of the responses:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;row&lt;/TH&gt;&lt;TH&gt;Place of Employment&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;10000&lt;/TD&gt;&lt;TD&gt;unemployed&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;10210&lt;/TD&gt;&lt;TD&gt;XYZ elementary school&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;11800&lt;/TD&gt;&lt;TD&gt;Seven - 11&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;23453&lt;/TD&gt;&lt;TD&gt;retired&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;86754&lt;/TD&gt;&lt;TD&gt;Tri-state aviation&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;100256&lt;/TD&gt;&lt;TD&gt;student&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;111876&lt;/TD&gt;&lt;TD&gt;City of Richland&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;123245&lt;/TD&gt;&lt;TD&gt;St. Randall's Hospital&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;and so on.&amp;nbsp; As I mentioned, there are a lot of missing values / non-responses. Of the 124,803 obs, around 35,000 have something entered.&amp;nbsp; Crawling through even 35,000-plus obs will be somewhat tedious, but it is a needed data cleaning exercise. Hopefully it will help provide a little more information / knowledge for stakeholders and interested parties.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My question is: Are there any SAS tips, techniques, coding, and data tricks that could make the "scraping / aggregation" less tedious and more accurate in the end?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you in advance for any ideas, techniques, or processes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;wklierman&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 18:12:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Techniques-to-use-to-aggregated-names-and-terms-into-common/m-p/749935#M235785</guid>
      <dc:creator>wlierman</dc:creator>
      <dc:date>2021-06-23T18:12:47Z</dc:date>
    </item>
    <item>
      <title>Re: Techniques to use to aggregated names and terms into common classifications</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Techniques-to-use-to-aggregated-names-and-terms-into-common/m-p/749946#M235794</link>
      <description>&lt;P&gt;Have a look at OpenRefine (formerly Google Refine), a free tool for dealing with messy data.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Jun 2021 18:47:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Techniques-to-use-to-aggregated-names-and-terms-into-common/m-p/749946#M235794</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2021-06-23T18:47:49Z</dc:date>
    </item>
    <item>
      <title>Re: Techniques to use to aggregated names and terms into common classifications</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Techniques-to-use-to-aggregated-names-and-terms-into-common/m-p/750055#M235856</link>
      <description>Thank you for the suggestion. From what I can see, it isn't what I need.  It isn't so much that the data is messy - the data is what it is.  This seems like it will take more of a "brute force" (piecemeal) approach with a lot of subsetting IF / THEN statements.  Maybe some type of 'fuzzy matching' in addition to SPEDIS or SOUNDEX.&lt;BR /&gt;&lt;BR /&gt;I will keep looking.  Thanks for the OpenRefine suggestion.&lt;BR /&gt;&lt;BR /&gt;wklierman</description>
      <pubDate>Wed, 23 Jun 2021 23:31:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Techniques-to-use-to-aggregated-names-and-terms-into-common/m-p/750055#M235856</guid>
      <dc:creator>wlierman</dc:creator>
      <dc:date>2021-06-23T23:31:32Z</dc:date>
    </item>
  </channel>
</rss>

