<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data cleaning techniques : duplicate proc freq variable in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Data-cleaning-techniques-duplicate-proc-freq-variable/m-p/554863#M9583</link>
    <description>&lt;P&gt;You may only need to remove leading blanks. Whatever is causing your problem apparently affects the sort order: a space sorts before any letter. PROC FREQ and other procedures do not display leading spaces by default, but the values are still distinct and the order of the data is affected.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data example;
   length x $25;
   x=' Death';   output;  /* note the leading space */
   x='Accident'; output;
   x='Death';    output;
run;

/* ' Death' and 'Death' are tabulated as separate levels */
proc freq data=example; run;

data example2;
   set example;
   /* STRIP removes leading and trailing blanks */
   x=strip(x);
run;

proc freq data=example2; run;&lt;/PRE&gt;
&lt;P&gt;Other procedures let you use style options to reveal the leading spaces:&lt;/P&gt;
&lt;PRE&gt;proc tabulate data=example;
   class x;
   classlev x /style=[Asis=on];
   table x,n;
run;&lt;/PRE&gt;
&lt;P&gt;ASIS=on tells the procedure not to remove the leading space for display.&lt;/P&gt;</description>
    <pubDate>Mon, 29 Apr 2019 21:39:42 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2019-04-29T21:39:42Z</dc:date>
    <item>
      <title>Data cleaning techniques : duplicate proc freq variable</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Data-cleaning-techniques-duplicate-proc-freq-variable/m-p/554855#M9581</link>
      <description>&lt;P&gt;I have some pretty messy text data that I need to clean for consistency (case, spaces, spelling, etc.).&amp;nbsp; Anyway, I'm using proc freq to check my progress in dealing with the entries and come up with Death showing up twice in my frequency table.&amp;nbsp; I'd appreciate any guidance.&amp;nbsp; Here is an example of where I'm going:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data import1_2;&lt;BR /&gt;set Import1 (rename='Harm Code'n = 'Harm Code: Raw'n);&lt;BR /&gt;'Harm Code'n = compbl(strip(left(upcase('Harm Code: Raw'n))));&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;*Split observations by delimiter;&lt;/P&gt;&lt;P&gt;Data Import1_Harms (rename=new='Harm Code 2'n);&lt;/P&gt;&lt;P&gt;length new $50.;&lt;BR /&gt;set Import1_2;&lt;BR /&gt;do i=1 by 1 while(scan('Harm Code'n,i,',') ^=' ');&lt;BR /&gt;new=scan('Harm Code'n,i,',');&lt;BR /&gt;output;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;proc freq data=Import1_Harms;&lt;BR /&gt;table 'Harm Code 2'n / missing;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Harm Code 2&lt;/TD&gt;&lt;TD&gt;Frequency&lt;/TD&gt;&lt;TD&gt;Percent&lt;/TD&gt;&lt;TD&gt;Cumulative&lt;/TD&gt;&lt;TD&gt;Cumulative&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Frequency&lt;/TD&gt;&lt;TD&gt;Percent&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;ABNORMAL BLOOD LOSS&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;0.37&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;0.37&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;ACCESS SITE 
COMPLICATIONS&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;0.27&lt;/TD&gt;&lt;TD&gt;13&lt;/TD&gt;&lt;TD&gt;0.69&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;CONVERSION&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0.05&lt;/TD&gt;&lt;TD&gt;251&lt;/TD&gt;&lt;TD&gt;13.33&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;DEATH&lt;/TD&gt;&lt;TD&gt;31&lt;/TD&gt;&lt;TD&gt;1.65&lt;/TD&gt;&lt;TD&gt;287&lt;/TD&gt;&lt;TD&gt;15.24&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;DEATH (AAA RELATED)&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;0.21&lt;/TD&gt;&lt;TD&gt;291&lt;/TD&gt;&lt;TD&gt;15.45&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;DEATH (AAA)&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;0.32&lt;/TD&gt;&lt;TD&gt;297&lt;/TD&gt;&lt;TD&gt;15.77&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;DEATH (INCONCLUSIVE)&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;0.11&lt;/TD&gt;&lt;TD&gt;299&lt;/TD&gt;&lt;TD&gt;15.88&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;DEATH (INDETERMINATE)&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0.05&lt;/TD&gt;&lt;TD&gt;300&lt;/TD&gt;&lt;TD&gt;15.93&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;…….&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;AS1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0.05&lt;/TD&gt;&lt;TD&gt;902&lt;/TD&gt;&lt;TD&gt;47.9&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;BL3&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;0.11&lt;/TD&gt;&lt;TD&gt;904&lt;/TD&gt;&lt;TD&gt;48.01&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;CMP&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;0.27&lt;/TD&gt;&lt;TD&gt;909&lt;/TD&gt;&lt;TD&gt;48.27&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;COMPLICATIONS&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;0.53&lt;/TD&gt;&lt;TD&gt;919&lt;/TD&gt;&lt;TD&gt;48.81&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;CONVERSION&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0.05&lt;/TD&gt;&lt;TD&gt;920&lt;/TD&gt;&lt;TD&gt;48.86&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;CTI&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0.05&lt;/TD&gt;&lt;TD&gt;923&lt;/TD&gt;&lt;TD&gt;49.02&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;DEATH&lt;/TD&gt;&lt;TD&gt;13&lt;/TD&gt;&lt;TD&gt;0.69&lt;/TD&gt;&lt;TD&gt;936&lt;/TD&gt;&lt;TD&gt;49.71&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;DEATH(UNKNOWN CAUSE)&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0.05&lt;/TD&gt;&lt;TD&gt;953&lt;/TD&gt;&lt;TD&gt;50.61&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Death (and other values) show up twice in the frequency list.&amp;nbsp; Does this have something to do with the original format and field value?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Wes&lt;/P&gt;</description>
      <pubDate>Mon, 29 Apr 2019 21:10:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Data-cleaning-techniques-duplicate-proc-freq-variable/m-p/554855#M9581</guid>
      <dc:creator>uopsouthpaw</dc:creator>
      <dc:date>2019-04-29T21:10:05Z</dc:date>
    </item>
    <item>
      <title>Re: Data cleaning techniques : duplicate proc freq variable</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Data-cleaning-techniques-duplicate-proc-freq-variable/m-p/554859#M9582</link>
      <description>Use COMPRESS() to remove invisible whitespace characters such as tabs or carriage returns. Its modifiers argument lets you specify which kinds of characters to remove.</description>
      <pubDate>Mon, 29 Apr 2019 21:27:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Data-cleaning-techniques-duplicate-proc-freq-variable/m-p/554859#M9582</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-04-29T21:27:25Z</dc:date>
    </item>
    <item>
      <title>Re: Data cleaning techniques : duplicate proc freq variable</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Data-cleaning-techniques-duplicate-proc-freq-variable/m-p/554863#M9583</link>
      <description>&lt;P&gt;You may only need to remove leading blanks. Whatever is causing your problem apparently affects the sort order: a space sorts before any letter. PROC FREQ and other procedures do not display leading spaces by default, but the values are still distinct and the order of the data is affected.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data example;
   length x $25;
   x=' Death';   output;  /* note the leading space */
   x='Accident'; output;
   x='Death';    output;
run;

/* ' Death' and 'Death' are tabulated as separate levels */
proc freq data=example; run;

data example2;
   set example;
   /* STRIP removes leading and trailing blanks */
   x=strip(x);
run;

proc freq data=example2; run;&lt;/PRE&gt;
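&lt;P&gt;Applied to the splitting step in the original question, a minimal sketch might look like the following. It assumes the duplicate levels come from a space after each comma in 'Harm Code'n, so that SCAN returns pieces such as ' DEATH'; that cause is a guess and has not been confirmed against the actual data. The substantive changes from the posted step are the STRIP call and the parentheses around the RENAME= list.&lt;/P&gt;
&lt;PRE&gt;data Import1_Harms (rename=(new='Harm Code 2'n));
   length new $50;
   set Import1_2;
   do i=1 by 1 while(scan('Harm Code'n,i,',') ^= ' ');
      /* assumption: each piece may carry a leading blank left over from ", " */
      new=strip(scan('Harm Code'n,i,','));
      output;
   end;
run;

proc freq data=Import1_Harms;
   table 'Harm Code 2'n / missing;
run;&lt;/PRE&gt;
&lt;P&gt;If the two DEATH rows remain after this, the extra character is probably not an ordinary space, and the stored value would need closer inspection.&lt;/P&gt;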
&lt;P&gt;Other procedures let you use style options to reveal the leading spaces:&lt;/P&gt;
&lt;PRE&gt;proc tabulate data=example;
   class x;
   classlev x /style=[Asis=on];
   table x,n;
run;&lt;/PRE&gt;
&lt;P&gt;ASIS=on tells the procedure not to remove the leading space for display.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Apr 2019 21:39:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Data-cleaning-techniques-duplicate-proc-freq-variable/m-p/554863#M9583</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2019-04-29T21:39:42Z</dc:date>
    </item>
  </channel>
</rss>

