<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Program removes duplicates that are no duplicates in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477190#M24842</link>
    <description>&lt;P&gt;How is your "&lt;SPAN&gt;work.falk_u250test" dataset created? Did you check the values in it?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;proc print data=work.falk_u250test;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;run;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If your dataset is too large to print, pull only the records you want to validate.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;proc print data=work.falk_u250test;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;where Jobs in ('Job10','Job100');&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;run;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I'm not very familiar with this procedure.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 11 Jul 2018 18:22:05 GMT</pubDate>
    <dc:creator>SuryaKiran</dc:creator>
    <dc:date>2018-07-11T18:22:05Z</dc:date>
    <item>
      <title>Program removes duplicates that are no duplicates</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477115#M24839</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a program that reads data to be processed. The data consists of two columns (Jobs, length), and the Jobs column contains the values Job1, Job2, ..., Job250.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SAS seems to be falsely identifying duplicate rows (e.g. Job100 is treated as a duplicate of Job10). The following screenshot shows the line of code and the duplicate message:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture.JPG" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/21716iC5472CC30FC38F6D/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture.JPG" alt="Capture.JPG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How do I specify the way duplicates are identified, and where do I embed this in my code?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 15:07:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477115#M24839</guid>
      <dc:creator>Hendrik</dc:creator>
      <dc:date>2018-07-11T15:07:01Z</dc:date>
    </item>
    <item>
      <title>Re: Program removes duplicates that are no duplicates</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477141#M24840</link>
      <description>&lt;P&gt;Are you sure the values in your dataset are not truncated to a length of 5?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
length var $5;
input var $;
datalines ;
job10
job1000
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In this example both values will be "job10": because the length is 5, SAS keeps only the first 5 characters.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 16:11:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477141#M24840</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2018-07-11T16:11:57Z</dc:date>
    </item>
    <item>
      <title>Re: Program removes duplicates that are no duplicates</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477148#M24841</link>
      <description>&lt;P&gt;Thanks Suryakiran,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At first I thought the values were being shortened to a default length of 5, but I changed the names and SAS still removes quite a lot. Here is the first part of my code, up to the point where the error occurs:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data myContent;
set work.falk_u250test;
run;

proc print data=myContent(keep= Jobs length);
run;
proc optmodel;
/* read the product and size data */
set &amp;lt;str&amp;gt; PRODUCTS;
num length {PRODUCTS};
read data myContent into PRODUCTS=[Jobs] length;

[...]

quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Could it be some default setting of 'optmodel'?&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 16:49:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477148#M24841</guid>
      <dc:creator>Hendrik</dc:creator>
      <dc:date>2018-07-11T16:49:10Z</dc:date>
    </item>
    <item>
      <title>Re: Program removes duplicates that are no duplicates</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477190#M24842</link>
      <description>&lt;P&gt;How is your "&lt;SPAN&gt;work.falk_u250test" dataset created? Did you check the values in it?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;proc print data=work.falk_u250test;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;run;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If your dataset is too large to print, pull only the records you want to validate.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;proc print data=work.falk_u250test;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;where Jobs in ('Job10','Job100');&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;run;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I'm not very familiar with this procedure.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 18:22:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477190#M24842</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2018-07-11T18:22:05Z</dc:date>
    </item>
    <item>
      <title>Re: Program removes duplicates that are no duplicates</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477260#M24843</link>
      <description>&lt;P&gt;The Jobs column of the dataset was created in Excel with the autofill (drag down) option. So far, the only way I have found to avoid the duplicate deletion is to insert random names for the jobs and work with those.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since I thought that SAS might be automatically reducing the number of characters to 5, I also tried to set the length to 6 manually during the data import:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data myContent;
length var $6;
set work.falk_u250test;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;However, that does not work either.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 21:55:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477260#M24843</guid>
      <dc:creator>Hendrik</dc:creator>
      <dc:date>2018-07-11T21:55:04Z</dc:date>
    </item>
    <item>
      <title>Re: Program removes duplicates that are no duplicates</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477275#M24844</link>
      <description>&lt;P&gt;You can use PROC SORT to check if there are duplicates.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=work.falk_u250test out=sorted nodupkey ;
  by var ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Also make sure that you have not accidentally assigned a short FORMAT to the variable.&amp;nbsp; If you have a variable with a length of $10 but have assigned a format of only $5. to it then many procedures (like PROC FREQ) will use the FORMATTED value instead of the actual value.&amp;nbsp; This can result in duplicates.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 23:25:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477275#M24844</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2018-07-11T23:25:52Z</dc:date>
    </item>
    <item>
      <title>Re: Program removes duplicates that are no duplicates</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477281#M24845</link>
      <description>&lt;P&gt;Hm, it would be better if you shared your whole code, showing how you're importing the data and what you're doing next. No one here knows your data better than you do; we can only make suggestions based on what we understand.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since you mentioned you are importing the data from Excel, you first need to understand that PROC IMPORT guesses the variable attributes. It scans the first 20 rows and determines the data types and lengths before importing. Your Job variable probably has a length of 5 in the first 20 rows (i.e. values Job1-Job20), so SAS imports the Job variable with a length of 5. Values longer than 5 characters are truncated during the import, and assigning a greater length afterwards will not help because the data has already been truncated.&lt;/P&gt;
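&lt;P&gt;One way to confirm this is to look at the length that was actually stored. A minimal sketch, assuming the dataset and variable names used earlier in this thread (work.falk_u250test, Jobs):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* List variable names, types and stored lengths. A length of 5 for Jobs
   would mean that Job100 was stored as Job10. */
proc contents data=work.falk_u250test;
run;

/* Longest value actually stored in Jobs (variable name assumed from the thread) */
proc sql;
  select max(length(Jobs)) as max_stored_len
  from work.falk_u250test;
quit;&lt;/CODE&gt;&lt;/PRE&gt;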
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Solution:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;1) Try the EXCEL LIBNAME engine (&lt;/SPAN&gt;&lt;SPAN&gt;libname xl EXCEL 'D:\SASUniversityEdition\myfolders\Import_Data.xlsx'; ) and read the Excel sheet as a dataset.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;2) Save your Excel file as CSV and then import it using a DATA step with an INFILE statement;&amp;nbsp;this way you have more control over data types and lengths.&lt;/P&gt;
&lt;P&gt;3) You can also use the GUESSINGROWS option in PROC IMPORT if your data is in a delimited text file (csv, txt, ...).&lt;/P&gt;
&lt;P&gt;4) Make sure your Excel file is sorted so that the longest values appear in the first 20 rows.&lt;/P&gt;
&lt;P&gt;5)&amp;nbsp;&lt;SPAN&gt;In SAS 9.4, type&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;regedit&lt;/STRONG&gt;&lt;SPAN&gt; in the command line (in the upper left corner of the screen). Then navigate to&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Products--&amp;gt;BASE--&amp;gt;EFI--&amp;gt;GuessingRows&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;and set the value to the number of rows in the CSV or Excel file you want SAS to scan to find the longest&amp;nbsp;value.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
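&lt;P&gt;Options 2) and 3) could look roughly like this. It is only a sketch: the CSV path is a hypothetical placeholder, the length of 10 is an arbitrary safe choice, and the column names Jobs and length are taken from the earlier posts in this thread. Use one step or the other, since both write the same dataset.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Option 2: DATA step with an explicit length, so nothing is guessed.
   The file path is a hypothetical placeholder. */
data work.falk_u250test;
  infile 'C:\path\to\jobs.csv' dsd firstobs=2 truncover; /* firstobs=2 skips the header row */
  length Jobs $ 10;      /* long enough for Job100 and beyond */
  input Jobs $ length;   /* length is the numeric second column */
run;

/* Option 3: let PROC IMPORT scan more rows before choosing lengths */
proc import datafile='C:\path\to\jobs.csv'
            out=work.falk_u250test
            dbms=csv
            replace;
  getnames=yes;
  guessingrows=1000;     /* more than the 250 rows in this case */
run;&lt;/CODE&gt;&lt;/PRE&gt;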
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/21731i9D961E2CBBBEE407/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 23:50:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477281#M24845</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2018-07-11T23:50:10Z</dc:date>
    </item>
    <item>
      <title>Re: Program removes duplicates that are no duplicates</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477423#M24856</link>
      <description>&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I did not know that SAS looks at the first 20 values to estimate the variable attributes. In my case, the job names were 'Job' + #, which gives 6 instead of 5 characters from Job100 onward. Because the variable length had already been set to $5, the last character was automatically cut off, and hence the duplicates were found and eliminated.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Jul 2018 12:46:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Program-removes-duplicates-that-are-no-duplicates/m-p/477423#M24856</guid>
      <dc:creator>Hendrik</dc:creator>
      <dc:date>2018-07-12T12:46:03Z</dc:date>
    </item>
  </channel>
</rss>

