<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Duplicates in sentences in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830388#M328109</link>
    <description>&lt;P&gt;It's not clear what you're considering a duplicate. Can you clarify?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;What is the input? Is this the sample data posted? If so, please post it as a data step in a code block.&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;How is a duplicate defined?&lt;/LI&gt;
&lt;LI&gt;What is the expected output?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 25 Aug 2022 17:35:59 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2022-08-25T17:35:59Z</dc:date>
    <item>
      <title>Duplicates in sentences</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830378#M328104</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have a variable called Q6 in a SAS data set. It comes from a survey questionnaire. I would like to check whether any of the sentences in Q6 are duplicates. If there are duplicates, I would like to create a new variable, Q6_dup, set to 1 or 0.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;input ID$ Q6$;&lt;/P&gt;&lt;P&gt;1233&amp;nbsp; Any drug has certain side effects, we must strictly under the guidance of the doctor's rational use of drugs, do not abuse drugs.&amp;nbsp;&lt;/P&gt;&lt;P&gt;3656&amp;nbsp; Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease&lt;/P&gt;&lt;P&gt;8677 As far as addiction being a disease…in some cases yes…say in the instance where someone had a serious accident&lt;/P&gt;&lt;P&gt;3455 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease.&lt;/P&gt;&lt;P&gt;I don't know how to use BY-group processing (if first.var) here, because sentences may start with the same word but then differ.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2022 17:10:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830378#M328104</guid>
      <dc:creator>CathyVI</dc:creator>
      <dc:date>2022-08-25T17:10:33Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates in sentences</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830382#M328105</link>
      <description>&lt;P&gt;Please identify which exact "duplicates" you want marked in the example.&lt;/P&gt;
&lt;P&gt;And which aren't.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have many of these "duplicates" I would ask if this text is actually generated as the result of selecting an option in the survey (a common issue with survey entry software) in which case your duplicate indicator is not going to be very helpful.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Typically, for my surveys, I would read the data with custom informats so that the known and expected responses are coded to standard values, and then only worry about the possible open text responses if they appear in the same responses.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2022 17:21:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830382#M328105</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-08-25T17:21:10Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates in sentences</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830387#M328108</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp; In the example I gave, IDs 3656 and 3455 are duplicates. See below:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;FONT size="2" color="#000000"&gt;3656 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease&lt;/FONT&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;3455 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;I would like to mark &lt;FONT color="#FF0000"&gt;3656&lt;/FONT&gt; as&amp;nbsp;Q6_dup = 1 and &lt;FONT color="#FF0000"&gt;3455&lt;/FONT&gt; as Q6_dup = 0.&lt;/P&gt;&lt;P&gt;Yes, the text is generated as the result of selecting an option in the survey.&lt;/P&gt;&lt;P&gt;I am still learning SAS and am not sure how to approach this using custom informats.&lt;/P&gt;&lt;P&gt;Could you or anyone else help me with sample code and an explanation, using the example above?&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2022 17:35:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830387#M328108</guid>
      <dc:creator>CathyVI</dc:creator>
      <dc:date>2022-08-25T17:35:09Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates in sentences</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830388#M328109</link>
      <description>&lt;P&gt;It's not clear what you're considering a duplicate. Can you clarify?&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;What is the input? Is this the sample data posted? If so, please post it as a data step in a code block.&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;How is a duplicate defined?&lt;/LI&gt;
&lt;LI&gt;What is the expected output?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Aug 2022 17:35:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830388#M328109</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2022-08-25T17:35:59Z</dc:date>
    </item>
    <item>
      <title>Re: Duplicates in sentences</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830393#M328111</link>
      <description>&lt;P&gt;If the text isn't free text and ID isn't per person, then sort it and flag the first record? Do you care which ID is marked as non-duplicate? What happens if there are more than 2 instances?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
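&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sort so that identical sentences are adjacent; ID orders records within a sentence. */
proc sort data=have;
  by Q6 ID;
run;

data want;
  set have;
  by q6;
  /* first.q6 is 1 on the first record of each group of identical sentences
     (including sentences that occur only once) and 0 on later repeats. */
  q6_dup = first.q6;
run;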
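
/* ------------------------------------------------------------------
   Sketch only, added for illustration; not part of the original reply.
   In the sample data, IDs 3656 and 3455 differ only by a trailing
   period, so an exact BY Q6 match would treat them as different
   sentences. This variant compares a normalized key instead and
   flags later repeats rather than first occurrences.
   Assumptions: the data set names have_demo, keyed_demo and want_demo,
   the variable q6_key and the length of 200 are hypothetical, and
   "duplicate" is taken to mean "same text after lower-casing and
   removing punctuation".
------------------------------------------------------------------ */
data have_demo;
  infile datalines truncover;
  input ID :$8. Q6 $200.;
  datalines;
1233 Any drug has certain side effects, we must strictly under the guidance of the doctor's rational use of drugs, do not abuse drugs.
3656 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease
8677 As far as addiction being a disease…in some cases yes…say in the instance where someone had a serious accident
3455 Abuse of drugs may aggravate the adverse reactions of drugs and affect the recovery of the disease.
;
run;

data keyed_demo;
  set have_demo;
  length q6_key $200;
  /* Lower-case, remove punctuation (modifier 'p'), collapse repeated blanks. */
  q6_key = compbl(compress(lowcase(Q6), , 'p'));
run;

proc sort data=keyed_demo;
  by q6_key ID;
run;

data want_demo;
  set keyed_demo;
  by q6_key;
  /* 0 = first occurrence of a sentence, 1 = repeat of an earlier one. */
  q6_dup = not first.q6_key;
  drop q6_key;
run;

/* With the four sample rows, this should flag only ID 3656, because
   3455 sorts ahead of it within the shared key. Swap the sort order of
   ID if the other record should be kept as the non-duplicate. */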

&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 25 Aug 2022 17:55:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Duplicates-in-sentences/m-p/830393#M328111</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2022-08-25T17:55:51Z</dc:date>
    </item>
  </channel>
</rss>

