<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Compare rows in a dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938424#M368608</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/19879"&gt;@Quentin&lt;/a&gt;&amp;nbsp;Is there a short way to list the variables? I have more than 300 variables to compare. Listing every single variable would be no fun.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Aug 2024 20:28:18 GMT</pubDate>
    <dc:creator>Anita_n</dc:creator>
    <dc:date>2024-08-06T20:28:18Z</dc:date>
    <item>
      <title>Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938353#M368580</link>
      <description>&lt;P&gt;Dear all,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a table that looks like this&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data have;&lt;BR /&gt;input id $3 pnr pid time cn var1 var2 var3 var4 var5;&lt;BR /&gt;datalines;&lt;BR /&gt;1 12 2 1 1 5 6 7 3 4&lt;BR /&gt;1 12 2 1 2 5 7 9 3 5&lt;BR /&gt;1 12 2 2 2 3 8 5 1 5&lt;BR /&gt;1 12 2 2 2 3 8 7 1 6&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/PRE&gt;
&lt;P&gt;I first want to compare the rows to see whether id, pnr, pid and time are equal. If they are, I then want to check whether var1 to var5 are the same.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 13:13:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938353#M368580</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-08-06T13:13:52Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938355#M368581</link>
      <description>&lt;P&gt;What would the result of "comparing" multiple observations (data sets have observations, not rows) for 5 variables look like? What information do you expect to see in the result?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And in your "real" data are all of your Var1 through Var5 variables actually numeric? With numeric variables the Range statistic is useful as a result of 0 means "all non-missing values are the same".&lt;/P&gt;
&lt;P&gt;For example&lt;/P&gt;
&lt;PRE&gt;proc summary data=have nway;
   class id pnr pid time;
   var var1-var5;
   output out=useful (drop=_type_) range=;
run;&lt;/PRE&gt;
&lt;P&gt;For id=1 pnr=2 pid=2 and time=1 the range of Var1=0 and Var4=0, meaning the values are the same for all the observations with those values of your grouping variables.&lt;/P&gt;
&lt;P&gt;For time=2, var1, var2 and var4 have a range of 0, meaning the same.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you might have missing values for Var1 - Var5 you would want to look at the NMISS statistic as well, because missing values are excluded from range calculations.&lt;/P&gt;
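&lt;P&gt;A minimal sketch of that idea, assuming numeric Var1 - Var5 (the output data set name is only illustrative): requesting both statistics with the AUTONAME option yields variables such as var1_Range and var1_NMiss, so a range of 0 can be read next to the count of missing values. The automatic _FREQ_ variable shows how many observations each group has.&lt;/P&gt;
&lt;PRE&gt;proc summary data=have nway;
   class id pnr pid time;
   var var1-var5;
   /* autoname appends the statistic name, e.g. var1_Range and var1_NMiss */
   output out=useful_nmiss (drop=_type_) range= nmiss= / autoname;
run;&lt;/PRE&gt;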
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 13:31:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938355#M368581</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-08-06T13:31:40Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938359#M368582</link>
      <description>&lt;P&gt;Sorry, var1 to var5 are not numeric&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
 input id $3 pnr pid time cn var1 $ var2 $ var3 $ var4 $ var5 $ ;
datalines;
1   12  2  1 1 5 6 7 3 4
1   12  2  1 2 5 7 9 3 5
1   12  2  2 2 3 8 5 1 5
1   12  2  2 2 3 8 7 1 6
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;and the results should look like this&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
 input id $3 pnr pid time cn var1 $ flag1 var2 $ flag2 var3 $ flag3  var4 $ flag4 var5 $ flag5 ;
datalines;
1   12  2  1 1 5 1 6 0 7 0 3 1 4 0
1   12  2  1 2 5 1 7 0 9 0 3 1 5 0
1   12  2  2 2 3 1 8 1 5 0 1 1 5 0
1   12  2  2 2 3 1 8 1 7 0 1 1 6 0
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It's just that two different people entered this data at different times, and the values they entered should be the same. So the idea is to check whether the values entered by the two people match; if not, the data should be corrected. If values have not been entered at a time point (time), then this should be done. Time is either 1 or 2; cn is the control, which is also 1 or 2. In the end I would like to print all ids where an error occurred during entry, and also all ids with missing entries. That is why I wanted to use the flags (1=same, 0=not the same).&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 13:59:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938359#M368582</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-08-06T13:59:14Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938366#M368583</link>
      <description>&lt;P&gt;Since you have a requirement " all ids with missing entries" then your example data should include at least one example of what "missing entry" means to you, point it or them out so we can see them and indicate what the output looks like for a missing entry.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If your example data included a "who entered the data" variable that might help such as identifying which combination of ID variables were only entered by one person...&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 14:28:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938366#M368583</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-08-06T14:28:37Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938388#M368590</link>
      <description>&lt;P&gt;The easiest way to compare variable values is between datasets.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It sounded from your description like the data is entered by two different individuals.&amp;nbsp; So if you can split the data into two groups you can use PROC COMPARE to compare them.&amp;nbsp; Does one of those variables represent the person that entered the data?&amp;nbsp; If so then use it to split the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Otherwise you could just split between the first and second records for each set of ID variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Which of those variables are the ID (or BY) variables that uniquely identify the observations you want to compare?&amp;nbsp; To me it looks like the first 4 variables from ID to TIME.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Tom_0-1722962322550.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/99057i237F9389D122B2BB/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Tom_0-1722962322550.png" alt="Tom_0-1722962322550.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 16:38:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938388#M368590</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-08-06T16:38:49Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938399#M368598</link>
      <description>&lt;P&gt;First, get rid of the "3" in the INPUT statement:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;input id $3 &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Just a dollar sign would be sufficient.&amp;nbsp; The "3" tells SAS to take the contents of column 3, which is always blank.&lt;/P&gt;
&lt;P&gt;Then sort your data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have;
   by id pnr pid time cn var1 var2 var3 var4 var5;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then pick out the observations that need to be investigated:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   set have;
   by id pnr pid time cn var1 var2 var3 var4 var5;
   if first.var5 and last.var5 then delete;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It's untested code, but easy enough to test since you have the data.&amp;nbsp; This just picks out the observations, but doesn't tell you where the differences lie.&lt;/P&gt;
&lt;P&gt;Follow-up:&amp;nbsp; right idea but wrong logic; I will post an update later.&lt;/P&gt;
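&lt;P&gt;One possible direction for such an update (an untested sketch, not necessarily what was intended; the data set names DISTINCT and TO_CHECK are made up for illustration): remove observations that are identical on the grouping variables and var1-var5, then keep only the groups that still have more than one observation, since those must contain a difference. Groups entered only once are dropped along with the matching pairs.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* collapse observations that agree on the keys and on var1-var5 */
proc sort data=have out=distinct nodupkey;
   by id pnr pid time var1-var5;
run;

/* a group with more than one remaining observation has differing values */
data to_check;
   set distinct;
   by id pnr pid time;
   if first.time and last.time then delete;
run;&lt;/CODE&gt;&lt;/PRE&gt;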
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 18:30:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938399#M368598</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2024-08-06T18:30:59Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938405#M368599</link>
      <description>Sorry, this did not work, even using the test data.</description>
      <pubDate>Tue, 06 Aug 2024 19:12:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938405#M368599</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-08-06T19:12:39Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938406#M368600</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;I don't want to split the data. The data shouldn't be split. I am only supposed to compare what person A and B entered to see whether they are the same or not.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 19:15:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938406#M368600</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-08-06T19:15:22Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938407#M368601</link>
      <description>&lt;P&gt;Which variable indicates if it was PERSON A or PERSON B?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note you don't have to make a second copy of the dataset if you can use a WHERE= dataset option to select the appropriate observations.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc compare 
  data=have(where=(person='A'))
  compare=have(where=(person='B'))
;
  id id--time;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 19:20:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938407#M368601</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-08-06T19:20:55Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938411#M368602</link>
      <description>&lt;P&gt;If you are POSITIVE there are always two observations per BY group then this code will compare the second to the first for all 5 of your variables and then remerge the results back onto both observations.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data diff;
do until (last.time);
  set have;
  by id--time ;
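  array x var1-var5 ;
  array y flag1-flag5 ;
  /* LAG() inside a loop shares one queue across all array elements, */
  /* so the previous observation's values are held in a temporary array. */
  array prev [5] $8 _temporary_ ; /* assumes character var1-var5, length 8 or less */
  do index=1 to dim(x);
    if not first.time then y[index] = (x[index] = prev[index]);
    prev[index] = x[index];
  end;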
end;
do until (last.time);
  set have;
  by id--time ;
  output;
end;
  drop index;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Tom_0-1722972354762.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/99061i03028C8F199822FC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Tom_0-1722972354762.png" alt="Tom_0-1722972354762.png" /&gt;&lt;/span&gt;&lt;/P&gt;
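&lt;P&gt;Before relying on that assumption, it may be worth checking it. A small sketch that lists any BY group that does not have exactly two observations (the column name N_OBS is just illustrative):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* groups with one observation or with three or more show up here */
  select id, pnr, pid, time, count(*) as n_obs
  from have
  group by id, pnr, pid, time
  having count(*) ne 2
  ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If this query returns no rows, every group has exactly two observations.&lt;/P&gt;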
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 19:26:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938411#M368602</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-08-06T19:26:02Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938413#M368603</link>
      <description>&lt;P&gt;While I would go with PROC COMPARE, PROC SQL may be another option.&amp;nbsp; For numeric values, you can use the RANGE() function to make sure the values match.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql ;
  select id, pnr, pid, time, cn
        ,var1
        ,range(input(var1,8.))&amp;gt;0 as flag1
        ,var3
        ,range(input(var3,8.))&amp;gt;0 as flag3
  from have 
  group by id, pnr, pid, time
  ;
quit ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;BR /&gt;You'd need to think about how to handle any missing values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I don't know if there is an easy way to compare character values the same way.&amp;nbsp; Possibly you could try using the RANK() function if your values are only one character.&amp;nbsp; Maybe someone else has a better idea.&amp;nbsp; &amp;nbsp;A hashing function, perhaps?&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 19:49:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938413#M368603</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2024-08-06T19:49:20Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938422#M368606</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;This is the error message I get when using this code on the original data&lt;/P&gt;
&lt;P&gt;ERROR: Array subscript out of range at row 411 column 5.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 20:25:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938422#M368606</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-08-06T20:25:11Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938424#M368608</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/19879"&gt;@Quentin&lt;/a&gt;&amp;nbsp;Is there a short way to list the variables? I have more than 300 variables to compare. Listing every single variable would be no fun.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 20:28:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938424#M368608</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-08-06T20:28:18Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938446#M368616</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;This is the error message I get when using this code on the original data&lt;/P&gt;
&lt;P&gt;ERROR: Array subscript out of range at row 411 column 5.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Learn how to read the SAS log.&amp;nbsp; Look at what statement was shown on LINE number 411 of your SAS log.&amp;nbsp; What part of the statement was in the fifth column on that line of code?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you ran the code I provided the only way that the INDEX could be invalid is if you did not define the array of FLAG variables with the same number (or more) of members as the array of original variables.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2024 23:09:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938446#M368616</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-08-06T23:09:12Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938495#M368629</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/19879"&gt;@Quentin&lt;/a&gt;&amp;nbsp;Is there a short way to list the variables? I have more than 300 variables to compare. Listing every single variable would be no fun.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Are you familiar with the macro language?&amp;nbsp; That's the approach I use most often for code generation.&amp;nbsp; So I would write a macro which has a parameter for the list of variables, then the macro would return a block of code like:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; ,range(input(var1,8.))&amp;gt;0 as var1flag
 ,range(input(var2,8.))&amp;gt;0 as var2flag
 ,range(input(var3,8.))&amp;gt;0 as var3flag&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If you're familiar with the macro language, please try writing a macro like that, and if you have problems, post your macro and people can help.&lt;BR /&gt;&lt;BR /&gt;Or if you're familiar with other code generation approaches (e.g. CALL EXECUTE, or using DATA step to write a .sas file that you %include), those approaches could work as well.&amp;nbsp; There are lots of different ways to generate code in SAS.&lt;/P&gt;
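&lt;P&gt;As a starting point, here is a rough sketch of such a macro. The macro name RANGEFLAGS and its parameter are made up for illustration, and it assumes the variable names are passed as a space-separated list (a shortcut like var1-var300 would not work with this simple version):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro rangeflags(varlist);
  %local i var;
  /* emit one ",range(...)&amp;gt;0 as xxxflag" term per variable name */
  %do i = 1 %to %sysfunc(countw(&amp;amp;varlist, %str( )));
    %let var = %scan(&amp;amp;varlist, &amp;amp;i, %str( ));
    ,range(input(&amp;amp;var,8.))&amp;gt;0 as &amp;amp;var.flag
  %end;
%mend rangeflags;

proc sql ;
  select id, pnr, pid, time, cn
         %rangeflags(var1 var2 var3)
  from have
  group by id, pnr, pid, time
  ;
quit ;&lt;/CODE&gt;&lt;/PRE&gt;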
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That said, I think probably PROC COMPARE is the better way to go.&amp;nbsp; If I decided to do it with SQL or DATA step, I would probably validate my results with PROC COMPARE, just because I trust PROC COMPARE to do its job correctly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Did you try PROC COMPARE? What went wrong with that approach?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2024 11:00:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938495#M368629</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2024-08-07T11:00:52Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938536#M368639</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/19879"&gt;@Quentin&lt;/a&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp; Thank you all for your contributions toward finding a solution to this problem.&lt;/P&gt;
&lt;P&gt;After consultation with the customer, here is a detailed explanation of his intentions with the data. I have adjusted the data a little (added the variable that contains the person who entered the data).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input id $ pnr pid time cn editor var1 var2 var3 var4 var5;
datalines;
1 12 2 1 1 1 5 6 7 3 4
1 12 2 1 2 2 5 7 9 3 5
1 12 2 2 2 3 3 8 5 1 5
1 12 2 2 2 4 3 8 7 1 6

15 6 3 1 1 1 3 5 8 3 4
15 6 3 2 2 2 3 6 8 3 5
15 6 3 2 2 3 1 7 5 1 5

25 7 3 1 1 1 3 5 8 3 4
25 7 3 2 2 2 3 6 8 3 5
25 7 3 1 2 3 1 7 5 1 5
25 7 3 1 2 4 3 8 7 1 6

;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;a) Identify the matching pairs of the 1st and 2nd input&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;- there should always be 2 observations that match in the variables id, pnr and pid&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;- these observations differ in the number of the control input in the variable 'control' (cn in the example data; 1 or 2)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;b) Check all data records to see whether they already have a control entry or not; these two groups should be output separately&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;- a group with observations that do not yet have a 2nd input&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;- a group with observations for which there is a 2nd input&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;c) Compare the observations that exist twice with each other; the prerequisite, as described above, is a match in id, pnr and pid&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;- then compare whether var1-var5 match (by rows) or not&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;- observations that do not match should be output so that we know what needs to be corrected&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So these are the notes I made. I hope someone can give me a helping hand, even if it's just a start, because I am a little confused about where to begin. Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2024 15:50:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938536#M368639</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-08-07T15:50:11Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938546#M368640</link>
      <description>&lt;P&gt;Editor values of 1,2,3, 4 sure don't look like two people entering data which was part of the prior description.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please restrict use of "data set" to discussion about SAS data sets. You are apparently talking about rows of values in the text file as a data set and that doesn't make much sense (at least not to me) and confuses where a data set may actually be needed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You may need to provide a bit more detail about this&lt;/P&gt;
&lt;PRE&gt;b) Check all data records to see whether they already have a control entry or not; these two groups should be outputted                   separately
      - a group with data records that do not yet have a 2nd input
       - a group with data records for which there is a 2nd input&lt;/PRE&gt;
&lt;P&gt;As in which records in your example meet each requirement and it would be nice for consistency sake to provide something like the name of an output data set to hold which "separate" outputs&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2024 15:01:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938546#M368640</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-08-07T15:01:16Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938563#M368645</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp; Thanks for your remarks:&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;SPAN&gt;Q: Editor values of 1,2,3, 4 sure don't look like two people entering data which was part of the prior description.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;A: I did not want to use only 1 and 2 for the editor, so that it is not confused with time and control&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;SPAN&gt;Q:&amp;nbsp;Please restrict use of "data set" to discussion about SAS data sets. You are apparently talking about rows of values in the text file as a data set and that doesn't make much sense (at least not to me) and confuses where a data set may actually be needed.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;A: Sorry, I have edited this above&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;SPAN&gt;Q:&amp;nbsp;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;You may need to provide a bit more detail about this.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;b) Check all data records to see whether they already have a control entry or not; these two groups should be outputted                   separately
      - a group with data records that do not yet have a 2nd input
       - a group with data records for which there is a 2nd input&lt;/PRE&gt;
&lt;P&gt;&lt;FONT color="#0000FF"&gt;&lt;SPAN&gt;As in which records in your example meet each requirement and it would be nice for consistency sake to provide something like the name of an output data set to hold which "separate" outputs&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A:&amp;nbsp;&lt;SPAN&gt;&amp;nbsp;Identify the matching pairs of the 1st and 2nd input&amp;nbsp;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;- there should always be 2 observations that match in the variables id, pnr, pid and time&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;- The values of id, pnr, pid&amp;nbsp; should always be the same to become a matching pair. In variable "control" there should be a first and second entry to see that two different people entered the data&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;- As in observations one and two with id=1: since id, pnr and pid match, they are a pair, and since control 1 and 2 both exist, the values of var1 to var5 can be checked to see whether they are the same or not. If they are the same I output them to a data set&amp;nbsp; "all_values_match". If they do not match I output them to a data set "values_doesnt_match"&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;-Then I check for the next pair with the same id,&amp;nbsp; pnr, pid but I have only "control"=2 so&amp;nbsp; I know control one is definitely missing.&amp;nbsp;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;I then check if these are duplicates if not I output them in dataset "missing_entry" etc&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;SPAN&gt;I hope I could answer your questions&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2024 19:52:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938563#M368645</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-08-07T19:52:43Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938568#M368647</link>
      <description>&lt;P&gt;So trying to pull key information from your post, it looks like groups in the data are identified by the values of three variables:&amp;nbsp;&amp;nbsp;id pnr pid&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Each group could appear once or twice. (Are you sure there is never a third or fourth observation in a group?)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So split the data into three files.&amp;nbsp; The singletons.&amp;nbsp; The first and the second.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data single first second;
  set have;
  by id pnr pid;
  if first.pid and last.pid then output single;
  else if first.pid then output first;
  else output second;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So the dataset SINGLE is the answer to which ones have not had what we used to call second pass data entry.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can then PROC COMPARE with the other two datasets to see which records had differences.&lt;/P&gt;
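&lt;P&gt;A sketch of that comparison, assuming var1-var5 are the variables to check and using a made-up output data set name (DIFFS). Both inputs are already in id/pnr/pid order because they came out of the BY-group step above.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc compare base=first compare=second
             out=diffs outnoequal outbase outcomp /* keep only the pairs that differ */
             listobs;
  id id pnr pid;
  var var1-var5;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If a group has more than two observations, SECOND will hold duplicate ID values and PROC COMPARE will warn about them in the log.&lt;/P&gt;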
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2024 16:29:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938568#M368647</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-08-07T16:29:02Z</dc:date>
    </item>
    <item>
      <title>Re: Compare rows in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938583#M368650</link>
      <description>&lt;P&gt;Let's make a simpler example dataset.&amp;nbsp; The data has one row per PatientID and EntryID.&amp;nbsp; The data for each PatientID was entered by two different people (EntryID):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input ptid entryid var1 var2 var3;
datalines;
1 1 2 4 6
1 2 2 5 6
2 1 10 12 14
2 2 10 12 14
3 1 2 4 6
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;So that is basically your data, but with fewer ID variables.&amp;nbsp; When asking questions, it's helpful to make the HAVE data as simple as possible.&amp;nbsp; Anything that works for one ID variable can work for multiple.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With that, I would use PROC COMPARE:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc compare base=have(where=(entryid=1)) compare=have(where=(entryid=2)) listobs;
  var var1-var3 ;
  id ptid ;
run ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That will show you where there are differences in the values of var1-var3.&amp;nbsp; It will also show you when data for a patient was only entered once.&amp;nbsp; If data for a patient was entered 3 times, you'll get a warning in your log about duplicate data. It will work for numeric data and character data.&amp;nbsp; You don't even have to list all of the variables.&amp;nbsp; PROC COMPARE is fabulous!&lt;BR /&gt;&lt;BR /&gt;If you want output data instead of a report, you can explore the options in PROC COMPARE to create output datasets.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2024 18:04:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Compare-rows-in-a-dataset/m-p/938583#M368650</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2024-08-07T18:04:50Z</dc:date>
    </item>
  </channel>
</rss>

