<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Eliminate redunant information across 3 variables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527481#M143812</link>
    <description>&lt;P&gt;I need all of the variables in the example records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Obs 1 and 3 combined provides the values of Obs 2 in these particular examples.&amp;nbsp; But the redundant records are not always Obs 2.&amp;nbsp; Also, there may be any number of records for a particular kid on a particular date.&lt;/P&gt;</description>
    <pubDate>Tue, 15 Jan 2019 20:02:16 GMT</pubDate>
    <dc:creator>Wolverine</dc:creator>
    <dc:date>2019-01-15T20:02:16Z</dc:date>
    <item>
      <title>Eliminate redunant information across 3 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527467#M143802</link>
      <description>&lt;P&gt;I'm working with medical claims data, and I'm trying to create a file that will show the medical histories for a select group of kids.&amp;nbsp; For each claim date for each kid, I need a list of diagnosis, procedure, and revenue codes.&amp;nbsp; But many of the records do not include any new information.&amp;nbsp; In each of the 2 examples below, the middle line contains information that is already present in the other lines.&amp;nbsp; So the middle line is redundant and could be deleted.&lt;/P&gt;
&lt;TABLE width="608"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="116"&gt;encrypted_id&lt;/TD&gt;
&lt;TD width="132"&gt;DATE_OF_BIRTH&lt;/TD&gt;
&lt;TD width="72"&gt;GENDER&lt;/TD&gt;
&lt;TD width="72"&gt;CLAIM_SERVICE_FROM_DATE&lt;/TD&gt;
&lt;TD width="72"&gt;DIAGNOSIS_CODE_t&lt;/TD&gt;
&lt;TD width="72"&gt;PRCDR_CODE&lt;/TD&gt;
&lt;TD width="72"&gt;REVENUE_CODE&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="116"&gt;02D957E3EDF5&lt;/TD&gt;
&lt;TD width="132"&gt;23-Aug-05&lt;/TD&gt;
&lt;TD width="72"&gt;F&lt;/TD&gt;
&lt;TD width="72"&gt;26-Apr-16&lt;/TD&gt;
&lt;TD width="72"&gt;H52223&lt;/TD&gt;
&lt;TD width="72"&gt;V2715&lt;/TD&gt;
&lt;TD width="72"&gt;0611&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="116"&gt;02D957E3EDF5&lt;/TD&gt;
&lt;TD width="132"&gt;23-Aug-05&lt;/TD&gt;
&lt;TD width="72"&gt;F&lt;/TD&gt;
&lt;TD width="72"&gt;26-Apr-16&lt;/TD&gt;
&lt;TD width="72"&gt;H5203&lt;/TD&gt;
&lt;TD width="72"&gt;V2715&lt;/TD&gt;
&lt;TD width="72"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="116"&gt;02D957E3EDF5&lt;/TD&gt;
&lt;TD width="132"&gt;23-Aug-05&lt;/TD&gt;
&lt;TD width="72"&gt;F&lt;/TD&gt;
&lt;TD width="72"&gt;26-Apr-16&lt;/TD&gt;
&lt;TD width="72"&gt;H5203&lt;/TD&gt;
&lt;TD width="72"&gt;V2020&lt;/TD&gt;
&lt;TD width="72"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="116"&gt;encrypted_id&lt;/TD&gt;
&lt;TD width="132"&gt;DATE_OF_BIRTH&lt;/TD&gt;
&lt;TD width="72"&gt;GENDER&lt;/TD&gt;
&lt;TD width="72"&gt;CLAIM_SERVICE_FROM_DATE&lt;/TD&gt;
&lt;TD width="72"&gt;DIAGNOSIS_CODE_t&lt;/TD&gt;
&lt;TD width="72"&gt;PRCDR_CODE&lt;/TD&gt;
&lt;TD width="72"&gt;REVENUE_CODE&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="116"&gt;14A17629B53944&lt;/TD&gt;
&lt;TD width="132"&gt;31-Jul-07&lt;/TD&gt;
&lt;TD width="72"&gt;F&lt;/TD&gt;
&lt;TD width="72"&gt;20-Apr-17&lt;/TD&gt;
&lt;TD width="72"&gt;R480&lt;/TD&gt;
&lt;TD width="72"&gt;99174&lt;/TD&gt;
&lt;TD&gt;0610&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="116"&gt;14A17629B53944&lt;/TD&gt;
&lt;TD width="132"&gt;31-Jul-07&lt;/TD&gt;
&lt;TD width="72"&gt;F&lt;/TD&gt;
&lt;TD width="72"&gt;20-Apr-17&lt;/TD&gt;
&lt;TD width="72"&gt;Z6852&lt;/TD&gt;
&lt;TD width="72"&gt;99174&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="116"&gt;14A17629B53944&lt;/TD&gt;
&lt;TD width="132"&gt;31-Jul-07&lt;/TD&gt;
&lt;TD width="72"&gt;F&lt;/TD&gt;
&lt;TD width="72"&gt;20-Apr-17&lt;/TD&gt;
&lt;TD width="72"&gt;Z6852&lt;/TD&gt;
&lt;TD width="72"&gt;99214&lt;/TD&gt;
&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Based on an approach I found in another post, I tried the code below.&amp;nbsp; But for some reason, the rev_new variable contained diagnosis codes and DX_new contained rev codes.&amp;nbsp; And there was still redundant information in the output file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA temp.numer_ID_codesort; SET temp.numer_ID_list_dd;

LENGTH DX_new proc_new rev_new $10;

DX_new = DIAGNOSIS_CODE_t;
proc_new = PRCDR_CODE;
rev_new = REVENUE_CODE;

CALL SORTC(rev_new,proc_new,DX_new);
RUN;

PROC SORT DATA = temp.numer_ID_codesort OUT = temp.numer_ID_final nodupkey;
BY DX_new proc_new rev_new;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jan 2019 19:14:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527467#M143802</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2019-01-15T19:14:40Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminate redunant information across 3 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527471#M143805</link>
      <description>&lt;P&gt;So, the only variables of interest are the three last variables? And the middle observation is redundant because obs 1 and 3 combined provides the variable values of the second observation, correct?&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jan 2019 19:33:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527471#M143805</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-01-15T19:33:55Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminate redunant information across 3 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527481#M143812</link>
      <description>&lt;P&gt;I need all of the variables in the example records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Obs 1 and 3 combined provides the values of Obs 2 in these particular examples.&amp;nbsp; But the redundant records are not always Obs 2.&amp;nbsp; Also, there may be any number of records for a particular kid on a particular date.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Jan 2019 20:02:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527481#M143812</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2019-01-15T20:02:16Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminate redunant information across 3 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527691#M143917</link>
      <description>&lt;P&gt;Conceptually, one potential solution would be to have SAS group all records that have the same ID and claim date.&amp;nbsp; Then, within each of those groups, have SAS look at the DX codes and assign a flag variable.&amp;nbsp; If a given code appears only once, it would be flagged as 0.&amp;nbsp; Other codes appearing more than once would be flagged as 1.&amp;nbsp; Repeat this process for proc and rev codes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then an IF statement could use those flags to delete records that don't contain any new information:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;IF DX_flag &amp;gt; 0 AND Proc_flag &amp;gt; 0 AND Rev_flag &amp;gt; 0 THEN delete;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So that's the idea... can anyone help me figure out how to actually program that?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jan 2019 12:42:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527691#M143917</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2019-01-16T12:42:00Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminate redunant information across 3 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527731#M143937</link>
      <description>&lt;P&gt;You want to remove any observation that is a subset of the union of all the other observations.&amp;nbsp; So which one do you remove if you data are:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; A&amp;nbsp;&amp;nbsp;&amp;nbsp; B&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; C&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;1&amp;nbsp;&amp;nbsp; 11&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&amp;nbsp;&amp;nbsp;&amp;nbsp; .&amp;nbsp;&amp;nbsp;&amp;nbsp; 111&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .&amp;nbsp;&amp;nbsp; 11&amp;nbsp;&amp;nbsp;&amp;nbsp; 111&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any of these three observations could be removed.&amp;nbsp; Which one does your rule require?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And also note this has to be a recursive process.&amp;nbsp; Because if you find two observations that qualify for deletion, once you remove the first one,&amp;nbsp;you have to recheck that the other can still be removed.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Perhaps what you should do instead is to &lt;EM&gt;&lt;STRONG&gt;consolidate&lt;/STRONG&gt;&lt;/EM&gt; records = i.e. synthetically create a minimal set of records that contain the entire collection of observed values.&amp;nbsp; For instance, from the above 3 records, instead of deleting one of them (leaving 2 records), make a single new record with&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; A&amp;nbsp;&amp;nbsp;&amp;nbsp; B&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; C&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;1&amp;nbsp;&amp;nbsp; 11&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;111&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;even though this record doesn't exist in your original data.&amp;nbsp; And of course the same principle can be applied if some variables have multiple values.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jan 2019 15:13:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527731#M143937</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-01-16T15:13:21Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminate redunant information across 3 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527744#M143947</link>
      <description>&lt;P&gt;I see your point.&amp;nbsp; If I apply my rule to your example, it won't delete any records.&amp;nbsp; I certainly like the idea of consolidating your example to a single line... now how do I actually program that?&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jan 2019 15:37:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527744#M143947</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2019-01-16T15:37:16Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminate redunant information across 3 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527755#M143954</link>
      <description>&lt;P&gt;In your sample data, it won't actually be a single line, since your first set has 2 values for diagnosis, 2 for prcdr, and 1&amp;nbsp;for revenue.&amp;nbsp; So it should have 2 lines.&amp;nbsp; The first line would have one of the diagnosis codes, one of the prcdr codes, and the only value for revenue code.&amp;nbsp; The second would have the alternative value for diagnosis code, the alternative for prcdr code, and a missing value for revenue.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc freq data=have noprint;
  tables encrypted_id*diagnosis_code_t / out=freqs1 (drop=percent count where=(diagnosis_code_t^=.));
  tables encrypted_id*prcdr_code       / out=freqs2 (drop=percent count where=(prcdr_code^=.));
  tables encrypted_id*revenue_code     / out=freqs3 (drop=percent count where=(revenue_code^=.));
run;
data want;
  merge have (drop=diagnosis_code_t prcdr_code revenue_code) 
        freqs1 (in=in1)  freqs2 (in=in2) freqs3 (in=in3);
  by id;
  if max(in1,in2,in3)=1 ;
  output;
  call missing (diagnosis_code_t,prcdr_code,revenue_code);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The proc freq tabulates the cross-tabulation of ID by each of the variables of interest and outputs the frequencies to FREQS1, FREQS2, and FREQS3, respectively.&amp;nbsp; Ordinarily the output dataset would have 4 variables: the ID var, the var of interest, the variable COUNT (actually frequency of the ID*var-of-interest), and PERCENT.&amp;nbsp; WE don't need the last 2, so they are dropped, leaving only ID and var-of-interest.
&lt;OL&gt;
&lt;LI&gt;The FREQS files will be ordered by ID and the var-of-interest.&lt;/LI&gt;
&lt;LI&gt;The where-filter eliminates frequencies of missing values.&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;LI&gt;The DATA WANT step
&lt;OL&gt;
&lt;LI&gt;Use the "match merge", which requires all data sets to be sorted by id.&lt;/LI&gt;
&lt;LI&gt;There may be 3 records in have for a given ID, even though none of the FREQS data sets has more than 2.&amp;nbsp; That's why there is the "if max(in1,in2,ine)=1;"&amp;nbsp; subseting IF statement.&lt;/LI&gt;
&lt;LI&gt;The call missing (after the output statement) assures that none of the variable of interest will have values retained in the next observations.&amp;nbsp; This could otherwise happen if FREQS1 and FREQS2 have&amp;nbsp;2 obs, but FREQS3 has only 1.&lt;/LI&gt;
&lt;/OL&gt;
&lt;/LI&gt;
&lt;LI&gt;The resulting dataset&amp;nbsp; will have each of the variables-of-interest in sorted order, because that is how proc freq would produce the FREQS data sets.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;True, this program requires 2 passes through the data set, but it has a very simple structure.&amp;nbsp; One could do it in one pass, using hash objects, but that would be a good deal more programming.&amp;nbsp; I wouldn't consider the hash object approach unless the data set was very large, and not sorted.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jan 2019 16:19:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/527755#M143954</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-01-16T16:19:42Z</dc:date>
    </item>
    <item>
      <title>Re: Eliminate redunant information across 3 variables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/529691#M144767</link>
      <description>Thank u not only for the code, but also for the explanation of what the code does.  This is a great approach that was simple to program and provided exactly what I was looking for!</description>
      <pubDate>Thu, 24 Jan 2019 14:17:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Eliminate-redunant-information-across-3-variables/m-p/529691#M144767</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2019-01-24T14:17:58Z</dc:date>
    </item>
  </channel>
</rss>

