<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Check data before combining datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485732#M126244</link>
    <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Mylib.consolidated_data;
       set Mylib.Consolidated_data work.daily_data;
run;

proc sort data Mylib.consolidated_data nodupkey;
by report_date;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;After the data is appended you can sort the data set with nodupkey option which will remove duplicate report_date.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note: The second occurrence of the same by variable record will be deleted.&amp;nbsp; &amp;nbsp;&lt;/P&gt;
&lt;P&gt;example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
input id num;
datalines;
1 10
1 20
2 10
3 10
;
run;
proc sort data=test nodupkey;
by id;
run;
/* If you want to keep the highest number */
data test;
input id num;
datalines;
1 10
1 20
2 10
3 10
;
run;
proc sort data=test;
by id descending num;
run;
proc sort data=test nodupkey;
by id;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 10 Aug 2018 12:00:33 GMT</pubDate>
    <dc:creator>SuryaKiran</dc:creator>
    <dc:date>2018-08-10T12:00:33Z</dc:date>
    <item>
      <title>Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485406#M126114</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I have 2 datasets having the same variables and both of them having the variable named "report_date":&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. Daily_Data (imported into LIB "Work")&lt;/P&gt;&lt;P&gt;2. Consolidated_Data (stored at LIB "Mylib")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On daily basic, after importing "Daily_data", I am using the scripts:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Mylib.consolidated_data;
       set Mylib.Consolidated_data work.daily_data;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;to extend the dataset "Consolidated_data". However, to avoid combining the same source data twice or 3 times&amp;nbsp;unintentionally, can we put a condition some thing like that:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let say, the dataset "Daily_data" with the variable "Report_date" having one same value. For eg. 09/08/2018. Accordingly, if the variable "Report_date" in dataset "Consolidated_data" contains at least&amp;nbsp;one observation equal to 09/08/2018, the combining step will be skipped.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Thank you,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Aug 2018 11:00:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485406#M126114</guid>
      <dc:creator>Tri_Luong</dc:creator>
      <dc:date>2018-08-09T11:00:21Z</dc:date>
    </item>
    <item>
      <title>Re: Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485421#M126118</link>
      <description>&lt;P&gt;To prevent accidental dupülication of entries, but still allow updates (if new data needs to be corrected), I'd do this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
set work.daily_data;
call symput('exclude',put(report_date,best.));
stop;
run;

data Mylib.consolidated_data;
set
  Mylib.Consolidated_data (
    where=(report_date ne &amp;amp;exclude)
  )
  work.daily_data
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note how a little visual formatting makes the set statement easier to grok.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Aug 2018 11:21:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485421#M126118</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-08-09T11:21:36Z</dc:date>
    </item>
    <item>
      <title>Re: Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485424#M126119</link>
      <description>&lt;P&gt;Assuming that your constant Report_Date var is always larger than the report_date values in the consolidated data set, you can do something like this&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   insert into Mylib.consolidated_data as cons
   select * from work.daily_data as daily
   having daily.Report_date &amp;gt; max(cons.Report_date);
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 09 Aug 2018 11:26:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485424#M126119</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2018-08-09T11:26:39Z</dc:date>
    </item>
    <item>
      <title>Re: Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485426#M126121</link>
      <description>&lt;P&gt;First, note that your program will take longer and longer to run over time.&amp;nbsp; Instead of using&amp;nbsp; DATA step to combine the data sets, it would be much more efficient to use PROC APPEND:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc append data=daily_data base=Mylib.consolidated_data;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This assumes that the data set structure never changes (same variables, same lengths for each variable, every time).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since the latest batch of data will appear at the end of the DATA step, you can rely on the final observation in the consolidated data set coming from the last batch of data.&amp;nbsp; To extract the final observation, you could use something like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data final;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;set Mylib.consolidated_data nobs=_nobs_ point=_nobs_;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;output;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;stop;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;keep report_date;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;rename report_date = final_previous_date;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since it reads just 1 observation, it will be speedy.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then get the earliest date in your daily data set:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc summary data=daily_data;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;var report_date;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;output out=daily_summary min=current_minimum_date;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Finally, combine these into a way that runs PROC APPEND only when needed;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data _null_;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;set daily_summary;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;set final;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;if current_minimum_date &amp;gt; final_previous_date then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; call execute("proc append data=daily_data base=Mylib.consolidated_data; run;");&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;end;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;</description>
      <pubDate>Thu, 09 Aug 2018 11:35:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485426#M126121</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-08-09T11:35:43Z</dc:date>
    </item>
    <item>
      <title>Re: Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485459#M126136</link>
      <description>&lt;P&gt;Check if PROC SORT with NODUPKEY will work for you.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Aug 2018 13:11:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485459#M126136</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2018-08-09T13:11:54Z</dc:date>
    </item>
    <item>
      <title>Re: Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485640#M126191</link>
      <description>&lt;P&gt;Thanks for your instruction but I would like to keep proc import. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 01:59:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485640#M126191</guid>
      <dc:creator>Tri_Luong</dc:creator>
      <dc:date>2018-08-10T01:59:52Z</dc:date>
    </item>
    <item>
      <title>Re: Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485641#M126192</link>
      <description>&lt;P&gt;Thanks. As I am quite newbie. Would you mind show me the detailed codes?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 02:00:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485641#M126192</guid>
      <dc:creator>Tri_Luong</dc:creator>
      <dc:date>2018-08-10T02:00:49Z</dc:date>
    </item>
    <item>
      <title>Re: Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485642#M126193</link>
      <description>&lt;P&gt;Thank you very much. It was what I am looking for. My scripts have worked.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 02:02:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485642#M126193</guid>
      <dc:creator>Tri_Luong</dc:creator>
      <dc:date>2018-08-10T02:02:34Z</dc:date>
    </item>
    <item>
      <title>Re: Check data before combining datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485732#M126244</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Mylib.consolidated_data;
       set Mylib.Consolidated_data work.daily_data;
run;

proc sort data Mylib.consolidated_data nodupkey;
by report_date;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;After the data is appended you can sort the data set with nodupkey option which will remove duplicate report_date.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note: The second occurrence of the same by variable record will be deleted.&amp;nbsp; &amp;nbsp;&lt;/P&gt;
&lt;P&gt;example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
input id num;
datalines;
1 10
1 20
2 10
3 10
;
run;
proc sort data=test nodupkey;
by id;
run;
/* If you want to keep the highest number */
data test;
input id num;
datalines;
1 10
1 20
2 10
3 10
;
run;
proc sort data=test;
by id descending num;
run;
proc sort data=test nodupkey;
by id;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 10 Aug 2018 12:00:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Check-data-before-combining-datasets/m-p/485732#M126244</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2018-08-10T12:00:33Z</dc:date>
    </item>
  </channel>
</rss>

