<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Any possibility that compression cause data loss?! in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800519#M314943</link>
    <description>&lt;P&gt;So you made two datasets out of your original dataset.&lt;/P&gt;
&lt;P&gt;One with the first date and the other with that last date.&lt;/P&gt;
&lt;P&gt;Then you did a data step merge.&lt;/P&gt;
&lt;P&gt;So the value of TIMESEC will be the one from the LAST date dataset since it will be the last value loaded into that variable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then you did a join between the THREE datasets.&lt;/P&gt;
&lt;P&gt;This time the value of TIMESEC will come from the original dataset since it is the first time a variable with that name is included in the list of variables the SELECT is listing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In SQL joins you have to be very careful when you are have multiple variables with the same name.&amp;nbsp; It is best to be very explicit about which version of the variable you want to keep.&amp;nbsp; Do not use * in the list of columns.&amp;nbsp; Use COALESCE() function to control the precedence for the selection.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try this example where ID is the key variable and DAY and VALUE are two other variables that are common to both datasets.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have first last;
  input id day value ;
  if day=1 then output have first;
  else output have last;
cards;
1 1 10
1 2 20
;

data test1;
  merge first last;
  by id ;
run;

data test2;
  merge last first;
  by id;
run;

proc sql;
create table test3 as select * from first,last where first.id=last.id;
create table test4 as select * from last,first where first.id=last.id;
quit;

data all ;
 length dsn dsn2 $41;
 set test: indsname=dsn2;
 dsn=dsn2;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;Obs       dsn        id    day    value

 1     WORK.TEST1     1     2       20
 2     WORK.TEST2     1     1       10
 3     WORK.TEST3     1     1       10
 4     WORK.TEST4     1     2       20

&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 07 Mar 2022 01:02:12 GMT</pubDate>
    <dc:creator>Tom</dc:creator>
    <dc:date>2022-03-07T01:02:12Z</dc:date>
    <item>
      <title>Any possibility that compression cause data loss?!</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800463#M314898</link>
      <description>&lt;P&gt;Run merge by two ways. One by dataset merge, the other by SQL. The SAS log/Dataset shows same row count(as expected).&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But when run subset of the merge dataset, the outcomes are diff(one is zero, the other is count as expected)...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Awkward...&lt;/P&gt;</description>
      <pubDate>Sun, 06 Mar 2022 14:16:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800463#M314898</guid>
      <dc:creator>hellohere</dc:creator>
      <dc:date>2022-03-06T14:16:57Z</dc:date>
    </item>
    <item>
      <title>Re: Any possibility that compression cause data loss?!</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800465#M314900</link>
      <description>&lt;P&gt;Insufficient problem description.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Show the code and corresponding logs.&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 06 Mar 2022 15:28:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800465#M314900</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2022-03-06T15:28:03Z</dc:date>
    </item>
    <item>
      <title>Re: Any possibility that compression cause data loss?!</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800473#M314908</link>
      <description>&lt;P&gt;You cannot do a MERGE with SQL, only JOINs.&amp;nbsp; They are not exactly the same thing, but are close when you have one to one or one to many matches.&lt;/P&gt;
&lt;P&gt;If you have non-key common variables you need to understand how each method will select which of the two values to use for those variables.&amp;nbsp; In general with a data step merge the last instance of the variable "wins" and with SQL the first instance.&lt;/P&gt;</description>
      <pubDate>Sun, 06 Mar 2022 16:27:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800473#M314908</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-03-06T16:27:47Z</dc:date>
    </item>
    <item>
      <title>Re: Any possibility that compression cause data loss?!</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800474#M314909</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/409584"&gt;@hellohere&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Run merge by two ways. One by dataset merge, the other by SQL. The SAS log/Dataset shows same row count(as expected).&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But when run subset of the merge dataset, the outcomes are diff(one is zero, the other is count as expected)...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Awkward...&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;What is your question?&lt;/P&gt;</description>
      <pubDate>Sun, 06 Mar 2022 16:32:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800474#M314909</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2022-03-06T16:32:59Z</dc:date>
    </item>
    <item>
      <title>Re: Any possibility that compression cause data loss?!</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800514#M314939</link>
      <description>&lt;P&gt;%let ds=tick_out_all;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sort data=&amp;amp;ds.; by tick tradingday timesec; run;quit;&lt;/P&gt;
&lt;P&gt;data &amp;amp;ds._first(keep=tick tradingday timesec lp_ rename=(lp_=lp_first)); set &amp;amp;ds.; by tick tradingday; if first.tradingday;run;quit;&lt;BR /&gt;data &amp;amp;ds._last (keep=tick tradingday timesec lp_ rename=(lp_=lp_last)); set &amp;amp;ds.; by tick tradingday; if last.tradingday;run;quit;&lt;/P&gt;
&lt;P&gt;/*get ret and alike, merge outcome timesec=93100000 is gone, evne SAS log says not!?&lt;BR /&gt;*/&lt;BR /&gt;data &amp;amp;ds._; merge &amp;amp;ds. &amp;amp;ds._first &amp;amp;ds._last;&lt;BR /&gt;by tick tradingday;&lt;BR /&gt;run;quit;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table &amp;amp;ds._2 as&lt;BR /&gt;select a.*, b.lp_first, c.lp_last &lt;BR /&gt;from &amp;amp;ds. as a&lt;BR /&gt;left join &amp;amp;ds._first as b&lt;BR /&gt;on a.tick=b.tick and a.tradingday=b.tradingday&lt;BR /&gt;left join &amp;amp;ds._last as c&lt;BR /&gt;on a.tick=c.tick and a.tradingday=c.tradingday&lt;BR /&gt;order by a.tick, a.tradingday, a.timesec;&lt;BR /&gt;quit;&lt;/P&gt;
&lt;P&gt;proc sort data=&amp;amp;ds.; by tick tradingday timesec; run;quit;&lt;/P&gt;
&lt;P&gt;%let ts=93100000;&lt;BR /&gt;&lt;BR /&gt;data tick_out_all_ts_(keep=); set tick_out_all_(where=(timesec=&amp;amp;ts.)); run;quit;&lt;BR /&gt;data tick_out_all_ts_2(keep=); set tick_out_all_2(where=(timesec=&amp;amp;ts.)); run;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;______________________________________________________________________&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;3816 data &amp;amp;ds._; merge &amp;amp;ds. &amp;amp;ds._first &amp;amp;ds._last;&lt;BR /&gt;3817 by tick tradingday;&lt;BR /&gt;3818 run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 19397784 observations read from the data set WORK.TICK_OUT_ALL.&lt;BR /&gt;NOTE: There were 82194 observations read from the data set WORK.TICK_OUT_ALL_FIRST.&lt;BR /&gt;NOTE: There were 82194 observations read from the data set WORK.TICK_OUT_ALL_LAST.&lt;BR /&gt;NOTE: The data set WORK.TICK_OUT_ALL_ has 19397784 observations and 71 variables.&lt;BR /&gt;NOTE: Compressing data set WORK.TICK_OUT_ALL_ decreased size by 29.16 percent.&lt;BR /&gt;Compressed is 119484 pages; un-compressed would require 168677 pages.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 1:06.84&lt;BR /&gt;cpu time 1:02.64&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3855 proc sql;&lt;BR /&gt;3856 create table &amp;amp;ds._2 as&lt;BR /&gt;3857 select a.*, b.lp_first, c.lp_last&lt;BR /&gt;3858 from &amp;amp;ds. as a&lt;BR /&gt;3859 left join &amp;amp;ds._first as b&lt;BR /&gt;3860 on a.tick=b.tick and a.tradingday=b.tradingday&lt;BR /&gt;3861 left join &amp;amp;ds._last as c&lt;BR /&gt;3862 on a.tick=c.tick and a.tradingday=c.tradingday&lt;BR /&gt;3863 order by a.tick, a.tradingday, a.timesec;&lt;BR /&gt;NOTE: Compressing data set WORK.TICK_OUT_ALL_2 decreased size by 29.16 percent.&lt;BR /&gt;Compressed is 119484 pages; un-compressed would require 168677 pages.&lt;BR /&gt;NOTE: Table WORK.TICK_OUT_ALL_2 created, with 19397784 rows and 71 columns.&lt;/P&gt;
&lt;P&gt;3864 quit;&lt;BR /&gt;NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;real time 1:46.56&lt;BR /&gt;cpu time 1:25.90&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3873&lt;BR /&gt;3874 data tick_out_all_ts_(keep=); set tick_out_all_(where=(timesec=&amp;amp;ts.)); run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 0 observations read from the data set WORK.TICK_OUT_ALL_.&lt;BR /&gt;WHERE timesec=93100000;&lt;BR /&gt;NOTE: The data set WORK.TICK_OUT_ALL_TS_ has 0 observations and 71 variables.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 30.95 seconds&lt;BR /&gt;cpu time 7.25 seconds&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;3874! quit;&lt;BR /&gt;3875 data tick_out_all_ts_2(keep=); set tick_out_all_2(where=(timesec=&amp;amp;ts.)); run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 82194 observations read from the data set WORK.TICK_OUT_ALL_2.&lt;BR /&gt;WHERE timesec=93100000;&lt;BR /&gt;NOTE: The data set WORK.TICK_OUT_ALL_TS_2 has 82194 observations and 71 variables.&lt;BR /&gt;NOTE: Compressing data set WORK.TICK_OUT_ALL_TS_2 decreased size by 40.70 percent.&lt;BR /&gt;Compressed is 424 pages; un-compressed would require 715 pages.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 34.50 seconds&lt;BR /&gt;cpu time 11.07 seconds&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;3875! quit;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 00:29:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800514#M314939</guid>
      <dc:creator>hellohere</dc:creator>
      <dc:date>2022-03-07T00:29:25Z</dc:date>
    </item>
    <item>
      <title>Re: Any possibility that compression cause data loss?!</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800515#M314940</link>
      <description>&lt;P&gt;Right click two datasets, both show up the same row count. BUT when simply do subset(same code), one is zero and the other is not. See code and log above.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 00:32:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800515#M314940</guid>
      <dc:creator>hellohere</dc:creator>
      <dc:date>2022-03-07T00:32:28Z</dc:date>
    </item>
    <item>
      <title>Re: Any possibility that compression cause data loss?!</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800519#M314943</link>
      <description>&lt;P&gt;So you made two datasets out of your original dataset.&lt;/P&gt;
&lt;P&gt;One with the first date and the other with that last date.&lt;/P&gt;
&lt;P&gt;Then you did a data step merge.&lt;/P&gt;
&lt;P&gt;So the value of TIMESEC will be the one from the LAST date dataset since it will be the last value loaded into that variable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then you did a join between the THREE datasets.&lt;/P&gt;
&lt;P&gt;This time the value of TIMESEC will come from the original dataset since it is the first time a variable with that name is included in the list of variables the SELECT is listing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In SQL joins you have to be very careful when you are have multiple variables with the same name.&amp;nbsp; It is best to be very explicit about which version of the variable you want to keep.&amp;nbsp; Do not use * in the list of columns.&amp;nbsp; Use COALESCE() function to control the precedence for the selection.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try this example where ID is the key variable and DAY and VALUE are two other variables that are common to both datasets.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have first last;
  input id day value ;
  if day=1 then output have first;
  else output have last;
cards;
1 1 10
1 2 20
;

data test1;
  merge first last;
  by id ;
run;

data test2;
  merge last first;
  by id;
run;

proc sql;
create table test3 as select * from first,last where first.id=last.id;
create table test4 as select * from last,first where first.id=last.id;
quit;

data all ;
 length dsn dsn2 $41;
 set test: indsname=dsn2;
 dsn=dsn2;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;Obs       dsn        id    day    value

 1     WORK.TEST1     1     2       20
 2     WORK.TEST2     1     1       10
 3     WORK.TEST3     1     1       10
 4     WORK.TEST4     1     2       20

&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 01:02:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800519#M314943</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-03-07T01:02:12Z</dc:date>
    </item>
    <item>
      <title>Re: Any possibility that compression cause data loss?!</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800522#M314946</link>
      <description>&lt;P&gt;Thanks, Pal. The original dataset has 19397784 row. The row counts from both treatments show up identical(&lt;SPAN&gt;19397784, right click).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;BUT run subset, timesec=93100000, one has zero and the other shows up thousands(as expected).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Double click the dataset, timesec=93100000 does not show up in dataset-merge treatment(remember the row count is&amp;nbsp;19397784).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;That is the myth.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 01:21:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Any-possibility-that-compression-cause-data-loss/m-p/800522#M314946</guid>
      <dc:creator>hellohere</dc:creator>
      <dc:date>2022-03-07T01:21:06Z</dc:date>
    </item>
  </channel>
</rss>

