<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Dividing row count of 2 datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245863#M45922</link>
    <description>&lt;P&gt;The construct "&lt;FONT face="courier new,courier"&gt;if 0 then set&lt;/FONT&gt; ..." can be used for other purposes as well. See, for example, &lt;A href="https://communities.sas.com/t5/Base-SAS-Programming/why-if-0-then-set-still-copy-an-empty-record/td-p/57847" target="_blank"&gt;this older thread&lt;/A&gt;&amp;nbsp;and in particular p. 26, section E, of the paper linked there.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As mentioned in my previous post, the zero in&amp;nbsp;"&lt;FONT face="courier new,courier"&gt;if 0&lt;/FONT&gt;" can be replaced equivalently by&amp;nbsp;any logical expression which (always) evaluates to FALSE. An example which I have seen from time to time is "&lt;FONT face="courier new,courier"&gt;if 0=1 then set&lt;/FONT&gt; ...". Again, 0=1 is one of the simplest possible false expressions. But "&lt;FONT face="courier new,courier"&gt;if 0 then set&lt;/FONT&gt; ..." is still shorter, so perhaps more elegant (and programmers are sometimes lazy).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 25 Jan 2016 12:26:17 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2016-01-25T12:26:17Z</dc:date>
    <item>
      <title>Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245632#M45863</link>
      <description>&lt;P&gt;I have two datasets, Dataset A and Dataset B, and both have only one variable, ID.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm having a hard time doing this: (# of rows in Dataset A / # of rows in Dataset B), and then displaying the answer somehow.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've tried the following, but it takes way too long because it involves a Cartesian product join:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table ppv as
select count(a.ID)/count(b.ID) as ppv5
from DatasetA as a, DatasetB as b;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Overall, I'm trying to write a program that calculates the PPV after a model is generated.&lt;/P&gt;</description>
      <pubDate>Sat, 23 Jan 2016 16:17:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245632#M45863</guid>
      <dc:creator>1ashg</dc:creator>
      <dc:date>2016-01-23T16:17:22Z</dc:date>
    </item>
    <item>
      <title>Re: Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245633#M45864</link>
      <description>&lt;P&gt;Although it seems uncommon to me to calculate the PPV this way (cf. &lt;A href="http://support.sas.com/kb/24/170.html" target="_blank"&gt;http://support.sas.com/kb/24/170.html&lt;/A&gt;), you could do it as follows:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
if 0 then set DatasetA nobs=na; /* NOBS= is assigned at compile time; the SET statement itself is never executed */
if 0 then set DatasetB nobs=nb;
ppv=na/nb;
put ppv=;
stop; /* Only to avoid NOTE: DATA STEP stopped due to looping. */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The result is written to the log.&lt;/P&gt;</description>
      <pubDate>Sat, 23 Jan 2016 17:05:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245633#M45864</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-01-23T17:05:31Z</dc:date>
    </item>
    <item>
      <title>Re: Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245637#M45865</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table ppv as
select count(ID)/(select count(ID) from DatasetB) as ppv5
from DatasetA;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 23 Jan 2016 17:26:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245637#M45865</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2016-01-23T17:26:04Z</dc:date>
    </item>
    <item>
      <title>Re: Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245641#M45866</link>
      <description>Thank you for the reference. I'll have to read through it; it may be a much more efficient way to get what we need.</description>
      <pubDate>Sat, 23 Jan 2016 17:56:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245641#M45866</guid>
      <dc:creator>1ashg</dc:creator>
      <dc:date>2016-01-23T17:56:29Z</dc:date>
    </item>
    <item>
      <title>Re: Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245649#M45868</link>
      <description>&lt;P&gt;You're welcome.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please note that the two suggested solutions are not equivalent. You described your requirement as dividing the row counts of two datasets. This is what my suggested code does. The ppv5 value as per &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/42042"&gt;@stat_sas&lt;/a&gt;'s solution, however, is the quotient of the numbers of rows &lt;U&gt;with non-missing values of ID&lt;/U&gt; (see &lt;A href="http://support.sas.com/documentation/cdl/en/sqlproc/69049/HTML/default/viewer.htm#n123fsko39j44pn16zlt087e1m2h.htm" target="_blank"&gt;documentation&lt;/A&gt;). So, the results are likely to be different if there are one or more missing values in either of the two datasets.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
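&lt;P&gt;A minimal sketch of this difference, assuming a small hypothetical dataset WORK.DEMO with one missing ID: COUNT(*) counts rows, whereas COUNT(ID) skips the missing value.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical test data: 4 rows, one of them with a missing ID */
data work.demo;
input ID;
datalines;
1
.
3
4
;
run;

proc sql;
select count(*)  as n_rows,       /* all rows: 4 */
       count(ID) as n_nonmissing  /* non-missing IDs only: 3 */
from work.demo;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Using COUNT(*) instead of COUNT(ID) would make the PROC SQL query count all rows, including those with a missing ID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;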
&lt;P&gt;If no missing values are involved, the &lt;EM&gt;results&lt;/EM&gt; will be equal, but there is still a remarkable difference in performance:&lt;/P&gt;
&lt;P&gt;The data step solution with the SET statements preceded by "if 0" (i.e., a condition which is definitely not met) retrieves the numbers of observations from &lt;EM&gt;header information&lt;/EM&gt; (cf. PROC CONTENTS output) already &lt;EM&gt;at compile time. &lt;/EM&gt;The SET statements are &lt;EM&gt;not executed&lt;/EM&gt;, because the IF conditions are not met. Therefore, the run time of this data step is almost zero,&amp;nbsp;no matter how large the datasets are.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
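&lt;P&gt;The compile-time effect can be observed in a small sketch. It assumes SASHELP.CLASS as input, which is a sample dataset shipped with SAS that has 19 observations. The NOBS= variable already holds its value in a PUT statement placed &lt;EM&gt;before&lt;/EM&gt; the SET statement, and that SET statement is never executed.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
put 'Before the SET statement: ' na=;  /* na was already assigned at compile time */
if 0 then set sashelp.class nobs=na;  /* never executed at run time */
stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The log should show na=19 even though no observation was read.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;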
&lt;P&gt;In contrast, the PROC SQL step &lt;EM&gt;must&lt;/EM&gt; read through all observations of both datasets, because it has to look into the values of ID in order to check how many of them are missing and hence have to be disregarded in the count (see above). So, the run time of the PROC SQL step can be substantial if the datasets are large.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've tested it with two datasets of about 7 GB each, stored on a fast SSD of a professional workstation. The PROC SQL step took 58.47 seconds, whereas the data step took between 0.00 and 0.02 seconds, i.e., it was more than 2000 times faster.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
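&lt;P&gt;The following is a rough sketch for repeating such a comparison. The dataset names BIG_A and BIG_B and the row count are hypothetical and much smaller than the 7 GB datasets mentioned above. Timings will depend on the hardware. The FULLSTIMER system option adds more detailed resource usage to the log notes.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer;

/* Hypothetical test data: 10 million rows in each dataset */
data big_a big_b;
do ID = 1 to 10000000;
  output big_a;
  output big_b;
end;
run;

/* Approach 1: PROC SQL reads every row to count the non-missing IDs */
proc sql;
create table ppv_sql as
select count(ID)/(select count(ID) from big_b) as ppv5
from big_a;
quit;

/* Approach 2: the NOBS= values come from the dataset headers */
data _null_;
if 0 then set big_a nobs=na;
if 0 then set big_b nobs=nb;
ppv=na/nb;
put ppv=;
stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Compare the "real time" values in the log notes of the two steps.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;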
&lt;P&gt;So, if your intention is to count rows rather than non-missing values of a particular variable, the data step solution could be a "&lt;SPAN&gt;much more efficient way to get what" you need.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 23 Jan 2016 21:06:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245649#M45868</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-01-23T21:06:09Z</dc:date>
    </item>
    <item>
      <title>Re: Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245815#M45905</link>
      <description>&lt;P&gt;What is "if 0" in following code? Will it execute only when there is zero observations?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if 0 then set DatasetA nobs=na;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 25 Jan 2016 07:10:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245815#M45905</guid>
      <dc:creator>Babloo</dc:creator>
      <dc:date>2016-01-25T07:10:15Z</dc:date>
    </item>
    <item>
      <title>Re: Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245855#M45918</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/8409"&gt;@Babloo&lt;/a&gt;: No, the 0 has nothing to do with the number of observations. It stands for the logical (Boolean) value FALSE. You could replace it with any logical expression which is likewise never true, e.g. "&lt;FONT face="courier new,courier"&gt;if 4=7 then&lt;/FONT&gt; ..." or "&lt;FONT face="courier new,courier"&gt;if 1+1=3 then&lt;/FONT&gt; ...". Also, "&lt;FONT face="courier new,courier"&gt;if . then&lt;/FONT&gt; ..." (with the numeric missing value &lt;FONT face="courier new,courier"&gt;.&lt;/FONT&gt;) would work the same way, because both 0 and . are FALSE when evaluated as logical expressions. But 0 is simply the most "natural" way to express the Boolean value FALSE in SAS, and it is evaluated more easily than a compound expression such as "1+1=3".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
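&lt;P&gt;The following small sketch shows how SAS evaluates a numeric value used as a condition. A value of 0 or a missing value is FALSE. Any other number is TRUE. Only the last two PUT statements should write to the log.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
if 0     then put 'not written: 0 is FALSE';
if .     then put 'not written: missing is FALSE';
if 4=7   then put 'not written: 4=7 is FALSE';
if 1+1=3 then put 'not written: 1+1=3 is FALSE';
if 1     then put 'written: 1 is TRUE';
if -2    then put 'written: -2 is TRUE';
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;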
&lt;P&gt;So, the idea of "&lt;FONT face="courier new,courier"&gt;if 0 then set&lt;/FONT&gt; ..." is to tell SAS: "Do not execute the SET statement" (in order to save time, because we do not need to read any variable values from the dataset; we are only interested in the number of observations in the dataset and we will get it thanks to the effect of NOBS=... at &lt;EM&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;compile&lt;/FONT&gt;&lt;/EM&gt; time).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It can be regarded as an improved version of&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;set DatasetA(obs=1) nobs=na;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The above SET statement will be executed (there is no IF condition to prevent that), but only one observation will be read from DatasetA (unnecessarily, though). So, run time will also be very low, but probably slightly longer than if SET were not executed at all. (By the way, "obs=0" wouldn't work for our purpose, because in this case the statements following the SET statement would not be executed.)&lt;/P&gt;
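&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The two variants can be compared in a sketch, again assuming SASHELP.CLASS as input. With OBS=1, the PUT statement runs once. With OBS=0, the first execution of the SET statement already reaches the end of the data, so the step ends before the PUT statement is reached.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* OBS=1: one observation is read, then the PUT statement writes n1 to the log */
data _null_;
set sashelp.class(obs=1) nobs=n1;
put 'OBS=1 variant: ' n1=;
run;

/* OBS=0: SET finds no observation to read, so the step stops and the PUT is never executed */
data _null_;
set sashelp.class(obs=0) nobs=n0;
put 'OBS=0 variant: ' n0=;
run;&lt;/CODE&gt;&lt;/PRE&gt;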
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jan 2016 11:45:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245855#M45918</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-01-25T11:45:07Z</dc:date>
    </item>
    <item>
      <title>Re: Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245860#M45921</link>
      <description>&lt;P&gt;Thank you for the detailed explanation. So we're using "&lt;FONT face="courier new,courier"&gt;if 0 then set&lt;/FONT&gt; ..." only to find out the total number of observations? To understand it completely: is there any possibility to replace this IF clause ("&lt;FONT face="courier new,courier"&gt;if 0 then set&lt;/FONT&gt; ...") with some other IF clause to achieve the same task?&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jan 2016 12:06:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245860#M45921</guid>
      <dc:creator>Babloo</dc:creator>
      <dc:date>2016-01-25T12:06:18Z</dc:date>
    </item>
    <item>
      <title>Re: Dividing row count of 2 datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245863#M45922</link>
      <description>&lt;P&gt;The construct "&lt;FONT face="courier new,courier"&gt;if 0 then set&lt;/FONT&gt; ..." can be used for other purposes as well. See, for example, &lt;A href="https://communities.sas.com/t5/Base-SAS-Programming/why-if-0-then-set-still-copy-an-empty-record/td-p/57847" target="_blank"&gt;this older thread&lt;/A&gt;&amp;nbsp;and in particular p. 26, section E, of the paper linked there.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
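&lt;P&gt;One such use is sketched here with SASHELP.CLASS as an assumed input. It is not necessarily the case discussed in the linked paper. The construct creates an empty dataset with the same variables and attributes as an existing one. The SET statement is never executed, but at compile time it still adds the variables to the program data vector.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data empty_class;
if 0 then set sashelp.class; /* compile time: variables and attributes enter the PDV */
stop;                        /* end the step before the implicit OUTPUT writes anything */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Without the STOP statement, the implicit OUTPUT at the end of the first iteration would write one observation with all values missing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;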
&lt;P&gt;As mentioned in my previous post, the zero in&amp;nbsp;"&lt;FONT face="courier new,courier"&gt;if 0&lt;/FONT&gt;" can be replaced equivalently by&amp;nbsp;any logical expression which (always) evaluates to FALSE. An example which I have seen from time to time is "&lt;FONT face="courier new,courier"&gt;if 0=1 then set&lt;/FONT&gt; ...". Again, 0=1 is one of the simplest possible false expressions. But "&lt;FONT face="courier new,courier"&gt;if 0 then set&lt;/FONT&gt; ..." is still shorter, so perhaps more elegant (and programmers are sometimes lazy).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Jan 2016 12:26:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dividing-row-count-of-2-datasets/m-p/245863#M45922</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-01-25T12:26:17Z</dc:date>
    </item>
  </channel>
</rss>

