BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
1ashg
Calcite | Level 5

I have a Dataset A and Dataset B and both data sets have only 1 variable, ID.

 

I'm having a hard time trying to do this : (# of rows in Dataset A / # of rows in Dataset B) and then display that answer somehow.

 

I've tried this but it takes way to long because it involves cartesian product join:

 

proc sql;
create table ppv as
select count(a.ID)/count(b.ID) as ppv5
from DatasetA as a, DatasetB as b;
quit;

 

Overall I'm trying to program to calculate ppv after a model is generated.

1 ACCEPTED SOLUTION

Accepted Solutions
stat_sas
Ammonite | Level 13

proc sql;

create table ppv as
select count(ID)/(select count(ID) from DatasetB) as ppv5
from DatasetA;
quit;

View solution in original post

8 REPLIES 8
FreelanceReinh
Jade | Level 19

Although it seems uncommon to me to calculate the PPV this way (cf. http://support.sas.com/kb/24/170.html), you could do it as follows:

data _null_;
if 0 then set DatasetA nobs=na;
if 0 then set DatasetB nobs=nb;
ppv=na/nb;
put ppv=;
stop; /* Only to avoid NOTE: DATA STEP stopped due to looping. */
run;

The result is written to the log.

stat_sas
Ammonite | Level 13

proc sql;

create table ppv as
select count(ID)/(select count(ID) from DatasetB) as ppv5
from DatasetA;
quit;

1ashg
Calcite | Level 5
Thank you for the reference, I'll have to read through that, it may be much more efficient way to get what we need.
FreelanceReinh
Jade | Level 19

You're welcome.

 

Please note that the two suggested solutions are not equivalent. You described your requirement as dividing the row counts of two datasets. This is what my suggested code does. The ppv5 value as per @stat_sas's solution, however, is the quotient of the numbers of rows with non-missing values of ID (see documentation). So, the results are likely to be different if there are one or more missing values in either of the two datasets.

 

If no missing values are involved, the results will be equal, but there is still a remarkable difference:

The data step solution with the SET statements preceded by "if 0" (i.e., a condition which is definitely not met) retrieves the numbers of observations from header information (cf. PROC CONTENTS output) already at compile time. The SET statements are not executed, because the IF conditions are not met. Therefore, the run time of this data step is almost zero, no matter how large the datasets are.

 

In contrast, the PROC SQL step must read through all observations of both datasets, because it has to look into the values of ID in order to check how many of them are missing and hence have to be disregarded in the count (see above). So, the run time of the PROC SQL step can be substantial if the datasets are large.

 

I've tested it with two datasets of about 7 GB each, stored on a fast SSD of a professional workstation. The PROC SQL step took 58.47 seconds, whereas the data step took between 0.00 and 0.02 seconds, >2000 times faster.

 

So, if your intention is to count rows rather than non-missing values of a particular variable, the data step solution could be a "much more efficient way to get what" you need.

Babloo
Rhodochrosite | Level 12

What is "if 0" in following code? Will it execute only when there is zero observations?

 

if 0 then set DatasetA nobs=na;
FreelanceReinh
Jade | Level 19

@Babloo: No, the 0 has nothing to do with the number of observations. It stands for the logical (Boolean) value FALSE. You could replace it by any logical expression which is likewise never true, e.g. "if 4=7 then ..." or "if 1+1=3 then ...". Also, "if . then ..." (with the numeric missing value .) would work the same way, because both 0 and . are FALSE when evaluated as logical expressions. But 0 is simply the most "natural" way to express the Boolean value FALSE in SAS and it is evaluated more easily than a compound expression such as "1+1=3".

 

So, the idea of "if 0 then set ..." is to tell SAS: "Do not execute the SET statement" (in order to save time, because we do not need to read any variable values from the dataset; we are only interested in the number of observations in the dataset and we will get it thanks to the effect of NOBS=... at compile time).

 

It can be regarded as an improved version of

set DatasetA(obs=1) nobs=na;

The above SET statement will be executed (there is no IF condition to prevent that), but only one observation will be read from DatasetA (unnecessarily, though). So, run time will also be very low, but probably slightly longer than if SET was not executed at all. (By the way, "obs=0" wouldn't work for our purpose, because in this case the statements following the SET statement would not be executed.)

 

 

Babloo
Rhodochrosite | Level 12

Thank you for detailed explanation. So we're using "if 0 then set ..." only to find out the total number of observations? To understand completely, any possibilities to replace this if clause ("if 0 then set ...") with some other if clause to achive the same task?

FreelanceReinh
Jade | Level 19

The construct "if 0 then set ..." can be used for other purposes as well. See, for example, this older thread and in particular p. 26, section E, of the paper linked there.

 

As mentioned in my previous post, the zero in "if 0" can be replaced equivalently by any logical expression which (always) evaluates to FALSE. An example which I have seen from time to time is "if 0=1 then set ...". Again, 0=1 is one of the simplest possible false expressions. But "if 0 then set ..." is still shorter, so perhaps more elegant (and programmers are sometimes lazy).

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4082 views
  • 6 likes
  • 4 in conversation