turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- PROC SQL Warning

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-13-2017 09:11 AM

Hello:

I ran the codes below. I found some warning and note messages shown up. In addition, the PROC SQL is going forever, it never stopped. Please advice how to fix it. Thank you!

**proc** **sql**;

create table MasterP as

select

a.p,

b.p,

c.p

from

class as a,

component as b,

product as c

where

a.p eq **.** or b.p eq **.** or c.p eq **.**;

**quit**;

NOTE: The execution of this query involves performing one or more Cartesian product joins that

can not be optimized.

WARNING: Variable P already exists on file WORK.MASTERP.

WARNING: Variable P already exists on file WORK.MASTERP.

Accepted Solutions

Solution

12-13-2017
01:13 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:36 AM

ybz12003 wrote:

Yes. My plan was looking for the missing value in P columns of A, B, and C tables. Thanks.

Then just do this:

```
data want;
set
class
component
product
;
where p = .;
run;
```

You will end up with quite a wide structure, but it can't be more than nobs(class) + nobs(component) + nobs(product) observations. Instead of nobs(class) * nobs(component) * nobs(product)!

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:16 AM - edited 12-13-2017 09:17 AM

1 - Show a sample of your data for us to debug your program. We can't see your data, so we can't see your problem

2 - Define aliases for commonly named variables (in this case p) like this

select

a.p as a_p,

b.p as b_p,

c.p as c_p

from

3 - You are most likely missing some condition in your where statement, since as of not you are creating at least one Cartesian Product, which is rarely desired.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to draycut

12-13-2017 09:25 AM

Each table contains at least 3000 observation. I can't show it here.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:26 AM

Not asking you to post 9000 lines of data, just a simple outline of the structure of you data, preferibly in the form of data steps with a few observations, so it is easy to work with

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:32 AM

ybz12003 wrote:

Each table contains at least 3000 observation. I can't show it here.

Then you could easily end up with 27.000.000.000 observations in your resulting dataset. Just run your SQL on subsets with 100 observations each, and you'll see what happens.

If you just post those subsets as datasteps here, and an example for the expected result, we can easily show you the code that can do it.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:18 AM

If you're going to use SQL, you will need to learn about what a Cartesian product is.

In this case, let's say that each source (A, B, and C) contains 1,000 observations with a missing value for P. Your request is asking for a table with a billion observations in it. No wonder it is taking a while. You would be much better served to use a DATA step:

data missings;

set class component product;

where p=.;

run;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

12-13-2017 09:23 AM

Hi Astounding:

I did have a data step to do this inquire. I would like to use Proc SQL so that I could confirm the one I created was right and save the merge step. Thanks.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:25 AM

What does your data look like?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:27 AM

In that case, take a look at UNION. It also has its tricky aspects, but is probably the right tool for the job for QC/comparison purposes. It's just barely beyond my SQL knowledge, but I'm sure somebody can post a reasonably short example.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:29 AM

First of all, you need to give the variables new names if you want to see them in the output:

```
select
a.p as p_a,
b.p as p_b,
c.p as p_c
```

That would take care of the NOTEs

What happens then is: The first obs in class where p is missing is joined with ALL obs (regardless of values) from the other two datasets. The same for the next obs that has p missing, and so on. The same also happens for all observations with missing p in the other two datasets, so you get a *real lot* of output observations. Bottom line: be careful with your where conditions, especially when no join on a key variable is defined.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

12-13-2017 09:46 AM

Hi KurtBremser:

The way you are doing will end up create three new columns p_a, p_b and p_c. What about if I would like to generate all of them in single column? Also, I have data step to do this. I just want to try the PROC SQL way to confirm my previous work.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:52 AM

It would be *mighty* helpful if you posted that data step code.

This:

```
proc sql;
create table MasterP as
select p from class
where p = .
union all
select p from component
where p = .
union all
select p from product
where p = .
;
quit;
```

will give you all the missing values stacked on top of each other. The number of observations will be the sum (not the product) of the missing's in the datasets.

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:32 AM

Yes. My plan was looking for the missing value in P columns of A, B, and C tables. Thanks.

Solution

12-13-2017
01:13 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ybz12003

12-13-2017 09:36 AM

ybz12003 wrote:

Yes. My plan was looking for the missing value in P columns of A, B, and C tables. Thanks.

Then just do this:

```
data want;
set
class
component
product
;
where p = .;
run;
```

You will end up with quite a wide structure, but it can't be more than nobs(class) + nobs(component) + nobs(product) observations. Instead of nobs(class) * nobs(component) * nobs(product)!

Maxims of Maximally Efficient SAS Programmers

How to convert datasets to data steps

How to post code

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

12-13-2017 01:14 PM

Thanks, it works.