using proc sql

Reply
Contributor
Posts: 54

using proc sql

Hello all,

I am trying to use proc sql to pull some information from a dataset.

I have 2 datasets. The first data set (called trying) is a list of account numbers, about 250. The 2nd dataset (called c2) is a list containing those 250 account numbers plus another 1 million more. In the 2nd data set there is a column of information I would like to attach to the 250 accounts called "type". My code is as follows.....

proc sql;

create table all_accounts as

select accountnum, type

from sasuser.c2

where accountnum in

(select distinct cacct from work.trying)

quit;

I get the code to do what I want for the most part. However, I need the list of account numbers on output to stay in the order in which they are input. They aren't sorted and need to be reattached to a larger file in the right order after the code is run. Is there a reason why the list "all_accounts" returns in a different but unsorted order than it was input? Thanks for any help/suggestions.

Super User
Posts: 11,343

Re: using proc sql

I suspect the order returned is the order they appear in sasuser.c2.

An alternative approach

proc sql;

     create table all_accounts as

     select b.accountnum, b.type

     from (select distinct cacct from work.trying) as a left join sasuser.c2 as b

          on a.cacct=b.accountnum;

quit;

may keep the order in work.trying if that is what is meant by "order entered".

If work.trying is supposed to be a list of accounts to select does it actually contain duplicates of cacct?

Respected Advisor
Posts: 4,934

Re: using proc sql

I think you would do better with the query :

proc sql;

create table all_accounts as

select t.*, c.type

     from work.trying as t left join sasuser.c2 as c

          on c.accountnum = t.cacct;

quit;

but the only way to guarantee a given order with SQL is to include an ordering variable in your dataset.

PG

PG
Respected Advisor
Posts: 3,156

Re: using proc sql

This may not be relevant, but FWIW, given your scenario (eg. look up a very small proportion of data within a big table), if you do it often, it would really boost your performance if you index the big table(if not changing often) on 'cacct'.

Haikuo

Occasional Contributor
Posts: 9

Re: using proc sql

First off left join is not the proper vehicle. unless there is the chance that there are accounts in trying that are not in c2.

Second, an index will only help if the big table is used many times on account number, and only if the smaller stuff is < 10% of the big file, and the big file has a long record length.

Given what you stated, you only need a simple join.

proc sql;

  create table all_accounts as

  select a.accountnum

           ,b.type

    from work.trying as a

    join  sasuser.c2 as b

      on a.accountnum = b.cacct

  order by a.accountnum

  ;

quit;

Basically SQL are set operations and does not guarantee ordering unless an order by statement is used.

IF the sasuser.c2 file has multiple observations with account numbers, then my code won't work.

In that case you replace "join sasuser.c2 as b" with a subquery "join (select distinct cacct  , type from sasuser.c2) as b"

This is faster since it subsets the big file first, and then performs the join

Using a "where value in subquery" is less efficient in that it creates more operations and more data movement.

Also, this way you can easily add variables from trying and c2 if you want them.

Trusted Advisor
Posts: 3,215

Re: using proc sql

The "proc SQL" is following the ANSI guidelines as much as possible. This is what I have found with hints.

Working with "SAS datastep" and using "proc SQL" is asking some awareness of differences.

  • SQL is designed with no automatic awareness of keeping ordering in tables.
  • Indexes orders etc. should be explicitly given. If missing and required they should be created.
  • SQL is designed to do optimized processing like multi-threading.

      Multi-threading is implemented wit SAS 9.

Running possible many subtasks doing a part of the SQL-job is leading to:

  • running much faster as is possible with a single subtasks
  • losing any hidden order of tables
  • prohibit overwriting/creating  a table that is also used as input.
  • more...?

SQL can gain or suffer a lot of indexes on tables ordering information. It is a job to study on on his own. That is why DBA Database Base Administrator has become a segregated role. 

The SQL improvements given like other have done in repley  is the way to go.

---->-- ja karman --<-----
Ask a Question
Discussion stats
  • 5 replies
  • 326 views
  • 0 likes
  • 6 in conversation