Help using Base SAS procedures

Merge question

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,041
Accepted Solution

Merge question

HI ,

i have a table and need to connect to several other tables one by one to get other relevant information.

i Am not using FORMAT CNTLIN each time because the code would become lengthy.

EAch time ine I am sorting the data set contains information and merging it

is there any easy way???

thanks


Accepted Solutions
Solution
‎06-05-2014 12:03 PM
Trusted Advisor
Posts: 3,215

Re: Merge question

Posted in reply to robertrao

Xia, You must know the context recognizing this that quickly. I am trusting you, you are right as the dwaprv tables are incidcated as dimensions/facttable.

I assume the dwaprv library is in reality the datawarehouse stored in a likely external RDBMS.

There is al lot of coding without any comment. That is the job-protection approach. In my old days it was stated every good program/module was not longer  as 1 page (60 lines)with at least the half of it should be logical understandable comment for explanation. I hope there is a functional description/design somewhere.,

Seeing the code, I have the questions:
- Why not code the where statements as dataset options.  It will not bring the data into the PDV. When it is an external database benefitting by the implicit pass through.

- The first datastep (creating Hosp_acc);

   + is doing some type conversions.  Can those be resolved at the dwaprv side?

   + is doing the selection for dates (obviously noted as numeric, it works as the order is logical sorted.
      This is the random part probably being a choice/selection from menu-s. Are those periods for the data able to be prepared in the dwaprv?

- the format Hospitalaccountid is being created inline. Is that one possible not more stable. Just needing update as some points in time? 

A lot of questions in fact it is the question (aside the code) what was the original request and what was the available data.
No you do not need to share that with us. It is the question you should ask yourself.   

---->-- ja karman --<-----

View solution in original post


All Replies
Super User
Posts: 3,260

Re: Merge question

Posted in reply to robertrao

If your objective is to do the joining all in one step then SQL would seem to be the obvious answer. You can join multiple tables with different sort orders without pre-sorting:

proc sql;

  create table new as

  select *

  from table1, table2, table3

  where table1.key1 = table2.key2

  and table1.key1 = table3.key3;

quit;

Super User
Posts: 7,854

Re: Merge question

Posted in reply to robertrao

But be aware, if you work with large tables, that a single proc SQL with multiple joins will create a very large utility file and produce a lot of I/O overhead. Writing more sort/merge steps saves disk space and execution time. Up to an order of magnitude, in my experience. YMMV.

You see, the tables are not suddenly sorted by magic if you use SQL like this. SAS still needs to do the logical equivalent of a proc sort, and during sql this is done less efficiently. As long as everything fits into memory, this is not an issue, but as soon as the tables outgrow your available memory, the process may slow down to the "drying of paint".

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 3,215

Re: Merge question

Posted in reply to robertrao

You are indicating Cntlin with formats.

Why not use MLF (Multi Label Formats) with just one pass over the dataset getting all different views. sorted by formats will exclude any need for sorting.

---->-- ja karman --<-----
Super Contributor
Posts: 297

Re: Merge question

Posted in reply to robertrao

I agree with 's stand on SQL joins over multiple tables, however I have seen gains by converting multiple table merges to a single SQL join in the past.  When in doubt I create the code using both methods and test their efficiency.

I don't quite understand your aversion to using PROC FORMAT.  You seem to be concerned about the amount of code you would be writing, but in my opinion you should be more concerned about the efficiency of that code.  I am sure we could come up with an approach using PROC FORMAT without the amount of code becoming an issue for you.

Can you provide some insight into the tables you are merging\joining? 

Perhaps you could put up an example of some of the tables?

Do you have a single large table with many small tables? 

Do you have many large tables? 

How many tables are you looking to merge\join?   

There are several approaches we could suggest to help you, but we need to know more about your specific circumstances.  I use different methods dependent on the data structure, the resources available on the system I am working on and whether it is a one off job or will be scheduled.

Super User
Posts: 7,854

Re: Merge question

Posted in reply to robertrao

Using PROC FORMAT should not make for lengthy code. Just write a macro where you specifiy which variables to use, and necessary parameters like length, and then transform your input data into a valid cntlin file. After that, you only have to write one line to generate the format.

Converting a lookup table into a format is one of the most efficient ways to avoid another long-running join, the other is using a hash object.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎06-05-2014 12:03 PM
Trusted Advisor
Posts: 3,215

Re: Merge question

Posted in reply to robertrao

Xia, You must know the context recognizing this that quickly. I am trusting you, you are right as the dwaprv tables are incidcated as dimensions/facttable.

I assume the dwaprv library is in reality the datawarehouse stored in a likely external RDBMS.

There is al lot of coding without any comment. That is the job-protection approach. In my old days it was stated every good program/module was not longer  as 1 page (60 lines)with at least the half of it should be logical understandable comment for explanation. I hope there is a functional description/design somewhere.,

Seeing the code, I have the questions:
- Why not code the where statements as dataset options.  It will not bring the data into the PDV. When it is an external database benefitting by the implicit pass through.

- The first datastep (creating Hosp_acc);

   + is doing some type conversions.  Can those be resolved at the dwaprv side?

   + is doing the selection for dates (obviously noted as numeric, it works as the order is logical sorted.
      This is the random part probably being a choice/selection from menu-s. Are those periods for the data able to be prepared in the dwaprv?

- the format Hospitalaccountid is being created inline. Is that one possible not more stable. Just needing update as some points in time? 

A lot of questions in fact it is the question (aside the code) what was the original request and what was the available data.
No you do not need to share that with us. It is the question you should ask yourself.   

---->-- ja karman --<-----
Occasional Contributor
Posts: 19

Re: Merge question

Posted in reply to robertrao

Hi! Yhis is perhaps trivial, but   "Missing right parenthesis" in

set dwaprv.fact_hospitalaccount(keep=HospitalAccountID PRIM_ENC_CSN_ID LengthOfStay DateAdmitKey


Try Data Step View to speed up part of the calculations.

/ Br Anders

Anders Sköllermo
Ph.D., Swedish Actuary

"Both Age and IQ are 69+"
"Retired, But Not Tired"

Sandgränd 13, S-178 40 Ekerö
email: anders.skollermo@one.se
tele: 00468968419 mobil: 0046735077373
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 391 views
  • 3 likes
  • 6 in conversation