BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

HI ,

i have a table and need to connect to several other tables one by one to get other relevant information.

i Am not using FORMAT CNTLIN each time because the code would become lengthy.

EAch time ine I am sorting the data set contains information and merging it

is there any easy way???

thanks

1 ACCEPTED SOLUTION

Accepted Solutions
jakarman
Barite | Level 11

Xia, You must know the context recognizing this that quickly. I am trusting you, you are right as the dwaprv tables are incidcated as dimensions/facttable.

I assume the dwaprv library is in reality the datawarehouse stored in a likely external RDBMS.

There is al lot of coding without any comment. That is the job-protection approach. In my old days it was stated every good program/module was not longer  as 1 page (60 lines)with at least the half of it should be logical understandable comment for explanation. I hope there is a functional description/design somewhere.,

Seeing the code, I have the questions:
- Why not code the where statements as dataset options.  It will not bring the data into the PDV. When it is an external database benefitting by the implicit pass through.

- The first datastep (creating Hosp_acc);

   + is doing some type conversions.  Can those be resolved at the dwaprv side?

   + is doing the selection for dates (obviously noted as numeric, it works as the order is logical sorted.
      This is the random part probably being a choice/selection from menu-s. Are those periods for the data able to be prepared in the dwaprv?

- the format Hospitalaccountid is being created inline. Is that one possible not more stable. Just needing update as some points in time? 

A lot of questions in fact it is the question (aside the code) what was the original request and what was the available data.
No you do not need to share that with us. It is the question you should ask yourself.   

---->-- ja karman --<-----

View solution in original post

7 REPLIES 7
SASKiwi
PROC Star

If your objective is to do the joining all in one step then SQL would seem to be the obvious answer. You can join multiple tables with different sort orders without pre-sorting:

proc sql;

  create table new as

  select *

  from table1, table2, table3

  where table1.key1 = table2.key2

  and table1.key1 = table3.key3;

quit;

Kurt_Bremser
Super User

But be aware, if you work with large tables, that a single proc SQL with multiple joins will create a very large utility file and produce a lot of I/O overhead. Writing more sort/merge steps saves disk space and execution time. Up to an order of magnitude, in my experience. YMMV.

You see, the tables are not suddenly sorted by magic if you use SQL like this. SAS still needs to do the logical equivalent of a proc sort, and during sql this is done less efficiently. As long as everything fits into memory, this is not an issue, but as soon as the tables outgrow your available memory, the process may slow down to the "drying of paint".

jakarman
Barite | Level 11

You are indicating Cntlin with formats.

Why not use MLF (Multi Label Formats) with just one pass over the dataset getting all different views. sorted by formats will exclude any need for sorting.

---->-- ja karman --<-----
Scott_Mitchell
Quartz | Level 8

I agree with 's stand on SQL joins over multiple tables, however I have seen gains by converting multiple table merges to a single SQL join in the past.  When in doubt I create the code using both methods and test their efficiency.

I don't quite understand your aversion to using PROC FORMAT.  You seem to be concerned about the amount of code you would be writing, but in my opinion you should be more concerned about the efficiency of that code.  I am sure we could come up with an approach using PROC FORMAT without the amount of code becoming an issue for you.

Can you provide some insight into the tables you are merging\joining? 

Perhaps you could put up an example of some of the tables?

Do you have a single large table with many small tables? 

Do you have many large tables? 

How many tables are you looking to merge\join?   

There are several approaches we could suggest to help you, but we need to know more about your specific circumstances.  I use different methods dependent on the data structure, the resources available on the system I am working on and whether it is a one off job or will be scheduled.

Kurt_Bremser
Super User

Using PROC FORMAT should not make for lengthy code. Just write a macro where you specifiy which variables to use, and necessary parameters like length, and then transform your input data into a valid cntlin file. After that, you only have to write one line to generate the format.

Converting a lookup table into a format is one of the most efficient ways to avoid another long-running join, the other is using a hash object.

jakarman
Barite | Level 11

Xia, You must know the context recognizing this that quickly. I am trusting you, you are right as the dwaprv tables are incidcated as dimensions/facttable.

I assume the dwaprv library is in reality the datawarehouse stored in a likely external RDBMS.

There is al lot of coding without any comment. That is the job-protection approach. In my old days it was stated every good program/module was not longer  as 1 page (60 lines)with at least the half of it should be logical understandable comment for explanation. I hope there is a functional description/design somewhere.,

Seeing the code, I have the questions:
- Why not code the where statements as dataset options.  It will not bring the data into the PDV. When it is an external database benefitting by the implicit pass through.

- The first datastep (creating Hosp_acc);

   + is doing some type conversions.  Can those be resolved at the dwaprv side?

   + is doing the selection for dates (obviously noted as numeric, it works as the order is logical sorted.
      This is the random part probably being a choice/selection from menu-s. Are those periods for the data able to be prepared in the dwaprv?

- the format Hospitalaccountid is being created inline. Is that one possible not more stable. Just needing update as some points in time? 

A lot of questions in fact it is the question (aside the code) what was the original request and what was the available data.
No you do not need to share that with us. It is the question you should ask yourself.   

---->-- ja karman --<-----
AndersS
Pyrite | Level 9

Hi! Yhis is perhaps trivial, but   "Missing right parenthesis" in

set dwaprv.fact_hospitalaccount(keep=HospitalAccountID PRIM_ENC_CSN_ID LengthOfStay DateAdmitKey


Try Data Step View to speed up part of the calculations.

/ Br Anders

Anders Sköllermo (Skollermo in English)

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1207 views
  • 3 likes
  • 6 in conversation