Desktop productivity for business analysts and programmers

More new user questions...join tables

Reply
Occasional Contributor
Posts: 11

More new user questions...join tables

Greetings!  I'm using SAS EG 5.1 with SAS 9.3.  I've run into an issue and an error message/problem and I can't figure out a way to fix them.  First, I'm trying to join 10 tables with a common variable across in each table.  I've already taken out the variables I don't need, so I need all the remaining information from each table...and I've used a full outer join for the first two tables.  I thought I should use outer joins for the remaining tables, but that doesn't seem to work.  Do I have to add and join these one at a time, or can they all be added in one step?  Since I'm keeping the information in all the tables, am I correct to choose a full outer join?

Second, several small (3-4 table) "joins" have worked, so I know I've at least go the right idea.  There are 3 relatively large files, however, that always result in an error message when I try to join them to any other files.  The error message is:  Error: insufficient space in file WORK.FILE.DATA.  Error: File WORK.FILE.DATA is damaged.  I/O processing did not complete.    Here's the hot fix explanation that SAS help sent:

In SAS® OpRisk VaR, an internal coding problem can cause a utility file to grow very large when you use a large missing-range table. The larger the missing-range table is, the exponentially larger is the utility file. As a result, disk-space problems can occur, generating errors similar to the following:

ERROR: Insufficient space in file WORK._TMP_LOSSES_M.DATA. ERROR: File WORK._TMP_LOSSES_M.DATA is damaged. I/O processing did not complete. ERROR: File WORK._TMP_LOSSES_M.DATA does not exist. ERROR 388-185: Expecting an arithmetic operator. ERROR 202-322: The option or parameter is not recognized and will be ignored.

When I try to install it, however, my system says it is already up to date.

Thanks in advance for any help you can provide.

Charles

Super User
Posts: 19,064

Re: More new user questions...join tables

Only you really know what type of join you need.

One way is to think about the records in each table, not at the variable level, more at the row level.

So if table 1 has a sublist of ID's and Table 2 has all the ID's, I would want a right or left join.

If I want all ID's, even those not in table 1 then I want a full join.


Here's a diagram to help explain:

http://www.google.com/imgres?imgurl=http://1.bp.blogspot.com/-_PHkf1f9Vpk/UHGgfNrLxEI/AAAAAAAAAUk/NT...

With respect to things producing error. Generally, you can join as many tables as you want, but if you do it one at a time you can figure out where the error is.

Occasional Contributor
Posts: 11

Re: More new user questions...join tables

Thanks for the cheat sheet.  I think I would describe it as wanting all information from all ID's even if it is an empty value...for example, I'd want all the variables associated with health care use, including the ID and blank cells of those who had no use.

Regarding things producing errors, I tried joining the tables one at a time...and still get the same error message when I join any one of the 3 larger data sets.  I know which data sets invoke the error, just not sure how to fix it since the "hot fix" doesn't seem to be applicable.

Super Contributor
Posts: 418

Re: More new user questions...join tables

The only time you would need full outer joins is if you have key rows in Table A that do not exist in Table B, AND you have key variables in Table B that do not exist in table A.

Example:

Table A                                                                           Table B

Key                Variable                                             Key               Variable

1                         geez                                                 4                         Nope

2                          this                                                  5                         NEW

3                          is strange

Would give you

Table A                                                                      

Key                Variable                                        

1                         geez                                               

2                          this                                             

3                          is strange

4                         Nope

5                         NEW

If this is what you need then you are correct on the outer join. Note however that a default sas merge statement does this inherently, (and is a lot lot faster). However a merge assumes that if you have the same column name in two tables, the table column encountered second in the merge will overwrite the first tables column.

If that is acceptable then just use a merge, and there should be almost no limit (I merged a 500,000 record  set of 1,000 +  columns of length > 2,000 just earlier today).

If you have the same column in two tables, and want to take the first non-null, then you will have to do the sql join with a coalese function. However note that you have to outer join in successive order. Example, the below code DOES NOT WORK!

proc sql;

create table new as

select *

from table1 a1

full outer join table2 a2 on a2.key=a1.key

full outer join table3 a3 on a3.key=a1.key;

quit;

run;

The correct code is listed below, notice the difference in the second full outer join. You 100% need the OR clause or else the same record in two tables could both get their own row (example, if the key was in table 2 and table3, but not table 1).

proc sql;

create table new as

select *

from table1 a1

full outer join table2 a2 on a2.key=a1.key

full outer join table3 a3 on a3.key=a1.key  OR a3.key=a2.key;

quit;

run;

Occasional Contributor
Posts: 11

Re: More new user questions...join tables

Thanks to everyone for your insights and assistance!  The learning continues...:smileyplain:

Ask a Question
Discussion stats
  • 4 replies
  • 485 views
  • 6 likes
  • 3 in conversation