Solved: Re: Merge of two tables with same column names

buechler66 · Posted 06-03-2016 10:53 AM

I would like to merge these two tables. Both tables have the exact same variable names. Is there any way to append a string to the front of each variable name? Like having all variables coming from table aliased as 'A' would have 'A_' as part of the variable name. And columns coming from the table aliased as 'B' would be prefixed with 'B_'. Or is there a built-in way to handle this? I would be doing this so that I could later subset based on variable comparisons rules between like variables. For example:

if A_DELIVERY > B_DELIVERY;

proc sql;
	create table MergedData as 
		select  a.delivery, b.delivery, a.type_cd, b.type_cd, ...more
	        from iv_ora.ods_iv_recon_selected_mp a,  bids_ora.ods_bi_recon_selected_mp b                                                                                                                                                                                                                                                                                                                                                                                                                                                   
		where a.imb_dlvry_zip_5 = '14221'
		  and a.imb_code = b.imb_code;
quit;

PGStats · Posted 08-05-2016 11:46 AM

To get the IDs where a.DELIVERY > b.DELIVERY

proc sql;
create table A_gt_B_IDs as
select a.id
from
	have as a inner join 
	have as b on 
		a.id = b.id and 
		a.DELIVERY > b.DELIVERY
where a.source="A" and b.source="B";
quit;

/*Then you can, for example, extract all records for those IDs with: */

proc sql;
create table A_gt_B as
select *
from have
where id in (select id from A_gt_B_IDs);
quit;

PG

View solution in original post

gamotte · Posted 06-03-2016 11:02 AM

Hi,

SELECT a.delivery AS A_delivery, b.delivery AS B_delivery, ,...

buechler66 · Posted 06-03-2016 11:06 AM

yah, but there are hundreds of columns so I was hoping for a less manual way of doing this. Any thoughts?

gamotte · Posted 06-03-2016 11:13 AM

If you want to do this for all columns of two datasets having the same structure, you can use proc contents with the out parameter. Then you can use the generated output table to create a macrovariable contaning the list of variables in your select :

[not tested]

SELECT cat("a.",NAME," AS A_",NAME), cat("b.",NAME," AS B_",NAME)

INTO :a_cols SEPARATED BY ',', :b_cols SEPARATED BY ','

FROM contents_a; /* Table generated by proc contents on table A */

Then

SELECT &a_cols., &_bcols

If the structure differ, you can adapt by creating intermediary tables containing only the relevant columns

ballardw · Posted 06-03-2016 01:22 PM

Minor recomendation JOIN ON the imb_code instead of selecting WHERE after doing the join of all records. It doesn't take many records in the two sets to start seeing performance hits : 1000(records in A) * 1000 (records in B) get combined for 1,000,000 records that you then select from.

Also use

from (select * from iv_ora.ods_iv_recon_selected_mp where a.imb_dlvry_zip_5 = '14221') as a

to further reduce the size of the comparison sets.

buechler66 · Posted 06-03-2016 03:16 PM

So if I understand you correctly it would look like this?

	proc sql;
	create table QueryData as 
		select  b.actual_dlvry_date as AD_DT, 
			b.imb_code length = 31, 
			b.spm_calc_batch_date
	         from (select * from iv_ora.ods_iv_recon_selected_mp where imb_dlvry_zip_5 = '14221') a
                 inner join (select * from bids_ora.ods_bi_recon_selected_mp where imb_dlvry_zip_5 = '14221') b                                                                                                                                                                                                                                                                                                                                                                                                                                                   
		         on a.imb_code = b.imb_code
		where &rule;
	quit;

PGStats · Posted 06-03-2016 03:15 PM

Insteat of creating a wide data structure

A_id B_id A_x B_x ...

i.e. multiple columns with the same type of information, you should try to work with a long structure

'A' id x ...

'B' id x ...

that will allow simpler data manipulation and scaling.

For example, your two tables could be concatenated with

data want;

set iv_ora.ods_iv_recon_selected_mp(in=a) bids_ora.ods_bi_recon_selected_mp;

if a then source="A" else source="B";

run;

or

proc sql;

create table want as

select "A" as source, * from iv_ora.ods_iv_recon_selected_mp

union all corresponding

select "B" as source, * from bids_ora.ods_bi_recon_selected_mp;

quit;

PG

buechler66 · Posted 06-03-2016 03:21 PM

Interesting, but then how would I do my rule comparison in the long structure? In the wide structure the variables are on the same record and the comparison would look like:

if A_DELIVERY > B_DELIVERY then do...

How would this look in the long structure?

PGStats · Posted 06-03-2016 03:48 PM

With SQL, you would do:

select a.id, a.DELIVERY > b.DELIVERY as comparison

from have as a inner join have as b on a.id=b.id

where a.source="A" and b.source="B"

in a data step, you would do:

merge

have(where=(source="A") rename=delivery=a_delivery)

have(where=(source="B") rename=delivery=b_delivery);

by id;

comparison = a_delivery > b_delivery;

But, in most cases, source is not going to be pertinent information. Fields such as Year, Location, or Product will more likely distinguish information from different sources.

PG

buechler66 · Posted 08-05-2016 10:38 AM

I keep coming back to your example, but how does creating a variable named Comparison (e.g. a.DELIVERY > b.DELIVERY as comparison) help me to actually select the records where a.DELIVERY is greater than b.DELIVERY? I need to actually create a ds of records where a.DELIVERY > b.DELIVERY.

PGStats · Posted 08-05-2016 11:37 AM

To get the IDs where a.DELIVERY > b.DELIVERY

proc sql;

create table AgtB as

select a.id

from

have as a inner join have as b on a.id=b.id and a.DELIVERY > b.DELIVERY

where a.source="A" and b.source="B"

PG

PGStats · Posted 08-05-2016 11:46 AM

To get the IDs where a.DELIVERY > b.DELIVERY

proc sql;
create table A_gt_B_IDs as
select a.id
from
	have as a inner join 
	have as b on 
		a.id = b.id and 
		a.DELIVERY > b.DELIVERY
where a.source="A" and b.source="B";
quit;

/*Then you can, for example, extract all records for those IDs with: */

proc sql;
create table A_gt_B as
select *
from have
where id in (select id from A_gt_B_IDs);
quit;

PG

buechler66 · Posted 08-15-2016 10:49 AM

I've got it working! Thanks so much for taking the time to help. I really appreciate it.

Classroom Training Available!