I would like to merge these two tables. Both tables have the exact same variable names. Is there any way to append a string to the front of each variable name? Like having all variables coming from table aliased as 'A' would have 'A_' as part of the variable name. And columns coming from the table aliased as 'B' would be prefixed with 'B_'. Or is there a built-in way to handle this? I would be doing this so that I could later subset based on variable comparisons rules between like variables. For example:
if A_DELIVERY > B_DELIVERY;
proc sql;
create table MergedData as
select a.delivery, b.delivery, a.type_cd, b.type_cd, ...more
from iv_ora.ods_iv_recon_selected_mp a, bids_ora.ods_bi_recon_selected_mp b
where a.imb_dlvry_zip_5 = '14221'
and a.imb_code = b.imb_code;
quit;
To get the IDs where a.DELIVERY > b.DELIVERY
proc sql;
create table A_gt_B_IDs as
select a.id
from
have as a inner join
have as b on
a.id = b.id and
a.DELIVERY > b.DELIVERY
where a.source="A" and b.source="B";
quit;
/*Then you can, for example, extract all records for those IDs with: */
proc sql;
create table A_gt_B as
select *
from have
where id in (select id from A_gt_B_IDs);
quit;
If you want to do this for all columns of two datasets having the same structure, you can use proc contents with the out parameter. Then you can use the generated output table to create a macrovariable contaning the list of variables in your select :
[not tested]
SELECT cat("a.",NAME," AS A_",NAME), cat("b.",NAME," AS B_",NAME)
INTO :a_cols SEPARATED BY ',', :b_cols SEPARATED BY ','
FROM contents_a; /* Table generated by proc contents on table A */
Then
SELECT &a_cols., &_bcols
If the structure differ, you can adapt by creating intermediary tables containing only the relevant columns
Minor recomendation JOIN ON the imb_code instead of selecting WHERE after doing the join of all records. It doesn't take many records in the two sets to start seeing performance hits : 1000(records in A) * 1000 (records in B) get combined for 1,000,000 records that you then select from.
Also use
from (select * from iv_ora.ods_iv_recon_selected_mp where a.imb_dlvry_zip_5 = '14221') as a
to further reduce the size of the comparison sets.
So if I understand you correctly it would look like this?
proc sql;
create table QueryData as
select b.actual_dlvry_date as AD_DT,
b.imb_code length = 31,
b.spm_calc_batch_date
from (select * from iv_ora.ods_iv_recon_selected_mp where imb_dlvry_zip_5 = '14221') a
inner join (select * from bids_ora.ods_bi_recon_selected_mp where imb_dlvry_zip_5 = '14221') b
on a.imb_code = b.imb_code
where &rule;
quit;
Insteat of creating a wide data structure
A_id B_id A_x B_x ...
i.e. multiple columns with the same type of information, you should try to work with a long structure
'A' id x ...
'B' id x ...
that will allow simpler data manipulation and scaling.
For example, your two tables could be concatenated with
data want;
set iv_ora.ods_iv_recon_selected_mp(in=a) bids_ora.ods_bi_recon_selected_mp;
if a then source="A" else source="B";
run;
or
proc sql;
create table want as
select "A" as source, * from iv_ora.ods_iv_recon_selected_mp
union all corresponding
select "B" as source, * from bids_ora.ods_bi_recon_selected_mp;
quit;
Interesting, but then how would I do my rule comparison in the long structure? In the wide structure the variables are on the same record and the comparison would look like:
if A_DELIVERY > B_DELIVERY then do...
How would this look in the long structure?
With SQL, you would do:
select a.id, a.DELIVERY > b.DELIVERY as comparison
from have as a inner join have as b on a.id=b.id
where a.source="A" and b.source="B"
in a data step, you would do:
merge
have(where=(source="A") rename=delivery=a_delivery)
have(where=(source="B") rename=delivery=b_delivery);
by id;
comparison = a_delivery > b_delivery;
But, in most cases, source is not going to be pertinent information. Fields such as Year, Location, or Product will more likely distinguish information from different sources.
I keep coming back to your example, but how does creating a variable named Comparison (e.g. a.DELIVERY > b.DELIVERY as comparison) help me to actually select the records where a.DELIVERY is greater than b.DELIVERY? I need to actually create a ds of records where a.DELIVERY > b.DELIVERY.
To get the IDs where a.DELIVERY > b.DELIVERY
proc sql;
create table AgtB as
select a.id
from
have as a inner join have as b on a.id=b.id and a.DELIVERY > b.DELIVERY
where a.source="A" and b.source="B"
To get the IDs where a.DELIVERY > b.DELIVERY
proc sql;
create table A_gt_B_IDs as
select a.id
from
have as a inner join
have as b on
a.id = b.id and
a.DELIVERY > b.DELIVERY
where a.source="A" and b.source="B";
quit;
/*Then you can, for example, extract all records for those IDs with: */
proc sql;
create table A_gt_B as
select *
from have
where id in (select id from A_gt_B_IDs);
quit;
I've got it working! Thanks so much for taking the time to help. I really appreciate it.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.