PROC SQL join question

Reply
Frequent Contributor
Posts: 131

PROC SQL join question

purpose: create a table like the following

VariablemarketingnomarketingdifferenceProbt
abc.(           .).(           .)..
cde8.4(        27.8)8.4(        27.8)00.9886
efg.(           .).(           .).0.0236
hij.(           .).(           .).0.8113
klm.(           .).(           .).0.1963
nop.(           .).(           .).0.0567
if0.3(         5.4)0.2(         4.6)0.10.157
region015.3(        22.4)5.7(        23.3)-0.4.
region027.9(          27)8.1(        27.2)-0.2.
region038.8(        28.3)8.5(        27.9)0.3.
region0417.2(        37.8)17.1(        37.6)0.1.
region0525.9(        43.8)25.5(        43.6)0.4.
region0610.2(        30.2)10.1(        30.1)0.1.
region074.8(        21.4)4.7(        21.1)0.1.
region082.8(        16.5)3.1(        17.3)-0.3.
region0914.5(        35.2)14.7(        35.5)-0.2.
region102.6(        15.9)2.6(        15.8)0.


(some) syntax:

proc sql;

create table table2_amp as

select  mean_amp.variable,      

        amp.marketing,

        amp.nomarketing,

        amp.difference,

        ttest.Probt  

from    mean_amp as amp left join  ttest_amp as ttest on amp.variable= ttest.variable;

quit;

issue:

As seen above, it looks like the way the input data sets were joined in PROC SQL needs to be revised because many values did not get copied from the input data sets.  In other words, the input data sets' "variable" variable don't have observations in the same order and it looks to me like PROC SQL requires them to be when joining tables.  How could I either sort the input datasets' variable valid values in the order I want or revise the SQL step?


Thank you so much

Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: PROC SQL join question

Well, firstly your example code is invalid.  You use an alias for dataset mean_amp called amp, in you select you use an alias mean_amp which does not exist.


Secondly, what do these variables look like in the datasets:

from    mean_amp as amp left join  ttest_amp as ttest on amp.variable= ttest.variable;

AMP.VARIABLE and TTEST.VARIABLE.

It is irrelevant what the sort order is, SQL does not require presorted data, so there must be some disconnect between the data contained in each.


Frequent Contributor
Posts: 131

Re: PROC SQL join question

The amp.variable valid values (in the following order):

brc
brf
es
fed
fns
ic
if
osharegion01
osharegion02
osharegion03
osharegion04
osharegion05
osharegion06
osharegion07
osharegion08
osharegion09
osharegion10

The ttest.variable valid values:

fed   
fns   
es    
brf   
brc   
if    
ic    

osharegion01

osharegion02

osharegion03

osharegion04

osharegion05

osharegion06

osharegion07

osharegion08

osharegion09

osharegion10

Does that help? Thanks, RW9!

Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: PROC SQL join question

Not really I am afraid.  From that there is no reason why the join would not work.  Perhaps provide, in a datastep, sample data from both datasets.  I note that the values you post do not match the ones in the output provided in the first post.  In your first post it would seem to my mind that dataset TTEST only have variable=CDE to IF, whereas the other one has all of them.

Ask a Question
Discussion stats
  • 3 replies
  • 192 views
  • 3 likes
  • 2 in conversation