I have a task to use SAS to write some data aggregation in SAS Studio University. I've been using SQL Server and MySQL for over a decade.
However... I'm trying to do a simple self join on a table and I have no idea how to explain the results I'm getting....
I ended up going all the way down to a small table to figure out what is going on with 1 column called 'Gender' and 2 rows 'Male' and 'Female':
select
m.Gender,
f.Gender
from DATA.GENDERS m
inner join DATA.GENDERS f
on f.Gender = 'Female'
where m.Gender = 'Male';
However, the results I get are:
Gender | Gender | |
1 | Male | Male |
2 | Male | Male |
I've tried several different ways of putting it together and it seriously doesn't make any sense why I'm getting this join result? It's driving me crazy!
On a side note... it looks like INNER JOINs return null values?
Thanks...
Well, I figured out what is going on here, I was getting the issue because it was creating a view and I didn't have an alias for the columns which is invalid. What is interesting is that it proceeds with the view creation but uses only the first instance of the column name rather than throwing a warning.
data genders;
input gender $10.;
cards;
Male
Female
;
proc sql;
create view GENDERS_JOINED as
select
m.Gender,
f.Gender
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender='Male';
This is resulting in the following:
However, providing an alias for the fields it doesn't create the issue:
data genders;
input gender $10.;
cards;
Male
Female
;
proc sql;
create view GENDERS_JOINED as
select
m.Gender as M_G,
f.Gender as F_G
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender='Male';
If you aren't creating the view with no alias it still works fine as expected:
data genders;
input gender $10.;
cards;
Male
Female
;
proc sql;
select
m.Gender,
f.Gender
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender='Male';
For reference, here's the exact same query in SQL Server. Do the joins just work completely different that standard TSQL?
Are you sure those are the values? Do you have a format attached to the variable?
data genders;
input gender $10.;
cards;
Male
Female
;
proc sql;
select
m.Gender as M_Gender
, f.Gender as F_Gender
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender='Male'
;
quit;
M_Gender F_Gender ---------------------- Male Female
What is the actual problem that lead you to create this crazy example?
What real life situation would you have where you use a join condition that only referenced values from one of the tables?
Thanks for the response. The type for the field is CHAR(13).
Self joins aren't uncommon in the normal breadth of SQL work I do, but to elaborate the data actually looks more like this:
Gender | Question | Result | |
1 | Male | A | 100 |
2 | Male | B | 45 |
3 | Female | A | 66 |
So the join is Male to Female to compare the differences in values for their responses to Questions A, B, C, etc...
So the join is more like:
f.Gender='Female' and f.Question = m.Question
but even going back to just the one column I am getting the unexpected result.
Well, I figured out what is going on here, I was getting the issue because it was creating a view and I didn't have an alias for the columns which is invalid. What is interesting is that it proceeds with the view creation but uses only the first instance of the column name rather than throwing a warning.
data genders;
input gender $10.;
cards;
Male
Female
;
proc sql;
create view GENDERS_JOINED as
select
m.Gender,
f.Gender
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender='Male';
This is resulting in the following:
However, providing an alias for the fields it doesn't create the issue:
data genders;
input gender $10.;
cards;
Male
Female
;
proc sql;
create view GENDERS_JOINED as
select
m.Gender as M_G,
f.Gender as F_G
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender='Male';
If you aren't creating the view with no alias it still works fine as expected:
data genders;
input gender $10.;
cards;
Male
Female
;
proc sql;
select
m.Gender,
f.Gender
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender='Male';
You cannot have two variables with the same name in a DATASET (or a view which has to mimic a dataset structure).
But you can, like in both your example and mine, make a REPORT from PROC SQL that is pulling values FROM two variables that USED TO have the same name, since that doesn't make an actual dataset .
So you essentially ran:
select gender,gender from have;
To give you even more food for thought, run this:
data genders;
input gender $;
datalines;
Female
Male
;
proc sql;
create view GENDERS_JOINED as
select
m.Gender,
f.Gender
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender='Male';
quit;
data test1;
set genders_joined;
run;
proc sql;
create table test2 as
select * from genders_joined;
quit;
and look at the two resulting tables.
When I run this:
data genders;
input gender $;
datalines;
Female
Male
;
proc sql;
select
m.Gender,
f.Gender
from GENDERS m
inner join GENDERS f
on f.Gender = 'Female'
where m.Gender = 'Male';
quit;
I get this:
gender gender Male Female
so your data has to be different from what you think.
Hello Kurt,
I found the issue in an earlier post-- the problem was with creating the query as view without an alias which is invalid. However, instead of throwing an error SAS will generate the view but use the first instance of the key for all results.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.