I'm using proc sql to merge a few data sets (I know a merge data step would be easier but we have to use sql) and for some reason when I merge them its not showing observations from the same on group in the same line. For instance, for university of Missouri its showing the AA_apps, total_apps, and percent_AA in the same line as the university but is showing random numbers underneath that I'm not sure even go to the university of missouri. Heres my code and a picture of the output to visualize what I'm talking about. Basically each fact sheet has descriptive statistics on each race and I want to merge them to create one data set.
proc sql;
create table all as
select *, coalesce (a.total_apps,b.total_apps,c.total_apps,d.total_apps,e.total_apps) astotal_apps
from facts1 a full join facts2 b
on a.institution=b.institution
full join facts3 c
on b.institution=c.institution
full join facts4 d
on c.institution=d.institution
full join facts5 e
on d.institution=e.institution;
quit;
Looks like institution is missing on some rows in one or more of your tables. Try excluding missing values from your joins:
on a.institution=b.institution and a.institution ne ''
It is a good idea to actually specify the table.variable instead of select *. Especially when using multiple tables. If a variable exists in multiple tables which value you get may be a crap shoot. You may also want to use () to control which result is combined with which.
Your comment about University of Missouri makes me think that you also should have at least one, if not for each source table a "where institution ='University of Missouri' such as
from facts1 a full join (select * from facts2 where institution='University of Missouri') b
Whenever I see a statement like "we have to use SQL" I always want to ask "do you use a hammer to dig a hole or a shovel to drive nails?". Both sort of work but the right tool for the task.
@matoma wrote:
.... but we have to use sql
Sadly, we are often confronted with statements like this here. I (and many, if not all other senior members of the communities) consider a statement like this a fallacy which can only come from a serious lack of SAS knowledge.
The issue at hand shall define the tools to use. Please reconsider, or do everything in your power to make the "powers that be" reconsider. Otherwise, you'll throw computing resources, manpower, and time (together known as "money") out the window.
A safer method would be:
proc sql;
create table inst as
select institution from facts1
union
select institution from facts2
union
select institution from facts3
union
select institution from facts4
union
select institution from facts5;
create table all as
select
i.institution,
a.AA_apps,
a.percent_AA,
b.AI_apps,
b.percent_AI,
......
coalesce (a.total_apps, b.total_apps, c.total_apps, d.total_apps, e.total_apps) as total_apps
from
inst as i left join
facts1 as a on i.institution=a.institution left join
facts2 as b on i.institution=b.institution left join
facts3 as c on i.institution=c.institution left join
facts4 as d on i.institution=d.institution left join
facts5 as e on i.institution=e.institution;
quit;
Are you sure you want COALESCE and not SUM?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.