BookmarkSubscribeRSS Feed
matoma
Obsidian | Level 7

I'm using proc sql to merge a few data sets (I know a merge data step would be easier but we have to use sql) and for some reason when I merge them its not showing observations from the same on group in the same line. For instance, for university of Missouri its showing the AA_apps, total_apps, and percent_AA in the same line as the university but is showing random numbers underneath that I'm not sure even go to the university of missouri. Heres my code and a picture of the output to visualize what I'm talking about. Basically each fact sheet has descriptive statistics on each race and I want to merge them to create one data set. 

 

proc sql;

create table all as

select *, coalesce (a.total_apps,b.total_apps,c.total_apps,d.total_apps,e.total_apps) astotal_apps

from facts1 a full join facts2 b

on a.institution=b.institution

full join facts3 c

on b.institution=c.institution

full join facts4 d

on c.institution=d.institution

full join facts5 e

on d.institution=e.institution;

quit;

matoma_0-1588191077740.png

 

4 REPLIES 4
SASKiwi
PROC Star

Looks like institution is missing  on some rows in one or more of your tables. Try excluding missing values from your joins:

on a.institution=b.institution and a.institution ne ''

 

ballardw
Super User

It is a good idea to actually specify the table.variable instead of select *. Especially when using multiple tables. If a variable exists in multiple tables which value you get may be a crap shoot. You may also want to use () to control which result is combined with which.

 

Your comment about University of Missouri makes me think that you also should have at least one, if not for each source table a "where institution ='University of Missouri' such as

from facts1 a full join 
   (select * from facts2 
    where institution='University of Missouri') b

 

Whenever I see a statement like "we have to use SQL" I always want to ask "do you use a hammer to dig a hole or a shovel to drive nails?". Both sort of work but the right tool for the task.

 

 

Kurt_Bremser
Super User

@matoma wrote:

.... but we have to use sql


Sadly, we are often confronted with statements like this here. I (and many, if not all other senior members of the communities) consider a statement like this a fallacy which can only come from a serious lack of SAS knowledge.

The issue at hand shall define the tools to use. Please reconsider, or do everything in your power to make the "powers that be" reconsider. Otherwise, you'll throw computing resources, manpower, and time (together known as "money") out the window.

PGStats
Opal | Level 21

A safer method would be:

 

proc sql;
create table inst as
select institution from facts1
union
select institution from facts2
union
select institution from facts3
union
select institution from facts4
union
select institution from facts5;

create table all as
select 
	i.institution, 
	a.AA_apps,
	a.percent_AA,
	b.AI_apps,
	b.percent_AI, 
	......
	coalesce (a.total_apps, b.total_apps, c.total_apps, d.total_apps, e.total_apps) as total_apps
from 
	inst as i left join 
	facts1 as a on i.institution=a.institution left join 
	facts2 as b on i.institution=b.institution left join 
	facts3 as c on i.institution=c.institution left join 
	facts4 as d on i.institution=d.institution left join 
	facts5 as e on i.institution=e.institution;
quit;

Are you sure you want COALESCE and not SUM?

PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 766 views
  • 1 like
  • 5 in conversation