I have two tables
Table 1
name | status |
Yellow | no |
Blue | no |
Orange | no |
Red | no |
Green | no |
Table 2
name | status |
Yellow | yes |
Orange | yes |
Want FINAL:
name | status |
Yellow | yes |
Blue | no |
Orange | yes |
Red | no |
Green | no |
I used the following sql command but i don't want to create additional dataset
proc sql;
create table final as
select a.name, a.status, b.name, b.status
from table1 a left join table2 b on a.name = b.name;
quit;
This is the output i see:
name | status | name | status |
Yellow | no | Yellow | yes |
Blue | no | yes | |
Orange | no | Orange | yes |
Red | no | yes | |
Green | no | yes |
If you want SQL, use an update statement:
data t1;
input name $ status $;
datalines;
Yellow no
Blue no
Orange no
Red no
Green no
;
data t2;
input name $ status $;
datalines;
Yellow yes
Orange yes
;
proc sql;
update t1
set status = (select status from t2 where name=t1.name)
where name in (select name from t2);
select * from t1;
quit;
data t1;
input name $ status $;
datalines;
Yellow no
Blue no
Orange no
Red no
Green no
;
data t2;
input name $ status $;
datalines;
Yellow yes
Orange yes
;
proc sql;
create table final as
select a.name,coalescec(b.status,a.status) as status
from t1 a left join t2 b on a.name = b.name;
quit;
That cannot be the dataset you get. It might be what SQL produces when you remove the CREATE TABLE part of the statement and just run the select statement. But when you are creating a table it cannot have two variables with the same name. If your list of variables includes two or more with the same name then PROC SQL will only save the first one. So your result is just a copy of the TABLE1.
Normally in SAS you would just make a new table. If you create the new table with the same name as an old one SAS will remove the old one when the step finishes without errors. The only reason to worry about updating a table in place is if the table is really large (or possibly you are doing this update hundreds of thousands of times). In which case why are you using SAS to manage it instead of using some database system?
The data step UPDATE statement will allow to apply transactions to a dataset that has a unique key (note the key could require multiple variables. The datasets need to be sorted by the key variable(s). This will also allow insertions. So if we consider table1 as your source dataset and table2 as your transaction dataset the syntax would be:
data want;
update table1 table2;
by name;
run;
To replace table1 just use table1 instead of want in the data statement.
To truly update in place you can use the MODIFY statement instead of the UPDATE statement. That process is more complex but gives you more options for what types of actions to take.
In SQL to make a new table you will need to explicitly state how your want the new variables created.
create table want as
select a.NAME, coalesce(b.STATUS,a.STATUS)
from table1 a left join table2 b
on a.name = b.name
;
Again replace want with table1 if you want to replace your existing table1.
Or you could us the UPDATE statement of PROC SQL to modify in place. So perhaps something like this.
update table1 a
set status = (select b.status from table2 b where a.name=b.name)
where name in (select name from table2)
;
Also make sure that TABLE2 does not have multiple observations per NAME. The data step UPDATE will work as it will just apply the transactions in order, but the SQL codes would not work. The SQL join would produce multiple observations and the SQL UPDATE will fail since you cannot store more than one values of status into a single observation.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.