- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I am trying to figure out the best way to fill in blanks/update data in one data table with data form a column in another data table. I have tried the hash/define method, but I couldn't get it to work. So now I'm trying this:
proc sql;
select coalesce (MOBILENO, MOBILE) as MOST_RECENT_MN
from AP;
quit;
I dont really know how to use this function and the doco is over my head.
Basically I would like to select the MOBILENO (when it exists), which is in the AP table and have that overwrite the MOBILE column which is in the M1 table. Where the MOBILENO does not exist then the MOBILE column is fine.
Greatly appreciate any help or direction.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@2222 wrote:
thank you, and does this work if the variable comes from the same table as the variable it is updating? then I drop the variable after i have used it to update the variable that had missing values?
In that case there is no need for SQL at all.
data want;
set AP;
most_recent_mn = coalesceC(MOBILENO, MOBILE);
drop MOBILENO;
run;
Unless you are doing something extremely complex with extremely large datasets it is much better to create a NEW dataset when making changes to the data with code instead of making changes to an existing dataset. That way you can always correct any problems and re-run the code without having to worry that the mistakes have destroyed the source data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are on the right track but you've only listed one table in your query. It should look more like this:
proc sql;
create table Want as
select coalesce (MOBILENO, MOBILE) as MOST_RECENT_MN
from AP
left join M1
on AP.join_column = M1.join_column
;
quit;
I don't know what the joining rules are for your tables so you need to supply that. Putting MOBILENO first in the COALESCE means what will be used if populated, otherwise MOBILE is used.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thank you for that, unfortunately I am getting an ambiguous reference error because MOBILE is in two datasets
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In SQL, you need to be clear about where a variable is supposed to come from; define table aliases in the FROM clause and use them when addressing variables.
e.g.
proc sql;
create table want as
select
t1.id,
t1.var_a,
(t1.var_a - t2.var_b) as diff
from table_a t1 left join table_b t2
on t1.id = t2.id
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@2222 wrote:
thank you, and does this work if the variable comes from the same table as the variable it is updating? then I drop the variable after i have used it to update the variable that had missing values?
In that case there is no need for SQL at all.
data want;
set AP;
most_recent_mn = coalesceC(MOBILENO, MOBILE);
drop MOBILENO;
run;
Unless you are doing something extremely complex with extremely large datasets it is much better to create a NEW dataset when making changes to the data with code instead of making changes to an existing dataset. That way you can always correct any problems and re-run the code without having to worry that the mistakes have destroyed the source data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content