Here is the scenario
main_data
name | my_value |
---|---|
A | 20 |
A | 50 |
new_data
name | my_value |
---|---|
A | 80 |
update via data step
data main_data;
update main_data new_data;
by name;
run;
Output :
name | my_value |
---|---|
A | 80 |
A | 50 |
update via proc sql
proc sql;
update main_data
set my_value=(select my_value from new_data where main_data.name=new_data.name)
;
run;
output:
name | my_value |
---|---|
A | 80 |
A | 80 |
Both the above update doesnt yeild the same result !!! y are'nt they both same ????????
I want the output as the proc sql way,how do i achieve this thru data step ???
Hi,
If the Duplicates Value are in your master dataset then you need to execute your trnsaction dataset multiple times to apply the updates for all the observations in your master dataset...For that i have used SAS index to search for observation and execute the transaction dataset in such a way that updates will apply to all the observations of master dataset...
Make sure that duplicates values are in MASTER DATASET not in TRANSACTION DATASET...
proc sql;
create index name
on new_data(name);
quit;
data want(drop = ori_val);
do until(totobs);
set main_data(rename = (my_value = ori_val)) nobs = totobs;
set new_data key = name;
if _iorc_ NE 0 then do;
my_value = my_value;
end;
end;
run;
-Urvish
it becomes a problem when updating more columns.
data step update is capable of only one to one update.
Well there're so many ways of playing with in SAS Datastep...so it will be good if you post how your transaction and master dataset looks like...
The code i have posted, it works if you want to have update on single variable...
In case of more variables, you only need to add them in RENAME = dataset option and let the update will occure for all the observations...
As you said datastep update is capable of one to one update only...it is very true as per the SAS Documentation...However, it is also mentioned that if your master dataset is having duplicate values you can use this kind of tecniques...There is always limitation of any SAS Statements,options,functions,etc...So i think we should use all together, rather than stuck on only UPDATE statement, in such a way that we will have our desired output...
The all you need is like you only need to rename the variables if there is more than one variables...See the following code...
proc sql;
create index name
on new_data(name);
quit;
data want(drop = ori_val ori_val1);
do until(totobs);
set main_data(rename = (my_value = ori_val my_value1 = ori_val1)) nobs = totobs;
set new_data key = name;
if _iorc_ NE 0 then do;
my_value = my_value;
my_value1 = my_value1;
end;
end;
run;
-Urvish
The simple answer is that the UPDATE clause in PROC SQL has nothing to do with the UPDATE statement of a DATA step.
The UPDATE statement for a DATA step is intended to apply transactions to a master dataset. In some ways it is a legacy from before there were transactional database systems and changes to the data needed to be keyed into a file and applied to that database. I data step using UPDATE will build a whole new dataset. Transaction records can include new id values that will result in additional records in the output dataset than were in the original master dataset.
The UPDATE statement in PROC SQL will modify existing records within an existing dataset. The MODIFY statement in a data step is most similar to this functionality.
You want the MERGE statement and not the UPDATE statement to do what you want.
* update via data step ;
data main_data;
MERGE main_data new_data;
by name;
run;
Tom : merge and update yields the same output. That wud not be of much help here:smileyconfused:
Because the variable you are adding already exists on the master data set. So you need to do the trick with renaming the transaction variable(s). But the merge statement is closer to update statement in SQL because it does not have all of the additional features of the UPDATE statement, such as not updating when the value in the transaction dataset is missing or allowing for multiple transaction records to be applied to the same master record.
data main_data;
MERGE main_data new_data(rename=(my_value1=xxx my_value2=yyy) in=in_transaction) ;
by name;
if in_transaction then do; my_value1 = xxx ; my_value2=yyy; end;
drop xxx yyy;
run;
Hi,
Looking forward, see the following data structure where your master dataset contains consecutive duplicates as well as non matching values compare to your transaction dataset...
Neither the techniques posted above can achive the desired output of applying each update for each duplicates in master dataset...Not even UPDATE statement of PROC SQL posted in your first question will work here...So dont stuck on only one SAS techniques...
See the following modified code...
/*===================================================*/
/* Master Data contains consecutive duplicate values */
/*===================================================*/
data master;
input name $ value;
cards4;
A 10
A 20
B 30
D 40
A 30
D 20
;;;;
/*============================================================*/
/* Transaction Data contains the values to be updated */
/* with some new values which are not there in master dataset */
/*============================================================*/
data trans;
input name $ value;
cards4;
A 80
C 30
D 50
;;;;
/*==============================================================*/
/* Data Step won't work to apply the update for each duplicates */
/* in master dataset, niether the COALESCE will work for this */
/* kind of situation, all we need to use CASE expression to */
/* apply correspondence update in master dataset */
/*==============================================================*/
proc sql;
create table want(drop = value rename = (value1 = value)) as
select coalesce(a.name,b.name) as name,
case
when a.value NE b.value then b.value
else a.value
end as value,
case
when calculated value = .then a.value
else calculated value
end as value1
from master as a
full join
trans as b
on a.name = b.name
orderby 1;
quit;
-Urvish
OP, I have replicated Tom's method and it results in exactly what you are after.
DATA WANT;
MERGE MASTER
TRANS (RENAME = (VALUE = NEWVALUE));
BY NAME;
IF MISSING(NEWVALUE) THEN NEWVALUE = VALUE;
DROP VALUE;
RENAME NEWVALUE = VALUE;
RUN;
What seems to be the problem?
You need to be careful with the requirements of your situation if you want to start using COALESCE() function or testing if the new value is missing. If that is your technique then it cannot handle updates that are intended to convert an existing non-missing value to a missing value.
@scott : i believe a little modification will do my need . not as per the missing value , lookin on taking the update from slave dataset or new dataset values to the master dataset . Thanks
thanks @UrvishShah @ tom @ scott . its been nice to see differnt ways
what method did you end up using? Can you post your final solution?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.