Re: update vs proc sql

sugeshnambiar · Posted 08-29-2013 08:10 AM

Here is the scenario

main_data

name	my_value
A	20
A	50

new_data

name	my_value
A	80

update via data step

data main_data;

update main_data new_data;

by name;

run;

Output :

name	my_value
A	80
A	50

update via proc sql

proc sql;

update main_data

set my_value=(select my_value from new_data where main_data.name=new_data.name)

;

run;

output:

name	my_value
A	80
A	80

Both the above update doesnt yeild the same result !!! y are'nt they both same ????????

I want the output as the proc sql way,how do i achieve this thru data step ???

UrvishShah · Posted 08-29-2013 09:27 AM

Hi,

If the Duplicates Value are in your master dataset then you need to execute your trnsaction dataset multiple times to apply the updates for all the observations in your master dataset...For that i have used SAS index to search for observation and execute the transaction dataset in such a way that updates will apply to all the observations of master dataset...

Make sure that duplicates values are in MASTER DATASET not in TRANSACTION DATASET...

proc sql;

create index name

on new_data(name);

quit;

data want(drop = ori_val);

do until(totobs);

set main_data(rename = (my_value = ori_val)) nobs = totobs;

set new_data key = name;

if _iorc_ NE 0 then do;

my_value = my_value;

end;

run;

-Urvish

sugeshnambiar · Posted 08-29-2013 09:51 AM

it becomes a problem when updating more columns.

data step update is capable of only one to one update.

UrvishShah · Posted 08-29-2013 12:19 PM

Well there're so many ways of playing with in SAS Datastep...so it will be good if you post how your transaction and master dataset looks like...

The code i have posted, it works if you want to have update on single variable...

In case of more variables, you only need to add them in RENAME = dataset option and let the update will occure for all the observations...

As you said datastep update is capable of one to one update only...it is very true as per the SAS Documentation...However, it is also mentioned that if your master dataset is having duplicate values you can use this kind of tecniques...There is always limitation of any SAS Statements,options,functions,etc...So i think we should use all together, rather than stuck on only UPDATE statement, in such a way that we will have our desired output...

The all you need is like you only need to rename the variables if there is more than one variables...See the following code...

proc sql;

create index name

on new_data(name);

quit;

data want(drop = ori_val ori_val1);

do until(totobs);

set main_data(rename = (my_value = ori_val my_value1 = ori_val1)) nobs = totobs;

set new_data key = name;

if _iorc_ NE 0 then do;

my_value = my_value;

my_value1 = my_value1;

end;

run;

-Urvish

Tom · Posted 08-29-2013 09:56 AM

The simple answer is that the UPDATE clause in PROC SQL has nothing to do with the UPDATE statement of a DATA step.

The UPDATE statement for a DATA step is intended to apply transactions to a master dataset. In some ways it is a legacy from before there were transactional database systems and changes to the data needed to be keyed into a file and applied to that database. I data step using UPDATE will build a whole new dataset. Transaction records can include new id values that will result in additional records in the output dataset than were in the original master dataset.

The UPDATE statement in PROC SQL will modify existing records within an existing dataset. The MODIFY statement in a data step is most similar to this functionality.

Tom · Posted 08-29-2013 10:07 AM

You want the MERGE statement and not the UPDATE statement to do what you want.

* update via data step ;

data main_data;

MERGE main_data new_data;

by name;

run;

sugeshnambiar · Posted 08-29-2013 10:12 AM

Tom : merge and update yields the same output. That wud not be of much help here:smileyconfused:

Tom · Posted 08-29-2013 10:52 AM

Because the variable you are adding already exists on the master data set. So you need to do the trick with renaming the transaction variable(s). But the merge statement is closer to update statement in SQL because it does not have all of the additional features of the UPDATE statement, such as not updating when the value in the transaction dataset is missing or allowing for multiple transaction records to be applied to the same master record.

data main_data;

MERGE main_data new_data(rename=(my_value1=xxx my_value2=yyy) in=in_transaction) ;

by name;

if in_transaction then do; my_value1 = xxx ; my_value2=yyy; end;

drop xxx yyy;

run;

UrvishShah · Posted 09-01-2013 01:55 AM

Hi,

Looking forward, see the following data structure where your master dataset contains consecutive duplicates as well as non matching values compare to your transaction dataset...

Neither the techniques posted above can achive the desired output of applying each update for each duplicates in master dataset...Not even UPDATE statement of PROC SQL posted in your first question will work here...So dont stuck on only one SAS techniques...

See the following modified code...

/*===================================================*/

/* Master Data contains consecutive duplicate values */

/*===================================================*/

data master;

input name $ value;

cards4;

A 10

A 20

B 30

D 40

A 30

D 20

;;;;

/*============================================================*/

/* Transaction Data contains the values to be updated */

/* with some new values which are not there in master dataset */

/*============================================================*/

data trans;

input name $ value;

cards4;

A 80

C 30

D 50

;;;;

/*==============================================================*/

/* Data Step won't work to apply the update for each duplicates */

/* in master dataset, niether the COALESCE will work for this */

/* kind of situation, all we need to use CASE expression to */

/* apply correspondence update in master dataset */

/*==============================================================*/

proc sql;

create table want(drop = value rename = (value1 = value)) as

select coalesce(a.name,b.name) as name,

case

when a.value NE b.value then b.value

else a.value

end as value,

case

when calculated value = .then a.value

else calculated value

end as value1

from master as a

full join

trans as b

on a.name = b.name

orderby 1;

quit;

-Urvish

Scott_Mitchell · Posted 09-01-2013 10:16 PM

OP, I have replicated Tom's method and it results in exactly what you are after.

DATA WANT;

MERGE MASTER

TRANS (RENAME = (VALUE = NEWVALUE));

BY NAME;

IF MISSING(NEWVALUE) THEN NEWVALUE = VALUE;

DROP VALUE;

RENAME NEWVALUE = VALUE;

RUN;

What seems to be the problem?

Tom · Posted 09-02-2013 12:48 AM

You need to be careful with the requirements of your situation if you want to start using COALESCE() function or testing if the new value is missing. If that is your technique then it cannot handle updates that are intended to convert an existing non-missing value to a missing value.

sugeshnambiar · Posted 09-02-2013 05:13 AM

@scott : i believe a little modification will do my need . not as per the missing value , lookin on taking the update from slave dataset or new dataset values to the master dataset . Thanks

sugeshnambiar · Posted 09-02-2013 05:15 AM

thanks @UrvishShah @ tom @ scott . its been nice to see differnt ways

r0oki3 · Posted 11-26-2019 08:42 AM

what method did you end up using? Can you post your final solution?

SAS Innovate 2025: Call for Content

Classroom Training Available!