BookmarkSubscribeRSS Feed
sugeshnambiar
Fluorite | Level 6

Here is the scenario

main_data

name

my_value

A20
A50

new_data

namemy_value
A80

update via data step

data main_data;                                                  

update main_data new_data;

by name;

run;

Output :

namemy_value
A80
A50

update via proc sql

proc sql;

update main_data

set my_value=(select my_value from new_data where main_data.name=new_data.name)

;

run;

output:

namemy_value
A80
A80

Both the above update doesnt yeild the same result !!! y are'nt they both same ????????

I want the output as the proc sql way,how do i achieve this thru data step ???

13 REPLIES 13
UrvishShah
Fluorite | Level 6

Hi,

If the Duplicates Value are in your master dataset then you need to execute your trnsaction dataset multiple times to apply the updates for all the observations in your master dataset...For that i have used SAS index to search for observation and execute the transaction dataset in such a way that updates will apply to all the observations of master dataset...

Make sure that duplicates values are in MASTER DATASET not in TRANSACTION DATASET...

proc sql;

  create index name

  on new_data(name);

quit;

data want(drop = ori_val);

  do until(totobs);

       set main_data(rename = (my_value = ori_val)) nobs = totobs;

       set new_data key = name;

       if _iorc_ NE 0 then do;

      my_value = my_value;

       end;

end;

run;

-Urvish

sugeshnambiar
Fluorite | Level 6

it becomes a problem when updating  more columns.

data step update is capable of only one to one update.

UrvishShah
Fluorite | Level 6


Well there're so many ways of playing with in SAS Datastep...so it will be good if you post how your transaction and master dataset looks like...

The code i have posted, it works if you want to have update on single variable...

In case of more variables, you only need to add them in RENAME = dataset option and let the update will occure for all the observations...


As you said datastep update is capable of one to one update only...it is very true as per the SAS Documentation...However, it is also mentioned that if your master dataset is having duplicate values you can use this kind of tecniques...There is always limitation of any SAS Statements,options,functions,etc...So i think we should use all together, rather than stuck on only UPDATE statement, in such a way that we will have our desired output...

The all you need is like you only need to rename the variables if there is more than one variables...See the following code...

proc sql;

  create index name

  on new_data(name);

quit;

data want(drop = ori_val ori_val1);

  do until(totobs);

       set main_data(rename = (my_value = ori_val my_value1 = ori_val1)) nobs = totobs;

       set new_data key = name;

       if _iorc_ NE 0 then do;

       my_value = my_value;

       my_value1 = my_value1;

       end;

  end;

run;

-Urvish

Tom
Super User Tom
Super User

The simple answer is that the UPDATE clause in PROC SQL has nothing to do with the UPDATE statement of a DATA step.

The UPDATE statement for a DATA step is intended to apply transactions to a master dataset.  In some ways it is a legacy from before there were transactional database systems and changes to the data needed to be keyed into a file and applied to that database.  I data step using UPDATE will build a whole new dataset.  Transaction records can include new id values that will result in additional records in the output dataset than were in the original master dataset.

The UPDATE statement in PROC SQL will modify existing records within an existing dataset.  The MODIFY statement in a data step is most similar to this functionality.

Tom
Super User Tom
Super User

You want the MERGE statement and not the UPDATE statement to do what you want.

* update via data step ;

data main_data;                                                 

MERGE main_data new_data;

by name;

run;

sugeshnambiar
Fluorite | Level 6

Tom :   merge and update yields the same output. That wud not be of much help here:smileyconfused:

Tom
Super User Tom
Super User

Because the variable you are adding already exists on the master data set.  So you need to do the trick with renaming the transaction variable(s).  But the merge statement is closer to update statement in SQL because it does not have all of the additional features of the UPDATE statement, such as not updating when the value in the transaction dataset is missing or allowing for multiple transaction records to be applied to the same master record.

data main_data;                                                

MERGE main_data new_data(rename=(my_value1=xxx my_value2=yyy) in=in_transaction) ;

by name;

if in_transaction then do; my_value1 = xxx ; my_value2=yyy; end;

drop xxx yyy;

run;

UrvishShah
Fluorite | Level 6

Hi,

Looking forward, see the following data structure where your master dataset contains consecutive duplicates as well as non matching values compare to your transaction dataset...

Neither the techniques posted above can achive the desired output of applying each update for each duplicates in master dataset...Not even UPDATE statement of PROC SQL posted in your first question will work here...So dont stuck on only one SAS techniques...

See the following modified code...

/*===================================================*/

/* Master Data contains consecutive duplicate values */

/*===================================================*/

data master;

  input name $ value;

    cards4;

A 10

A 20

B 30

D 40

A 30

D 20

;;;;

/*============================================================*/

/* Transaction Data contains the values to be updated         */

/* with some new values which are not there in master dataset */

/*============================================================*/

data trans;

   input name $ value;

   cards4;

A 80

C 30

D 50

;;;;

/*==============================================================*/

/* Data Step won't work to apply the update for each duplicates */

/* in master dataset, niether the COALESCE will work for this   */

/* kind of situation, all we need to use CASE expression to     */

/* apply correspondence update in master dataset                */

/*==============================================================*/

proc sql;

    create table want(drop = value rename = (value1 = value)) as

    select coalesce(a.name,b.name) as name,

         case

             when a.value NE b.value then b.value

                            else a.value

                   end as value,

         case

             when calculated value = .then a.value

                            else calculated value

        end as value1

  from master as a

            full join

            trans as b

  on a.name = b.name

  orderby 1;

quit;

-Urvish

Scott_Mitchell
Quartz | Level 8

OP, I have replicated Tom's method and it results in exactly what you are after.

DATA WANT;

  MERGE MASTER

       TRANS (RENAME = (VALUE = NEWVALUE));

  BY NAME;

  IF MISSING(NEWVALUE) THEN NEWVALUE = VALUE;

  DROP VALUE;

  RENAME NEWVALUE = VALUE;

RUN;

What seems to be the problem?

Tom
Super User Tom
Super User

You need to be careful with the requirements of your situation if you want to start using COALESCE() function or testing if the new value is missing.  If that is your technique then it cannot handle updates that are intended to convert an existing non-missing value to a missing value.

sugeshnambiar
Fluorite | Level 6

@scott :   i believe a little modification will do my need . not as per the missing value , lookin on taking the update from slave dataset or new dataset values to the master dataset . Thanks Smiley Happy

sugeshnambiar
Fluorite | Level 6

thanks @UrvishShah @ tom  @ scott . its been nice to see differnt ways

r0oki3
Obsidian | Level 7

what method did you end up using? Can you post your final solution?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2733 views
  • 7 likes
  • 5 in conversation