BookmarkSubscribeRSS Feed
sunilreddy
Fluorite | Level 6

I have millions of records in table and always appending the new records into the table.

idcol1col2
1id1desc1
5id2desc3
3id1desc2

in above case, first two records are old and 3rd record is appended to existing table. I want to change old desc1 column with new desc2 for same col1 value.

i want my output as

       

idcol1col2
1id1desc2
5id2desc3
3id1desc2

Any efficient way to do this

6 REPLIES 6
SASKiwi
PROC Star

From your description it appears you want to do both updating of an existing record if already present, and appending of new records if not present.

It would help if you could change your example to showing three things: the original rows, the rows to be added by updating or appending, and the expected final rows.

Doing both updating and appending requires more resources than just appending, so clarification of your requirements would be helpful first.

Haikuo
Onyx | Level 15

Hi,

There would be many ways to do it, depending on your terms and conditions. One assumption for the following solutions is that you still have access to a separate data set containing new data.

1. if you don't care about the records order, traditional data step is sufficient to do the job:

data have;

input (id    col1    col2) ($);

cards;

1    id1    desc1

5    id2    desc3

3    id1    desc2

;

data new;

input (id    col1    col2) ($);

cards;

3    id1    desc2

;

proc sort data=have;

by col1;

run;

data want;

merge have(in=a) new(drop=id rename=col2=col3);

by col1;

col2=coalescec(col3,col2);

drop col3;

run;

2. if you do need to keep the order in 'have', then hash() can be handy:

data want;

if _n_=1 then do;

   set new(obs=1);

   dcl hash h(dataset:'new', ordered:'a',duplicate:'r');

   h.definekey('col1');

   h.definedata('col2');

   h.definedone();

end;

set have;

rc=h.find();

drop rc;

run;

Haikuo

Tom
Super User Tom
Super User

Why SQL?

Why not use a MERGE or UPDATE statement?

Ksharp
Super User
data old;
input (id    col1    col2) ($);
cards;
1    id1    desc1
5    id2    desc3
;
run;
 

data new;
input (id    col1    col2) ($);
cards;
3    id1    desc2
;
run;
proc sql;
update old 
 set col2=(select col2 from new where new.col1=old.col1)
  where exists(select col1 from new where new.col1=old.col1)
  ;
quit;
proc append base=old data=new force;run;

Ksharp

shivas
Pyrite | Level 9

Hi,

You can try this..

data have;

input (id    col1    col2) ($);

cards;

1    id1    desc1

5    id2    desc3

3    id1    desc1

;

data new;

input (id    col1    col2) ($);

cards;

3    id1    desc2

;

proc sort data=have;

by id  col1;

run;

data have ;

   update have(in=a) new(in=b) ;

   by id col1;

   if a ;

   run ;

or

proc sql ;

   update have m

   set col2 = (select col2 from new r

                    where m.id = r.id

                    and m.col1=r.col1)

   where m.id in (select id from new) ;

   quit ;

Thanks,

Shiva

Keith
Obsidian | Level 7

If you are updating / appending a small number of records in your large dataset then using MODIFY may be the most efficient technique. And if you haven't done so already, I would recommend creating an index on COL1 in your master dataset.  Below are 2 examples of updating the master dataset (indexed) using MODIFY, the 2nd example is likely to be fastest if your transaction dataset is small.  There's a good paper on using MODIFY at http://www2.sas.com/proceedings/sugi31/264-31.pdf

data have (index=(col1));

input (id col1 col2) ($);

cards;

5 id2 desc3

1 id1 desc1

;

run;

data new;

input (id col1 col2) ($);

cards;

4 id3 desc4

3 id1 desc2

;

run;

/* example 1 */

data have ;

modify have new (rename=(id=id_new col2=col2_new));

by col1;

  select (_iorc_);

  when (%sysrc(_sok)) do;

    col2=col2_new;

  replace;

  id=id_new;

  output;

  end;

  when (%sysrc(_dsenmr)) do;

  id=id_new;

  col2=col2_new;

  output;

  _error_=0;

  end;

otherwise end;

run ;

/* example 2 */

data have;

set new (rename=(id=id_new col2=col2_new));

modify have key=col1;

  select (_iorc_);

  when (%sysrc(_sok)) do;

    col2=col2_new;

  replace;

  id=id_new;

  output;

  end;

  when (%sysrc(_dsenom)) do;

  id=id_new;

  col2=col2_new;

  output;

  _error_=0;

  end;

otherwise end;

run;

Message was edited by: Keith Timms Sorry, just realised that this method only works properly where the values for COL1 are unique in the master dataset.  I'll have a look to see if I can get it working where duplicate values exist.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1178 views
  • 0 likes
  • 7 in conversation