Help using Base SAS procedures

sql

Reply
Frequent Contributor
Posts: 115

sql

I have millions of records in table and always appending the new records into the table.

idcol1col2
1id1desc1
5id2desc3
3id1desc2

in above case, first two records are old and 3rd record is appended to existing table. I want to change old desc1 column with new desc2 for same col1 value.

i want my output as

       

idcol1col2
1id1desc2
5id2desc3
3id1desc2

Any efficient way to do this

Super User
Posts: 3,250

Re: sql

Posted in reply to sunilreddy

From your description it appears you want to do both updating of an existing record if already present, and appending of new records if not present.

It would help if you could change your example to showing three things: the original rows, the rows to be added by updating or appending, and the expected final rows.

Doing both updating and appending requires more resources than just appending, so clarification of your requirements would be helpful first.

Respected Advisor
Posts: 3,156

Re: sql

Posted in reply to sunilreddy

Hi,

There would be many ways to do it, depending on your terms and conditions. One assumption for the following solutions is that you still have access to a separate data set containing new data.

1. if you don't care about the records order, traditional data step is sufficient to do the job:

data have;

input (id    col1    col2) ($);

cards;

1    id1    desc1

5    id2    desc3

3    id1    desc2

;

data new;

input (id    col1    col2) ($);

cards;

3    id1    desc2

;

proc sort data=have;

by col1;

run;

data want;

merge have(in=a) new(drop=id rename=col2=col3);

by col1;

col2=coalescec(col3,col2);

drop col3;

run;

2. if you do need to keep the order in 'have', then hash() can be handy:

data want;

if _n_=1 then do;

   set new(obs=1);

   dcl hash h(dataset:'new', ordered:'a',duplicate:'r');

   h.definekey('col1');

   h.definedata('col2');

   h.definedone();

end;

set have;

rc=h.find();

drop rc;

run;

Haikuo

Super User
Super User
Posts: 7,038

Re: sql

Posted in reply to sunilreddy

Why SQL?

Why not use a MERGE or UPDATE statement?

Super User
Posts: 10,018

Re: sql

Posted in reply to sunilreddy
data old;
input (id    col1    col2) ($);
cards;
1    id1    desc1
5    id2    desc3
;
run;
 

data new;
input (id    col1    col2) ($);
cards;
3    id1    desc2
;
run;
proc sql;
update old 
 set col2=(select col2 from new where new.col1=old.col1)
  where exists(select col1 from new where new.col1=old.col1)
  ;
quit;
proc append base=old data=new force;run;

Ksharp

Super Contributor
Posts: 349

Re: sql

Hi,

You can try this..

data have;

input (id    col1    col2) ($);

cards;

1    id1    desc1

5    id2    desc3

3    id1    desc1

;

data new;

input (id    col1    col2) ($);

cards;

3    id1    desc2

;

proc sort data=have;

by id  col1;

run;

data have ;

   update have(in=a) new(in=b) ;

   by id col1;

   if a ;

   run ;

or

proc sql ;

   update have m

   set col2 = (select col2 from new r

                    where m.id = r.id

                    and m.col1=r.col1)

   where m.id in (select id from new) ;

   quit ;

Thanks,

Shiva

Regular Contributor
Posts: 151

Re: sql

Posted in reply to sunilreddy

If you are updating / appending a small number of records in your large dataset then using MODIFY may be the most efficient technique. And if you haven't done so already, I would recommend creating an index on COL1 in your master dataset.  Below are 2 examples of updating the master dataset (indexed) using MODIFY, the 2nd example is likely to be fastest if your transaction dataset is small.  There's a good paper on using MODIFY at http://www2.sas.com/proceedings/sugi31/264-31.pdf

data have (index=(col1));

input (id col1 col2) ($);

cards;

5 id2 desc3

1 id1 desc1

;

run;

data new;

input (id col1 col2) ($);

cards;

4 id3 desc4

3 id1 desc2

;

run;

/* example 1 */

data have ;

modify have new (rename=(id=id_new col2=col2_new));

by col1;

  select (_iorc_);

  when (%sysrc(_sok)) do;

    col2=col2_new;

  replace;

  id=id_new;

  output;

  end;

  when (%sysrc(_dsenmr)) do;

  id=id_new;

  col2=col2_new;

  output;

  _error_=0;

  end;

otherwise end;

run ;

/* example 2 */

data have;

set new (rename=(id=id_new col2=col2_new));

modify have key=col1;

  select (_iorc_);

  when (%sysrc(_sok)) do;

    col2=col2_new;

  replace;

  id=id_new;

  output;

  end;

  when (%sysrc(_dsenom)) do;

  id=id_new;

  col2=col2_new;

  output;

  _error_=0;

  end;

otherwise end;

run;

Message was edited by: Keith Timms Sorry, just realised that this method only works properly where the values for COL1 are unique in the master dataset.  I'll have a look to see if I can get it working where duplicate values exist.

Ask a Question
Discussion stats
  • 6 replies
  • 271 views
  • 0 likes
  • 7 in conversation