sql

sunilreddy · Posted 07-09-2012 06:42 PM

I have millions of records in table and always appending the new records into the table.

id	col1	col2
1	id1	desc1
5	id2	desc3
3	id1	desc2

in above case, first two records are old and 3rd record is appended to existing table. I want to change old desc1 column with new desc2 for same col1 value.

i want my output as

id	col1	col2
1	id1	desc2
5	id2	desc3
3	id1	desc2

Any efficient way to do this

SASKiwi · Posted 07-09-2012 08:43 PM

From your description it appears you want to do both updating of an existing record if already present, and appending of new records if not present.

It would help if you could change your example to showing three things: the original rows, the rows to be added by updating or appending, and the expected final rows.

Doing both updating and appending requires more resources than just appending, so clarification of your requirements would be helpful first.

Haikuo · Posted 07-09-2012 08:56 PM

Hi,

There would be many ways to do it, depending on your terms and conditions. One assumption for the following solutions is that you still have access to a separate data set containing new data.

1. if you don't care about the records order, traditional data step is sufficient to do the job:

data have;

input (id col1 col2) ($);

cards;

1 id1 desc1

5 id2 desc3

3 id1 desc2

;

data new;

input (id col1 col2) ($);

cards;

3 id1 desc2

;

proc sort data=have;

by col1;

run;

data want;

merge have(in=a) new(drop=id rename=col2=col3);

by col1;

col2=coalescec(col3,col2);

drop col3;

run;

2. if you do need to keep the order in 'have', then hash() can be handy:

data want;

if _n_=1 then do;

set new(obs=1);

dcl hash h(dataset:'new', ordered:'a',duplicate:'r');

h.definekey('col1');

h.definedata('col2');

h.definedone();

end;

set have;

rc=h.find();

drop rc;

run;

Haikuo

Tom · Posted 07-09-2012 11:08 PM

Why SQL?

Why not use a MERGE or UPDATE statement?

Ksharp · Posted 07-09-2012 11:36 PM

data old;
input (id    col1    col2) ($);
cards;
1    id1    desc1
5    id2    desc3
;
run;
 

data new;
input (id    col1    col2) ($);
cards;
3    id1    desc2
;
run;
proc sql;
update old 
 set col2=(select col2 from new where new.col1=old.col1)
  where exists(select col1 from new where new.col1=old.col1)
  ;
quit;
proc append base=old data=new force;run;

Ksharp

shivas · Posted 07-10-2012 12:10 AM

Hi,

You can try this..

data have;

input (id col1 col2) ($);

cards;

1 id1 desc1

5 id2 desc3

3 id1 desc1

;

data new;

input (id col1 col2) ($);

cards;

3 id1 desc2

;

proc sort data=have;

by id col1;

run;

data have ;

update have(in=a) new(in=b) ;

by id col1;

if a ;

run ;

or

proc sql ;

update have m

set col2 = (select col2 from new r

where m.id = r.id

and m.col1=r.col1)

where m.id in (select id from new) ;

quit ;

Thanks,

Shiva

Keith · Posted 07-10-2012 05:08 AM

If you are updating / appending a small number of records in your large dataset then using MODIFY may be the most efficient technique. And if you haven't done so already, I would recommend creating an index on COL1 in your master dataset. Below are 2 examples of updating the master dataset (indexed) using MODIFY, the 2nd example is likely to be fastest if your transaction dataset is small. There's a good paper on using MODIFY at http://www2.sas.com/proceedings/sugi31/264-31.pdf

data have (index=(col1));

input (id col1 col2) ($);

cards;

5 id2 desc3

1 id1 desc1

;

run;

data new;

input (id col1 col2) ($);

cards;

4 id3 desc4

3 id1 desc2

;

run;

/* example 1 */

data have ;

modify have new (rename=(id=id_new col2=col2_new));

by col1;

select (_iorc_);

when (%sysrc(_sok)) do;

col2=col2_new;

replace;

id=id_new;

output;

end;

when (%sysrc(_dsenmr)) do;

id=id_new;

col2=col2_new;

output;

_error_=0;

end;

otherwise end;

run ;

/* example 2 */

data have;

set new (rename=(id=id_new col2=col2_new));

modify have key=col1;

select (_iorc_);

when (%sysrc(_sok)) do;

col2=col2_new;

replace;

id=id_new;

output;

end;

when (%sysrc(_dsenom)) do;

id=id_new;

col2=col2_new;

output;

_error_=0;

end;

otherwise end;

run;

Message was edited by: Keith Timms Sorry, just realised that this method only works properly where the values for COL1 are unique in the master dataset. I'll have a look to see if I can get it working where duplicate values exist.

sql

Re: sql

Re: sql

Re: sql

Re: sql

Re: sql

Re: sql

sql

Re: sql

Re: sql

Re: sql

Re: sql

Re: sql

Re: sql

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away