Solved: Re: Club the newly updated date for existing ID in another table

Sathish_jammy · Posted 03-14-2019 02:00 AM

Hi community,

Please help me to club the newly entered data to update in another file.

For Example;

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;

/*OutPut need in Have1 like
111 02jul2000 78
111 18jan2001 52
222 08aug2000 89
222 23feb2001 66
333 06jun2001 99
444 13dec2002 77
*/

I tried the below given method , but I failed.

Proc Sql;
create table aaa as
select a.Id,a.date
from Have1 a inner join Have2 b
  on a.Id = b.Id
;
insert into Have1 
select * from Have2 a
where a.date not in (select date from aaa)
;
quit;

Please Correct my mistake on the code.. or give some better way to complete the task.

Thanks in advance!

novinosrin · Posted 03-14-2019 04:58 AM

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;

proc sql;
 create table want as
  select *  from have1
   union
 (select *  from have2
         except
         select * from have1)
	order by id,date;
quit;

View solution in original post

ScottBass · Posted 03-14-2019 02:14 AM

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;

data want1;
   merge have1 have2;
   by id date;
run;

proc sql;
   create table want2 as
   select 
      coalesce(a.id,b.id) as id,
      coalesce(a.date,b.date) as date format=date9.,
      coalesce(a.var1,b.var1) as var1
   from have1 a
   full join have2 b
   on a.id=b.id
   and a.date=b.date;
run;

proc compare base=want1 comp=want2;
run;

The data step is the easier approach. Both datasets must be sorted on id and date.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

Sathish_jammy · Posted 03-14-2019 02:28 AM

Dear @ScottBass

Thanks for your suggestion.

Your code works well in the sample data, I need another Idea from you.

Could you suggest me a better step, because My datasets have more than 50million rows and hundreds of columns,

so to merge the data will take too much time to proceed further.

That's the reason I tried to get only Id and Date from datasets in proc sql.

I hope you understand. Please let me know if any details require.

Thank you!

ScottBass · Posted 03-14-2019 02:40 AM

Are they SAS datasets or RDBMS tables?

Can you create indexes on the datasets or tables?

Can you create the datasets in a SPDE library?

Try running with this:

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;

proc sort data=have1;
   by descending id date;
run;

proc sort data=have2;
   by id descending date;
run;

proc datasets lib=work nolist;
   modify have1;
   create index keys=(id date) / unique;
   run;
   modify have2;
   create index keys=(id date) / unique;
   run;
quit;

data want;
   merge have1 have2;
   by id date;
run;

You still have to create the index (instead of sorting), but the I/O will be less since you're just writing out the keys and a pointer to observation number. The merge will be slightly slower than if you had sorted the datasets, but your overall run time should be less, especially the wider your datasets.

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

Kurt_Bremser · Posted 03-14-2019 02:58 AM

TANSTAAFL (There Ain't No Such Thing As A Free Lunch).

When working with large datasets, processing WILL take time, period. Having your datasets pre-sorted will make processing easier, but you WILL have to set up a sort order of some kind, and you WILL have to do the merge. There's no way around that.

Note that in my experience, creating an index and using it won't improve your overall performance when you have to process the whole dataset(s) anyway. Indexes improve performance when you can use them to extract a rather small subset, but that's not what you are doing here.

In your case, you can achieve your intended result without a merge, but by using a simple set with by. This method is called interleaving, and you can use first. (or last.) to remove duplicates:

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;

data want;
set
  have1
  have2
;
by id date;
if first.date;
run;

proc print data=want noobs;
run;

Result:

 ID         date    var1

111    02JUL2000     78 
111    18JAN2001     52 
222    08AUG2000     89 
222    23FEB2001     66 
333    06JUN2001     99 
444    13DEC2002     77

Both datasets need to be sorted, of course. Any method that achieves your result will need that in some kind.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

novinosrin · Posted 03-14-2019 04:58 AM

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;

proc sql;
 create table want as
  select *  from have1
   union
 (select *  from have2
         except
         select * from have1)
	order by id,date;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!