BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sathish_jammy
Lapis Lazuli | Level 10

Hi community,

 

Please help me to club the newly entered data to update in another file.

For Example;

 

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;

/*OutPut need in Have1 like
111 02jul2000 78
111 18jan2001 52
222 08aug2000 89
222 23feb2001 66
333 06jun2001 99
444 13dec2002 77
*/

I tried the below given method , but I failed.

 

Proc Sql;
create table aaa as select a.Id,a.date from Have1 a inner join Have2 b on a.Id = b.Id ; insert into Have1 select * from Have2 a where a.date not in (select date from aaa) ; quit;

Please Correct my mistake on the code.. or give some better way to complete the task.

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;

proc sql;
 create table want as
  select *  from have1
   union
 (select *  from have2
         except
         select * from have1)
	order by id,date;
quit;

View solution in original post

5 REPLIES 5
ScottBass
Rhodochrosite | Level 12
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;

data want1;
   merge have1 have2;
   by id date;
run;

proc sql;
   create table want2 as
   select 
      coalesce(a.id,b.id) as id,
      coalesce(a.date,b.date) as date format=date9.,
      coalesce(a.var1,b.var1) as var1
   from have1 a
   full join have2 b
   on a.id=b.id
   and a.date=b.date;
run;

proc compare base=want1 comp=want2;
run;

The data step is the easier approach.  Both datasets must be sorted on id and date.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Sathish_jammy
Lapis Lazuli | Level 10

Dear @ScottBass 

 

Thanks for your suggestion.

Your code works well in the sample data, I need another Idea from you.

Could you suggest me a better step, because My datasets have more than 50million rows and hundreds of columns,

so to merge the data will take too much time to proceed further.

That's the reason I tried to get only Id and Date from datasets in proc sql.

I hope you understand. Please let me know if any details require.

 

Thank you!

 

 

 

 

ScottBass
Rhodochrosite | Level 12

Are they SAS datasets or RDBMS tables?

Can you create indexes on the datasets or tables?

Can you create the datasets in a SPDE library?

 

Try running with this:

 

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;

proc sort data=have1;
   by descending id date;
run;

proc sort data=have2;
   by id descending date;
run;

proc datasets lib=work nolist;
   modify have1;
   create index keys=(id date) / unique;
   run;
   modify have2;
   create index keys=(id date) / unique;
   run;
quit;

data want;
   merge have1 have2;
   by id date;
run;

You still have to create the index (instead of sorting), but the I/O will be less since you're just writing out the keys and a pointer to observation number.  The merge will be slightly slower than if you had sorted the datasets, but your overall run time should be less, especially the wider your datasets.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Kurt_Bremser
Super User

TANSTAAFL (There Ain't No Such Thing As A Free Lunch).

When working with large datasets, processing WILL take time, period. Having your datasets pre-sorted will make processing easier, but you WILL have to set up a sort order of some kind, and you WILL have to do the merge. There's no way around that.

Note that in my experience, creating an index and using it won't improve your overall performance when you have to process the whole dataset(s) anyway. Indexes improve performance when you can use them to extract a rather small subset, but that's not what you are doing here.

 

In your case, you can achieve your intended result without a merge, but by using a simple set with by. This method is called interleaving, and you can use first. (or last.) to remove duplicates:

data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;

data want;
set
  have1
  have2
;
by id date;
if first.date;
run;

proc print data=want noobs;
run;

Result:

 ID         date    var1

111    02JUL2000     78 
111    18JAN2001     52 
222    08AUG2000     89 
222    23FEB2001     66 
333    06JUN2001     99 
444    13DEC2002     77 

Both datasets need to be sorted, of course. Any method that achieves your result will need that in some kind.

novinosrin
Tourmaline | Level 20
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;

data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;

proc sql;
 create table want as
  select *  from have1
   union
 (select *  from have2
         except
         select * from have1)
	order by id,date;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 720 views
  • 0 likes
  • 4 in conversation