Hi community,
Please help me to club the newly entered data to update in another file.
For Example;
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
/*OutPut need in Have1 like
111 02jul2000 78
111 18jan2001 52
222 08aug2000 89
222 23feb2001 66
333 06jun2001 99
444 13dec2002 77
*/
I tried the below given method , but I failed.
Proc Sql;
create table aaa as
select a.Id,a.date
from Have1 a inner join Have2 b
on a.Id = b.Id
;
insert into Have1
select * from Have2 a
where a.date not in (select date from aaa)
;
quit;
Please Correct my mistake on the code.. or give some better way to complete the task.
Thanks in advance!
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
proc sql;
create table want as
select * from have1
union
(select * from have2
except
select * from have1)
order by id,date;
quit;
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;
data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;
data want1;
merge have1 have2;
by id date;
run;
proc sql;
create table want2 as
select
coalesce(a.id,b.id) as id,
coalesce(a.date,b.date) as date format=date9.,
coalesce(a.var1,b.var1) as var1
from have1 a
full join have2 b
on a.id=b.id
and a.date=b.date;
run;
proc compare base=want1 comp=want2;
run;
The data step is the easier approach. Both datasets must be sorted on id and date.
Dear @ScottBass
Thanks for your suggestion.
Your code works well in the sample data, I need another Idea from you.
Could you suggest me a better step, because My datasets have more than 50million rows and hundreds of columns,
so to merge the data will take too much time to proceed further.
That's the reason I tried to get only Id and Date from datasets in proc sql.
I hope you understand. Please let me know if any details require.
Thank you!
Are they SAS datasets or RDBMS tables?
Can you create indexes on the datasets or tables?
Can you create the datasets in a SPDE library?
Try running with this:
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;
data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;
proc sort data=have1;
by descending id date;
run;
proc sort data=have2;
by id descending date;
run;
proc datasets lib=work nolist;
modify have1;
create index keys=(id date) / unique;
run;
modify have2;
create index keys=(id date) / unique;
run;
quit;
data want;
merge have1 have2;
by id date;
run;
You still have to create the index (instead of sorting), but the I/O will be less since you're just writing out the keys and a pointer to observation number. The merge will be slightly slower than if you had sorted the datasets, but your overall run time should be less, especially the wider your datasets.
TANSTAAFL (There Ain't No Such Thing As A Free Lunch).
When working with large datasets, processing WILL take time, period. Having your datasets pre-sorted will make processing easier, but you WILL have to set up a sort order of some kind, and you WILL have to do the merge. There's no way around that.
Note that in my experience, creating an index and using it won't improve your overall performance when you have to process the whole dataset(s) anyway. Indexes improve performance when you can use them to extract a rather small subset, but that's not what you are doing here.
In your case, you can achieve your intended result without a merge, but by using a simple set with by. This method is called interleaving, and you can use first. (or last.) to remove duplicates:
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
run;
data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
run;
data want;
set
have1
have2
;
by id date;
if first.date;
run;
proc print data=want noobs;
run;
Result:
ID date var1 111 02JUL2000 78 111 18JAN2001 52 222 08AUG2000 89 222 23FEB2001 66 333 06JUN2001 99 444 13DEC2002 77
Both datasets need to be sorted, of course. Any method that achieves your result will need that in some kind.
data have1;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
222 08aug2000 89
333 06jun2001 99
444 13dec2002 77
;
data have2;
input ID date date9. var1;
format date date9.;
cards;
111 02jul2000 78
111 18jan2001 52 /*add this line to have1*/
222 08aug2000 89
222 23feb2001 66 /*add this line to have1*/
333 06jun2001 99
;
proc sql;
create table want as
select * from have1
union
(select * from have2
except
select * from have1)
order by id,date;
quit;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.