nickspencer
Obsidian | Level 7
Hi all,

I have two datasets with transaction data. I want to select the transactions that are present in the first dataset but not in the second one, comparing by account and by month and year.

Dataset1:

acct_id date
1234 12dec2019
2345 12dec2019
3456 12dec2019
4467 12dec2019


dataset2:

Acct_id date
1234 01dec2019
2345 01dec2019
3456 21nov2019
4467 21nov2019

In the above datasets I want to remove acct ids 1234 and 2345 from dataset1 (and create a new dataset), since they are already present in dataset2 for the same month and year. But I want to keep 3456 and 4467 from dataset1, since their dataset2 records are for November. There are a number of other variables in both datasets, but I want to compare only the accounts and the month and year, and create a new dataset from dataset1 based on dataset2.

What is the best way to achieve that? Any suggestions are highly appreciated.

Thanks!!
Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @nickspencer, this is fun in PROC SQL:

data one;
input acct_id date :date9.;
format date date9.;
cards;
1234 12dec2019
2345 12dec2019
3456 12dec2019
4467 12dec2019
;


data two;
input acct_id date :date9.;
format date date9.;
cards;
1234 01dec2019
2345 01dec2019
3456 21nov2019
4467 21nov2019
;
proc sql;
create table want as
select a.*
from one a left join two b
on a.acct_id=b.acct_id and put(a.date,monyy7. -l)=put(b.date,monyy7. -l)
where put(a.date,monyy7. -l) ne put(b.date,monyy7. -l);
quit;

Actually, this is better with an INNER JOIN. Oops, so sorry.

proc sql;
create table want as
select a.*
from one a inner join two b
on a.acct_id=b.acct_id and put(a.date,monyy7. -l) ne put(b.date,monyy7. -l);
quit;

novinosrin
Tourmaline | Level 20

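data want ;
 if _n_=1 then do;
   /* Load every (acct_id, month-year) pair from TWO into a hash table once */
   dcl hash H () ;
   h.definekey  ("acct_id","d") ;
   h.definedone () ;
   do until(z);
    set two end=z;
    d=put(date,monyy7. -l);   /* month-year key, e.g. DEC2019 */
    h.ref();                  /* add the key if it is not already stored */
   end;
 end;
 set one;
 /* Keep rows of ONE whose (acct_id, month-year) pair is not in the hash */
 if h.check(key:acct_id,key:put(date,monyy7. -l)) ne 0;
 drop d;
run;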
nickspencer
Obsidian | Level 7
@novinosrin This is perfect. But I also want to include the accounts from dataset1 that are not present in dataset2 at all for that month. Will the inner join still work for an account that is in dataset1 but not in dataset2, if I want it included in the table want?
novinosrin
Tourmaline | Level 20

Thank you @nickspencer for clarifying. Please ignore the INNER JOIN and stick to the LEFT JOIN (the first query). I'm glad my initial thought was right. Have a good one!
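
To illustrate the difference between the two joins discussed here, the following is a minimal sketch that adds one hypothetical account (5678, not part of the original data) that exists only in the first dataset. The names one_x, want_inner and want_left are assumptions for illustration. The LEFT JOIN is written as an anti-join with an explicit test on b.acct_id and a month comparison via INTNX; this is an alternative phrasing, not the code posted in this thread, and it assumes acct_id is never missing in dataset two.

```sas
/* Sample data from the thread, plus hypothetical account 5678 */
data one_x;
  input acct_id date :date9.;
  format date date9.;
  cards;
1234 12dec2019
2345 12dec2019
3456 12dec2019
4467 12dec2019
5678 12dec2019
;

data two;
  input acct_id date :date9.;
  format date date9.;
  cards;
1234 01dec2019
2345 01dec2019
3456 21nov2019
4467 21nov2019
;

proc sql;
  /* INNER JOIN: a row survives only if TWO holds the same account
     in a different month, so accounts absent from TWO are lost */
  create table want_inner as
  select a.*
  from one_x a inner join two b
    on a.acct_id = b.acct_id
   and put(a.date, monyy7.) ne put(b.date, monyy7.);

  /* LEFT JOIN anti-join: keep rows of ONE_X that have no
     same-account, same-month partner in TWO */
  create table want_left as
  select a.*
  from one_x a left join two b
    on a.acct_id = b.acct_id
   and intnx('month', a.date, 0) = intnx('month', b.date, 0)
  where b.acct_id is null;
quit;
```

With this data, want_inner should contain accounts 3456 and 4467 only, while want_left should also keep 5678. Comparing month-start dates with INTNX avoids relying on formatted strings, but either comparison expresses the same month-and-year match.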

mkeintz
PROC Star

Assuming ONE and TWO are both sorted by ACCT_ID/DATE:

 

data one;
input acct_id date :date9.;
format date date9.;
cards;
1234 12dec2019
2345 12dec2019
3456 12dec2019
4467 12dec2019
;


data two;
input acct_id date :date9.;
format date date9.;
cards;
1234 01dec2019
2345 01dec2019
3456 21nov2019
4467 21nov2019
;


data want;
  set two (in=in2) one ;
  by acct_id;

  array _cal {2015:2020,12} _temporary_;
  if first.acct_id then call missing(of _cal{*});
  if in2 then _cal{year(date),month(date)}=1;
  else if _cal{year(date),month(date)}^=1 then output;
run;
  1. Just make sure the _CAL matrix has upper and lower bounds to cover the time span in your data set.
  2. The program reads all the cases for a given ID in data set TWO, and sets the matrix accordingly.  Then it reads all the cases for the same ID in data set ONE, and examines the matrix to determine whether to output.
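
As a hedged sketch of point 1, the array bounds could be derived from the data instead of being typed in. The macro variable names MINYR and MAXYR are assumptions for illustration, and the PROC SORT steps are included only in case the inputs are not already in ACCT_ID order; the final DATA step is otherwise the same logic as above.

```sas
/* Sample data from the thread */
data one;
  input acct_id date :date9.;
  format date date9.;
  cards;
1234 12dec2019
2345 12dec2019
3456 12dec2019
4467 12dec2019
;

data two;
  input acct_id date :date9.;
  format date date9.;
  cards;
1234 01dec2019
2345 01dec2019
3456 21nov2019
4467 21nov2019
;

/* The interleaving SET ... BY below needs both inputs in ACCT_ID order */
proc sort data=one; by acct_id date; run;
proc sort data=two; by acct_id date; run;

/* Find the earliest and latest year across both datasets
   (MINYR and MAXYR are hypothetical macro variable names) */
proc sql noprint;
  select min(year(date)), max(year(date))
    into :minyr trimmed, :maxyr trimmed
  from (select date from one
        union all
        select date from two) as d;
quit;

data want;
  set two (in=in2) one;
  by acct_id;

  /* Year bounds now come from the data rather than being hard-coded */
  array _cal {&minyr:&maxyr, 12} _temporary_;
  if first.acct_id then call missing(of _cal{*});
  if in2 then _cal{year(date), month(date)} = 1;
  else if _cal{year(date), month(date)} ^= 1 then output;
run;
```

This assumes no missing dates in either dataset, since a missing year or month would be an invalid array subscript, just as in the original program.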