Comparing Month Over Month Datasets

Accepted Solution Solved
Reply
Contributor
Posts: 59
Accepted Solution

Comparing Month Over Month Datasets

Hi all

 

Each month I run a running report from Jan 1. Each month new items are added to the sets of data. I would like to be able to write code that will automatically populate a column in the new table with "Y" or "N" which indicates whether or not a line is has already been recognized in the old table. I do this manually each month and it would be time consuming if I were able to do this. I am attaching examples for review (I cant seem to cut and paste w/o losing my format):

 




Accepted Solutions
Solution
‎05-31-2018 02:42 PM
PROC Star
Posts: 1,838

Re: Comparing Month Over Month Datasets

[ Edited ]
Posted in reply to wheddingsjr

@wheddingsjr Ok That makes me feel bad. One last try please at least for the sake of my satisfaction:-

 

 

proc sql;
 create table want as
 select *, 'Y' as retract from
   (select *  from new
         intersect
         select * from old)
   union
select *,'N' as retract from (select *  from new
         except
         select * from old)
order by memno, begdate;
quit;

 

View solution in original post


All Replies
Contributor
Posts: 59

Re: Comparing Month Over Month Datasets

Posted in reply to wheddingsjr
BTW, the tab labeled "OLD" is the tab to look at. Ignore tab labeled 'NEW" as it was included in the "OLD" tab. Thx
PROC Star
Posts: 1,838

Re: Comparing Month Over Month Datasets

[ Edited ]
Posted in reply to wheddingsjr

something like this?

data want;
if _n_=1 then do;
if 0 then set old;
   dcl hash H (dataset:'old',multidata:'y') ;
   h.definekey  ('MEMNO','BEGDATE','COMBO');
   h.definedone () ;
end;
set new;
if h.check()=0 then new_column='Y';
else new_column='N';
run;

It appears you could even add 

COUNTUNIQDOS

to the key variable list in the hash object and use the direct check method. Play around and let us know  

Contributor
Posts: 59

Re: Comparing Month Over Month Datasets

Posted in reply to novinosrin

Hi Novinosrin and thanks for your help. Sorry for the delayed response but it was a long weekend here where I am. I brought in the old data and ran it against the new data using the code you provided. It creates and populates a column in my results but its putting a "N" in every row even though I know there are some that should have a "Y". I am not very versed with complex code as I basically just run a standard and saved script so some of the code you use I don't really understand.

 

I used this:

data UrineAnalysis1;
if _n_=1 then do;
if 0 then set Apr2018_TEST; (The old data I imported in)
dcl hash H (dataset:'Apr2018_TEST',multidata:'N') ;
h.definekey ('COMBO');
h.definedone () ;
end;
set UrineAnalysis; (the new data created)
if h.check()=0 then RETRACT='N';
else RETRACT='Y';
run;

PROC Star
Posts: 1,838

Re: Comparing Month Over Month Datasets

Posted in reply to wheddingsjr

Can you provide us with a more and better representative sample of your real plz?

Contributor
Posts: 59

Re: Comparing Month Over Month Datasets

Posted in reply to novinosrin

Here you go.

PROC Star
Posts: 1,838

Re: Comparing Month Over Month Datasets

Posted in reply to wheddingsjr

Please test and let us know. Thank you

data old;
input MEMNO 	COUNT	UNIQDOS	BEGDATE :mmddyy10.	COMBO ;
cards;
12345	1	1	01/03/18	1234543103
12345	.	0	01/03/18	1234543103
12345	2	1	01/05/18	1234543105
12345	3	1	01/07/18	1234543107
12345	4	1	01/09/18	1234543109
12345	5	1	01/10/18	1234543110
12345	.	0	01/10/18	1234543110
12345	6	1	01/12/18	1234543112
12345	7	1	01/13/18	1234543113
12345	8	1	01/14/18	1234543114
12345	9	1	01/16/18	1234543116
12345	10	1	01/17/18	1234543117
12345	11	1	01/19/18	1234543119
12345	.	0	01/19/18	1234543119
12345	12	1	01/21/18	1234543121
12345	13	1	01/23/18	1234543123
12345	14	1	01/24/18	1234543124
12345	15	1	01/26/18	1234543126
12345	16	1	01/27/18	1234543127
12345	17	1	01/28/18	1234543128
12345	18	1	01/30/18	1234543130
12345	19	1	01/31/18	1234543131
12345	20	1	02/02/18	1234543133
12345	.	0	02/02/18	1234543133
12345	21	1	02/04/18	1234543135
;


data new;
input MEMNO	RETRACT $	COUNT	UNIQDOS	BEGDATE  :mmddyy10.	COMBO;
drop retract;
cards;
12345	Y	1	1	01/03/18	1234543103
12345	Y	.	0	01/03/18	1234543103
12345	Y	2	1	01/05/18	1234543105
12345	Y	3	1	01/07/18	1234543107
12345	Y	4	1	01/09/18	1234543109
12345	Y	5	1	01/10/18	1234543110
12345	Y	.	0	01/10/18	1234543110
12345	Y	6	1	01/12/18	1234543112
12345	Y	7	1	01/13/18	1234543113
12345	Y	8	1	01/14/18	1234543114
12345	Y	9	1	01/16/18	1234543116
12345	Y	10	1	01/17/18	1234543117
12345	Y	11	1	01/19/18	1234543119
12345	Y	.	0	01/19/18	1234543119
12345	Y	12	1	01/21/18	1234543121
12345	Y	13	1	01/23/18	1234543123
12345	Y	14	1	01/24/18	1234543124
12345	Y	15	1	01/26/18	1234543126
12345	Y	16	1	01/27/18	1234543127
12345	Y	17	1	01/28/18	1234543128
12345	Y	18	1	01/30/18	1234543130
12345	Y	19	1	01/31/18	1234543131
12345	Y	20	1	02/02/18	1234543133
12345	Y	.	0	02/02/18	1234543133
12345	Y	21	1	02/04/18	1234543135
12345	N	22	1	02/06/18	1234543137
12345	N	23	1	02/07/18	1234543138
12345	N	24	1	02/09/18	1234543140
12345	N	.	0	02/09/18	1234543140
12345	N	25	1	02/11/18	1234543142
12345	N	26	1	02/13/18	1234543144
12345	N	27	1	02/16/18	1234543147
12345	N	.	0	02/16/18	1234543147
12345	N	28	1	02/18/18	1234543149
12345	N	29	1	02/21/18	1234543152
12345	N	.	0	02/21/18	1234543152
;

data want;
if _n_=1 then do;
if 0 then set old;
   dcl hash H (dataset:'old') ;
   h.definekey  ('MEMNO','count','UNIQDOS','BEGDATE','COMBO');
   h.definedone () ;
end;
set new;
if h.check()=0 then retract='Y';
else retract='N';
run;
Contributor
Posts: 59

Re: Comparing Month Over Month Datasets

Posted in reply to novinosrin

Hi again

 

I tested this and it isnt working. Its ok though. I automated a pretty good amount of the process (took it from a 3-4 hour process down to 20 mins.) So manually going through the monthly reports for dupes wont be that bad, I have done it that way for 2 years now so its no big deal. But thanks for all of your help.

Solution
‎05-31-2018 02:42 PM
PROC Star
Posts: 1,838

Re: Comparing Month Over Month Datasets

[ Edited ]
Posted in reply to wheddingsjr

@wheddingsjr Ok That makes me feel bad. One last try please at least for the sake of my satisfaction:-

 

 

proc sql;
 create table want as
 select *, 'Y' as retract from
   (select *  from new
         intersect
         select * from old)
   union
select *,'N' as retract from (select *  from new
         except
         select * from old)
order by memno, begdate;
quit;

 

Contributor
Posts: 59

Re: Comparing Month Over Month Datasets

Posted in reply to novinosrin

Novinosrin

 

It looked as if it wanted to work..lol...but I got a ton of either

 

         ERROR: Column 20 from the first contributor of INTERSECT is not the same type as its counterpart from the second.

or
        ERROR: Numeric expression requires a numeric format.

PROC Star
Posts: 1,838

Re: Comparing Month Over Month Datasets

Posted in reply to wheddingsjr

@wheddingsjr Ok, Now it's time for us to know the data and metadata(data about data) . Can you find out 

 

1. do both the old and new tables have the same variables and types(char and numeric)

2. you could run proc contents to find that out

 

Contributor
Posts: 59

Re: Comparing Month Over Month Datasets

Posted in reply to novinosrin

Novinosrins

 

It was mentioned to me that instead of importing the previous months excel report, that I just run that report in SAS so I dont have to import. I did that and now I have just a few lines that give the error previously mentioned and i think I know why. In the middle stages of my code I add a few columns to my final table. I think I need to do those steps AFTER I do the steps that you recommended. I am going to give that a shot and let you know how that works out.

 

Thanks again

PROC Star
Posts: 1,838

Re: Comparing Month Over Month Datasets

Posted in reply to wheddingsjr

@wheddingsjr Ok, some progress after all.

 

I reiterate the logic in both approaches:

 

1.Hash  : takes in all vars as key i.e each record makes in unique

2. sql : set operators  same as above

 

In essence we compare record to record

hash.check=0 finds it  as the same as intersect when we choose all vars as key

 

Therefore,if one var is not the same (any additional), both approaches will fail. 

Contributor
Posts: 59

Re: Comparing Month Over Month Datasets

Posted in reply to novinosrin

@novinosrinAll I have to say is that you are a genius. Everything worked out absolutely perfect. Because I just wrote the previous months data into my code and moved your union, intersect, except code further up in the script, I was able to include my add-on columns later in the process. I manually did this at the beginning of the month and just compared it with the new automated process and they are identical. Thank you very very much for your patience and your desire to help me. It is much appreciated.

Contributor
Posts: 59

Re: Comparing Month Over Month Datasets

Posted in reply to wheddingsjr
What took me 4 hours to do a few weeks ago was just reduced to 20 minutes. That's amazing!!!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 23 replies
  • 289 views
  • 4 likes
  • 3 in conversation