BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
help09
Fluorite | Level 6

Hi All

i have an issue in which i need to compare two datasets and need to capture the difference in observations.

 
DATASET1

  ID   DEP  DATE      TYPE
1234 EIE   12102010   D
6789 EEE  12102010  D


DATASET2

ID      DEP   DATE      TYPE    Description
1234  EEE  12102010   D         "DEP IS NOT MATCHING  has EIE".
6789  EEE   13102010  D         "DATE NOT MATCHING has 12102010"

In DATASET2
first observation with ID number 1234 is matching with observations in DATASET1, then need to move on to DEP, compare the values in DEP of DATASET2 and DATASET1
As DATASET2 has EEE DEP and it is not matching with DATASET1 then in NEW field named as DESCRIPTION should be populated with "DEP IS NOT MATCHING  D1 has EIE".

One More Example.

In ID 6789 DATE is not MATCHING so it should be populated as ""DATE NOT MATCHING has 12102010"

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK.

data d1;
input ID     TYPE $     DEP $ DATE :date9.;
format date date9.;
cards;
1234     A     CIV     22Jul2011
1234     A     MIC    09Jul2011
4321     A     CSE     05Aug2011
6874     A     ECE     05Jul2011
9848     M     ECE     15Jul2011
8977     A     ECE     22Jul2011
8977     P     EIE     27Jul2011
8956     A     EIE     28Sep2011
5656     A     CSE     16Sep2011
3265     A     CSE     22Sep2011
3265     B     CSE     22Sep2011
3265     C     CSE     22Sep2011
4456     A     SDC     27Sep2011
6546     A     MPC     27Jul2011
;
run;

data d2;
input _ID     _TYPE $     _DEP     $ _DATE :date9.;
obs+1;
format _date date9.;
cards;
1234     A     EEE     22Jul2011
1234     A     EIE     09Aug2011
5678     A     EIE     02Sep2011
9876     A     EIE     01Sep2011
4321     A     EIE     16Sep2011
4321     C     CSE     10Aug2011
6874     A     ECE     04Jul2011
9848     P     ECE     01Aug2011
9848     .     ECE     14Jul2011
9703     P     ECE     25Jul2011
8977     A     ECE     23Jun2011
8977     C     CSE     04Jun2011
8977     P     ECE     27Jul2011
8790     P     ECE     29Jul2011
7864     A     MEC     05Sep2011
7864     P     MEC     16Sep2011
7864     P     ECE     14Jul2011
;
run;
data want(drop=_:  obs );
 if _n_ eq 1 then do;
  declare hash ha(hashexp:20,ordered:'Y');
  declare hiter hi('ha');
   ha.definekey('obs');
   ha.definedata('obs','_id','_TYPE','_DEP',' _DATE');
   ha.definedone();
  do until(last);
   set d2 end=last;
   ha.add();
  end;
 end;
set d1;
length  description $ 100;
description='Missing';
do while(hi.next()=0);
if id=_id then do;
    if type ne _type and dep eq _dep and date eq _date then do; description=catx(' ','TYPE not matching has',_type);leave;end;
     else if dep ne _dep and type eq _type and date eq _date then do; description=catx(' ','DEP not matching has',_dep);leave;end;
      else if date ne _date and dep eq _dep and type eq _type then do; description=catx(' ','DATE not matching has',put(_date,date9.));leave;end;
    end;
end;
run;


Ksharp

View solution in original post

16 REPLIES 16
art297
Opal | Level 21

Why rewrite proc compare?  Other than the specific form of your output, that is what the proc was made to do.

proc compare base=dataset2

             compare=dataset1;

run;

help09
Fluorite | Level 6

Hi art297

Thanks for your reply

i did try using proc compare. but i want to add new variable DESCRIPTION  in Dataset2 which should populate as per the example shown above.

Ksharp
Super User

you can use out= option or ODS Tables Produced by the COMPARE Procedure to get it.

Ksharp

help09
Fluorite | Level 6

I Think I can give the question in detail

Hi All

i have an issue in which i need to compare two datasets and need to capture the difference in observations.

 
DATASET1

  ID   DEP  DATE      TYPE
1234 EIE   12102010   D
6789 EEE  12102010  D


DATASET2

ID      DEP   DATE      TYPE   

1234  EEE  12102010   D        

6789  EEE   13102010  D        

In DATASET2
first observation with ID number 1234 is matching with observations in DATASET1, then need to move on to DEP, compare the values in DEP of DATASET2 and DATASET1
As DATASET2 has EEE DEP and it is not matching with DATASET1 then in NEW field named as DESCRIPTION should be populated with "DEP IS NOT MATCHING  DATASET1 has EIE".

One More Example.

In ID 6789 DATE is not MATCHING so it should be populated as ""DATE NOT MATCHING DATASET1 has 12102010"

FINAL EXPECTED OUTPUT IN DATASET3

DATASET3

ID      DEP   DATE      TYPE    Description
1234  EEE  12102010   D         "DEP IS NOT MATCHING  has EIE".
6789  EEE   13102010  D         "DATE NOT MATCHING has 12102010"

Thanks

Ksharp
Super User

I think proc compare is a best choice.

But you need to tailor its output .

data d1;
input (ID   DEP  DATE      TYPE ) (:$10.);
cards;
1234 EIE   12102010   D
6789 EEE  12102010  D
;
run;

data d2;
input (ID   DEP  DATE      TYPE ) (:$10.);
obs+1;
cards;
1234  EEE  12102010   A        
6789  EEE   13102010  D  
;
run;
ods output comparedetails=dif(keep=batch);
proc compare base=d2 compare=d1  brief trans ;
run;
data temp(drop=batch);
 set dif(firstobs=6);
 length des $ 200;
 retain obs des;
 if left(batch) eq: '_OBS_' then do;
                                   obs=input(scan(batch,-1,,'kd'),best8.);
                                   delete;
                                 end;
  else if left(batch) eq: 'Variable' then delete;
   else if not missing(batch) then do;
    des=catx(' ',scan(batch,1),'not matching has',scan(batch,-1));
    output;
   end;
run;
data want(drop=des);
 set temp;
 by obs;
 length description $ 2000; 
 retain description;
 if first.obs then call missing(description);
 description=catx('.',description,des);
 if last.obs then output;
run;
data want;
 merge d2 want;
 by obs;
 drop obs;
run;

Ksharp

help09
Fluorite | Level 6

Hi KSharp

Thanks for your response

i have getting following error

data temp(drop=batch);

set dif(firstobs=6);

length des $ 200;

retain obs des;

if left(batch) eq: '_OBS_' then do;

obs=input(scan(batch,-1,,'kd'),best8.);

ERROR 159-185: Null parameters for SCAN are invalid.

delete;

end;

else if left(batch) eq: 'Variable' then delete;

else if not missing(batch) then do;

des=catx(' ',scan(batch,1),'not matching has',scan(batch,-1));

output;

end;

run;

Ksharp
Super User

Are you using SAS 9.1?

data d1;
input ID   DEP $ DATE      TYPE  $;
cards;
1234 EIE   12102010   D
6789 EEE  12102010  D
;
run;

data d2;
input ID   DEP $ DATE      TYPE  $;
obs+1;
cards;
1234  EEE  12102010   A        
6789  EEE   13102010  D  
;
run;
ods output comparedetails=dif(keep=batch);
proc compare base=d2 compare=d1  brief trans ;
run; 
data temp(drop=batch);
 set dif(firstobs=6);
 length des $ 200;
 retain obs des;
 if left(batch) eq: '_OBS_' then do;
                                   obs=input(scan(batch,-1,':= '),best8.);
                                   delete;
                                 end;
  else if left(batch) eq: 'Variable' then delete;
   else if not missing(batch) then do;
    des=catx(' ',scan(batch,1),'not matching has',scan(batch,3));
    output;
   end;
run;
data want(drop=des);
 set temp;
 by obs;
 length description $ 2000; 
 retain description;
 if first.obs then call missing(description);
 description=catx('.',description,des);
 if last.obs then output;
run;
data want;
 merge d2 want;
 by obs;
 drop obs;
run;


Ksharp

Message was edited by: xia keshan

help09
Fluorite | Level 6

Hello Ksharp

Yes i am using SAS 9.1

Answer is help full for me but for the same situation if we have multiple records

eg:

DATASET 1

IDTYPEDEPDATE
1234ACIV22Jul2011
4321ACSE05Aug2011
6874AECE05Jul2011
9848MECE15Jul2011
8977AECE22Jul2011
8956AEIE28Sep2011
5656ACSE16Sep2011
3265ACSE22Sep2011
4456ASDC27Sep2011
6546AMPC27Jul2011
6546ACEC08Sep2011

DATASET 2

IDTYPEDEPDATE
1234AEEE22Jul2011
1234AEIE09Aug2011
5678AEIE02Sep2011
9876AEIE01Sep2011
4321AEIE16Sep2011
4321CCSE10Aug2011
6874AECE04Jul2011
9848PECE01Aug2011
9848ECE14Jul2011
9703PECE25Jul2011
8977AECE23Jun2011
8977PECE27Jul2011
8790PECE29Jul2011
7864AMEC05Sep2011
7864PMEC16Sep2011
7864PECE14Jul2011

EXPECTED OUTPUT

IDTYPEDEPDATEDescription
1234AEEE22Jul2011DEP not matching has CIV
1234AEIE09Aug2011DEP and DATE not matching has CIV, 22Jul2011
5678AEIE02Sep2011Missing
9876AEIE01Sep2011Missing
4321AEIE16Sep2011Dep and DATE not matching has CSE 05Aug2011
4321CCSE10Aug2011Type Not Matching Missing
6874AECE04Jul2011Date Not matching has 05Jul2011
9848PECE01Aug2011Type Not Matching Missing
9848ECE14Jul2011Type Not Matching Missing
9703PECE25Jul2011Missing
8977AECE23Jun2011Date Not matching has 22Jul2011
8977PECE27Jul2011Type Not Matching Missing
8790PECE29Jul2011Missing
7864AMEC05Sep2011Missing
7864PMEC16Sep2011Missing
7864PECE14Jul2011Missing

Thanks for your help

Ksharp
Super User

Did you want to compare these two datasets based on id variable?

I assume id variable in d1 has unique value(i.e. no duplicated value).

data d1;
input ID     _TYPE $     _DEP $ _DATE :date9.;
format _date date9.;
cards;
1234     A     CIV     22Jul2011
4321     A     CSE     05Aug2011
6874     A     ECE     05Jul2011
9848     M     ECE     15Jul2011
8977     A     ECE     22Jul2011
8956     A     EIE     28Sep2011
5656     A     CSE     16Sep2011
3265     A     CSE     22Sep2011
4456     A     SDC     27Sep2011
6546     A     MPC     27Jul2011
;
run;

data d2;
input ID     TYPE $     DEP     $ DATE :date9.;
format date date9.;
cards;
1234     A     EEE     22Jul2011
1234     A     EIE     09Aug2011
5678     A     EIE     02Sep2011
9876     A     EIE     01Sep2011
4321     A     EIE     16Sep2011
4321     C     CSE     10Aug2011
6874     A     ECE     04Jul2011
9848     P     ECE     01Aug2011
9848     .     ECE     14Jul2011
9703     P     ECE     25Jul2011
8977     A     ECE     23Jun2011
8977     P     ECE     27Jul2011
8790     P     ECE     29Jul2011
7864     A     MEC     05Sep2011
7864     P     MEC     16Sep2011
7864     P     ECE     14Jul2011
;
run;
data want(drop=_: name value rc);
 if _n_ eq 1 then do;
  declare hash ha(hashexp:16,ordered:'y');
   ha.definekey('id');
   ha.definedata('_TYPE','_DEP',' _DATE');
   ha.definedone();
  do until(last);
   set d1 end=last;
   ha.add();
  end;
 end;
set d2;
length name value description $ 100;
rc=ha.find();
if rc=0 then do;
    if type ne _type then do; name=catx(' ',name,'TYPE');value=catx(' ',value,_type);end;
    if dep ne _dep then do; name=catx(' ',name,'DEP');value=catx(' ',value,_dep);end;
    if date ne _date then do; name=catx(' ',name,'DATE');value=catx(' ',value,put(_date,date9.));end;
    if not missing(name) then description=catx(' ',name,'not matching has',value);
        else description='All Matched';
 end;
 else description='Missing';
run;

 

Ksharp

消息编辑者为:xia keshan

消息编辑者为:xia keshan Optimize it!

help09
Fluorite | Level 6

Hi Ksharp

I am sorry to give you uniq data.

But D1 has multiple id's say as shown in example

data d1;

input ID     _TYPE $     _DEP $ _DATE :date9.;

format _date date9.;

cards;

1234     A     CIV     22Jul2011

1234     A     MIC    09Jul2011

4321     A     CSE     05Aug2011

6874     A     ECE     05Jul2011

9848     M     ECE     15Jul2011

8977     A     ECE     22Jul2011

8977     A     ECE     21Jul2011

8956     A     EIE     28Sep2011

5656     A     CSE     16Sep2011

3265     A     CSE     22Sep2011

3265     B     CSE     22Sep2011

3265     C     CSE     22Sep2011

4456     A     SDC     27Sep2011

6546     A     MPC     27Jul2011

Ksharp
Super User

OK.

data d1;
input _ID     _TYPE $     _DEP $ _DATE :date9.;
obs+1;
format _date date9.;
cards;
1234     A     CIV     22Jul2011
1234     A     MIC    09Jul2011
4321     A     CSE     05Aug2011
6874     A     ECE     05Jul2011
9848     M     ECE     15Jul2011
8977     A     ECE     22Jul2011
8977     A     ECE     21Jul2011
8956     A     EIE     28Sep2011
5656     A     CSE     16Sep2011
3265     A     CSE     22Sep2011
3265     B     CSE     22Sep2011
3265     C     CSE     22Sep2011
4456     A     SDC     27Sep2011
6546     A     MPC     27Jul2011
;
run;

data d2;
input ID     TYPE $     DEP     $ DATE :date9.;
format date date9.;
cards;
1234     A     EEE     22Jul2011
1234     A     EIE     09Aug2011
5678     A     EIE     02Sep2011
9876     A     EIE     01Sep2011
4321     A     EIE     16Sep2011
4321     C     CSE     10Aug2011
6874     A     ECE     04Jul2011
9848     P     ECE     01Aug2011
9848     .     ECE     14Jul2011
9703     P     ECE     25Jul2011
8977     A     ECE     23Jun2011
8977     P     ECE     27Jul2011
8790     P     ECE     29Jul2011
7864     A     MEC     05Sep2011
7864     P     MEC     16Sep2011
7864     P     ECE     14Jul2011
;
run;
data want(drop=_: name value rc obs k found rx);
 if _n_ eq 1 then do;
  declare hash ha(hashexp:20,ordered:'Y');
  declare hiter hi('ha');
   ha.definekey('obs');
   ha.definedata('obs','_id','_TYPE','_DEP',' _DATE');
   ha.definedone();
  do until(last);
   set d1 end=last;
   ha.add();
  end;
 end;
set d2;
length name value description $ 100;
rc=hi.first();
do while(rc=0);
if id=_id then do;
    if type ne _type then do; name=catx(' ',name,'TYPE');value=catx(' ',value,_type);end;
    if dep ne _dep then do; name=catx(' ',name,'DEP');value=catx(' ',value,_dep);end;
    if date ne _date then do; name=catx(' ',name,'DATE');value=catx(' ',value,put(_date,date9.));end;
    if not missing(name) then description=catx(' ',name,'not matching has',value);
        else description='All Matched';
    k=obs;found=1;
 end;
 else description='Missing';
 rc=hi.next(); 
 if found then do;rx=ha.remove(key:k); leave;end;
end;
run;

Ksharp

help09
Fluorite | Level 6

Hello Ksharp

I am extremely thank full for you time and helping out to solve the situation.

D1

1234     A     CIV     22Jul2011    
Dep Not Matching has EEE
1234     A     MIC    09Jul2011     Dep Not Matching has EIE
4321     A     CSE     05Aug2011  Date Not Matching has 16Sep2011
6874     A     ECE     05Jul2011    Type Not Matching has C
9848     M     ECE     15Jul2011Missing
8977     A     ECE     22Jul2011Date Not Matching has 23Jun2011
8977     P    EIE    27Jul2011Dep Not Matching has EIE
8956     A     EIE     28Sep2011Missing
5656     A     CSE     16Sep2011Missing
3265     A     CSE     22Sep2011Missing

D2

1234     A     EEE     22Jul2011
1234     A     EIE     09Aug2011
5678     A     EIE     02Sep2011
9876     A     EIE     01Sep2011
4321     A    CSE    16Sep2011
4321     C     CSE   10Aug2011
6874    C    ECE     05Jul2011
9848     P     ECE     01Aug2011
9848     .     ECE     14Jul2011
9703     P     ECE     25Jul2011
8977     A     ECE     23Jun2011
8977     C    CSE     04Jun2011
8977     P     ECE     27Jul2011
8790     P     ECE     29Jul2011
7864     A     MEC     05Sep2011
7864     P     MEC     16Sep2011
7864     P     ECE     14Jul2011

in the above situation

for ID number 1234  there are two observation in D1 and D2 .

I want the following conditions to be satisfied.

1. ID=_ID and Type=_Type and Date= _Date and Dep ^= _Dep Then Description = DEP not matching has EEE ( Same for the second observation DEP not matching has EIE)

D1

1234     A     CIV     22Jul2011     Dep Not Matching has EEE

1234     A     MIC    09Jul2011      Dep Not Matching has EIE

D2

1234     A     EEE     22Jul2011
1234     A     EIE     09Aug2011

for ID 4321 has one observation in D1 and two observations in D2 

2.ID=_ID and Type=_Type and Date^= _Date and Dep = _Dep then Description = Date not matching has 16Sep2011 and for second observation in D2 with same ID second record and should not over write the first record as we have only 1 record in D1.

D1

4321     A     CSE     05Aug2011  Date Not Matching has 16Sep2011

D2

4321     A    CSE    16Sep2011
4321     C     CSE   10Aug2011

for ID 6874 has one observation in D1 and one in D2.

3.ID=_ID and Type ^=_Type and Date= _Date and Dep = _Dep then Description = Type not matching has A.

D1

6874     A     ECE     05Jul2011    Type Not Matching has C

D2

6874    C    ECE     05Jul2011

One more Example with ID

D1

8977     A     ECE     22Jul2011Date Not Matching has 23Jun2011
8977     P     EIE       27Jul2011Dep Not Matching has EIE

D2

8977     A     ECE     23Jun2011
8977     C    CSE      04Jun2011
8977     P     ECE     27Jul2011

All others Missing

Thank You Very for your Help in Advance

Ksharp
Super User

I am confused. Is it right?

D1

8977     A     ECE     22Jul2011Date Not Matching has 23Jun2011
8977     P     EIE       27Jul2011Dep Not Matching has EIE

D2

8977     A     ECE     23Jun2011
8977     C    CSE      04Jun2011
8977     P     ECE     27Jul2011

If your data looks like : ,  What you got to do?

D1

8977     A     ECE     22Jul2011
8977     P     EIE       27Jul2011

D2

8977     A     ECE     23Jun2011
8977     C    EIE      27Jun2011
8977     P     ECE     30Jul2011
help09
Fluorite | Level 6

I am sorry my mistake for first question it should be ECE

2nd question

D1

8977     A     ECE     22Jul2011
8977     P     EIE       27Jul2011

D2

8977     A     ECE     23Jun2011
8977     C    EIE      27Jun2011
8977     P     ECE     30Jul2011

Instance 1:

1st observation in D1 dataset will check with all the records in D2.

So when ever 3 conditions are satisfied out of 4

.ID=_ID and Type =_Type and Date ^= _Date and Dep = _Dep

So 1st observation in D1 is matching with 1st observation in D2 except date so will register in D1 as date not matching

Instance 2:

Now 2nd Observation in D1 will Check again  with 3 observations in D2

ID=_ID and Type =_Type and Date = _Date and Dep ^= _Dep

So will register in D1 description as Dep not matching.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2067 views
  • 0 likes
  • 3 in conversation