Hi All
i have an issue in which i need to compare two datasets and need to capture the difference in observations.
DATASET1
ID DEP DATE TYPE
1234 EIE 12102010 D
6789 EEE 12102010 D
DATASET2
ID DEP DATE TYPE Description
1234 EEE 12102010 D "DEP IS NOT MATCHING has EIE".
6789 EEE 13102010 D "DATE NOT MATCHING has 12102010"
In DATASET2
first observation with ID number 1234 is matching with observations in DATASET1, then need to move on to DEP, compare the values in DEP of DATASET2 and DATASET1
As DATASET2 has EEE DEP and it is not matching with DATASET1 then in NEW field named as DESCRIPTION should be populated with "DEP IS NOT MATCHING D1 has EIE".
One More Example.
In ID 6789 DATE is not MATCHING so it should be populated as ""DATE NOT MATCHING has 12102010"
OK.
data d1; input ID TYPE $ DEP $ DATE :date9.; format date date9.; cards; 1234 A CIV 22Jul2011 1234 A MIC 09Jul2011 4321 A CSE 05Aug2011 6874 A ECE 05Jul2011 9848 M ECE 15Jul2011 8977 A ECE 22Jul2011 8977 P EIE 27Jul2011 8956 A EIE 28Sep2011 5656 A CSE 16Sep2011 3265 A CSE 22Sep2011 3265 B CSE 22Sep2011 3265 C CSE 22Sep2011 4456 A SDC 27Sep2011 6546 A MPC 27Jul2011 ; run; data d2; input _ID _TYPE $ _DEP $ _DATE :date9.; obs+1; format _date date9.; cards; 1234 A EEE 22Jul2011 1234 A EIE 09Aug2011 5678 A EIE 02Sep2011 9876 A EIE 01Sep2011 4321 A EIE 16Sep2011 4321 C CSE 10Aug2011 6874 A ECE 04Jul2011 9848 P ECE 01Aug2011 9848 . ECE 14Jul2011 9703 P ECE 25Jul2011 8977 A ECE 23Jun2011 8977 C CSE 04Jun2011 8977 P ECE 27Jul2011 8790 P ECE 29Jul2011 7864 A MEC 05Sep2011 7864 P MEC 16Sep2011 7864 P ECE 14Jul2011 ; run; data want(drop=_: obs ); if _n_ eq 1 then do; declare hash ha(hashexp:20,ordered:'Y'); declare hiter hi('ha'); ha.definekey('obs'); ha.definedata('obs','_id','_TYPE','_DEP',' _DATE'); ha.definedone(); do until(last); set d2 end=last; ha.add(); end; end; set d1; length description $ 100; description='Missing'; do while(hi.next()=0); if id=_id then do; if type ne _type and dep eq _dep and date eq _date then do; description=catx(' ','TYPE not matching has',_type);leave;end; else if dep ne _dep and type eq _type and date eq _date then do; description=catx(' ','DEP not matching has',_dep);leave;end; else if date ne _date and dep eq _dep and type eq _type then do; description=catx(' ','DATE not matching has',put(_date,date9.));leave;end; end; end; run;
Ksharp
Why rewrite proc compare? Other than the specific form of your output, that is what the proc was made to do.
proc compare base=dataset2
compare=dataset1;
run;
Hi art297
Thanks for your reply
i did try using proc compare. but i want to add new variable DESCRIPTION in Dataset2 which should populate as per the example shown above.
you can use out= option or ODS Tables Produced by the COMPARE Procedure to get it.
Ksharp
I Think I can give the question in detail
Hi All
i have an issue in which i need to compare two datasets and need to capture the difference in observations.
DATASET1
ID DEP DATE TYPE
1234 EIE 12102010 D
6789 EEE 12102010 D
DATASET2
ID DEP DATE TYPE
1234 EEE 12102010 D
6789 EEE 13102010 D
In DATASET2
first observation with ID number 1234 is matching with observations in DATASET1, then need to move on to DEP, compare the values in DEP of DATASET2 and DATASET1
As DATASET2 has EEE DEP and it is not matching with DATASET1 then in NEW field named as DESCRIPTION should be populated with "DEP IS NOT MATCHING DATASET1 has EIE".
One More Example.
In ID 6789 DATE is not MATCHING so it should be populated as ""DATE NOT MATCHING DATASET1 has 12102010"
FINAL EXPECTED OUTPUT IN DATASET3
DATASET3
ID DEP DATE TYPE Description
1234 EEE 12102010 D "DEP IS NOT MATCHING has EIE".
6789 EEE 13102010 D "DATE NOT MATCHING has 12102010"
Thanks
I think proc compare is a best choice.
But you need to tailor its output .
data d1; input (ID DEP DATE TYPE ) (:$10.); cards; 1234 EIE 12102010 D 6789 EEE 12102010 D ; run; data d2; input (ID DEP DATE TYPE ) (:$10.); obs+1; cards; 1234 EEE 12102010 A 6789 EEE 13102010 D ; run; ods output comparedetails=dif(keep=batch); proc compare base=d2 compare=d1 brief trans ; run; data temp(drop=batch); set dif(firstobs=6); length des $ 200; retain obs des; if left(batch) eq: '_OBS_' then do; obs=input(scan(batch,-1,,'kd'),best8.); delete; end; else if left(batch) eq: 'Variable' then delete; else if not missing(batch) then do; des=catx(' ',scan(batch,1),'not matching has',scan(batch,-1)); output; end; run; data want(drop=des); set temp; by obs; length description $ 2000; retain description; if first.obs then call missing(description); description=catx('.',description,des); if last.obs then output; run; data want; merge d2 want; by obs; drop obs; run;
Ksharp
Hi KSharp
Thanks for your response
i have getting following error
data temp(drop=batch);
set dif(firstobs=6);
length des $ 200;
retain obs des;
if left(batch) eq: '_OBS_' then do;
obs=input(scan(batch,-1,,'kd'),best8.);
ERROR 159-185: Null parameters for SCAN are invalid.
delete;
end;
else if left(batch) eq: 'Variable' then delete;
else if not missing(batch) then do;
des=catx(' ',scan(batch,1),'not matching has',scan(batch,-1));
output;
end;
run;
Are you using SAS 9.1?
data d1; input ID DEP $ DATE TYPE $; cards; 1234 EIE 12102010 D 6789 EEE 12102010 D ; run; data d2; input ID DEP $ DATE TYPE $; obs+1; cards; 1234 EEE 12102010 A 6789 EEE 13102010 D ; run; ods output comparedetails=dif(keep=batch); proc compare base=d2 compare=d1 brief trans ; run; data temp(drop=batch); set dif(firstobs=6); length des $ 200; retain obs des; if left(batch) eq: '_OBS_' then do; obs=input(scan(batch,-1,':= '),best8.); delete; end; else if left(batch) eq: 'Variable' then delete; else if not missing(batch) then do; des=catx(' ',scan(batch,1),'not matching has',scan(batch,3)); output; end; run; data want(drop=des); set temp; by obs; length description $ 2000; retain description; if first.obs then call missing(description); description=catx('.',description,des); if last.obs then output; run; data want; merge d2 want; by obs; drop obs; run;
Ksharp
Message was edited by: xia keshan
Hello Ksharp
Yes i am using SAS 9.1
Answer is help full for me but for the same situation if we have multiple records
eg:
DATASET 1
ID | TYPE | DEP | DATE |
1234 | A | CIV | 22Jul2011 |
4321 | A | CSE | 05Aug2011 |
6874 | A | ECE | 05Jul2011 |
9848 | M | ECE | 15Jul2011 |
8977 | A | ECE | 22Jul2011 |
8956 | A | EIE | 28Sep2011 |
5656 | A | CSE | 16Sep2011 |
3265 | A | CSE | 22Sep2011 |
4456 | A | SDC | 27Sep2011 |
6546 | A | MPC | 27Jul2011 |
6546 | A | CEC | 08Sep2011 |
DATASET 2
ID | TYPE | DEP | DATE |
1234 | A | EEE | 22Jul2011 |
1234 | A | EIE | 09Aug2011 |
5678 | A | EIE | 02Sep2011 |
9876 | A | EIE | 01Sep2011 |
4321 | A | EIE | 16Sep2011 |
4321 | C | CSE | 10Aug2011 |
6874 | A | ECE | 04Jul2011 |
9848 | P | ECE | 01Aug2011 |
9848 | ECE | 14Jul2011 | |
9703 | P | ECE | 25Jul2011 |
8977 | A | ECE | 23Jun2011 |
8977 | P | ECE | 27Jul2011 |
8790 | P | ECE | 29Jul2011 |
7864 | A | MEC | 05Sep2011 |
7864 | P | MEC | 16Sep2011 |
7864 | P | ECE | 14Jul2011 |
EXPECTED OUTPUT
ID | TYPE | DEP | DATE | Description |
1234 | A | EEE | 22Jul2011 | DEP not matching has CIV |
1234 | A | EIE | 09Aug2011 | DEP and DATE not matching has CIV, 22Jul2011 |
5678 | A | EIE | 02Sep2011 | Missing |
9876 | A | EIE | 01Sep2011 | Missing |
4321 | A | EIE | 16Sep2011 | Dep and DATE not matching has CSE 05Aug2011 |
4321 | C | CSE | 10Aug2011 | Type Not Matching Missing |
6874 | A | ECE | 04Jul2011 | Date Not matching has 05Jul2011 |
9848 | P | ECE | 01Aug2011 | Type Not Matching Missing |
9848 | ECE | 14Jul2011 | Type Not Matching Missing | |
9703 | P | ECE | 25Jul2011 | Missing |
8977 | A | ECE | 23Jun2011 | Date Not matching has 22Jul2011 |
8977 | P | ECE | 27Jul2011 | Type Not Matching Missing |
8790 | P | ECE | 29Jul2011 | Missing |
7864 | A | MEC | 05Sep2011 | Missing |
7864 | P | MEC | 16Sep2011 | Missing |
7864 | P | ECE | 14Jul2011 | Missing |
Thanks for your help
Did you want to compare these two datasets based on id variable?
I assume id variable in d1 has unique value(i.e. no duplicated value).
data d1; input ID _TYPE $ _DEP $ _DATE :date9.; format _date date9.; cards; 1234 A CIV 22Jul2011 4321 A CSE 05Aug2011 6874 A ECE 05Jul2011 9848 M ECE 15Jul2011 8977 A ECE 22Jul2011 8956 A EIE 28Sep2011 5656 A CSE 16Sep2011 3265 A CSE 22Sep2011 4456 A SDC 27Sep2011 6546 A MPC 27Jul2011 ; run; data d2; input ID TYPE $ DEP $ DATE :date9.; format date date9.; cards; 1234 A EEE 22Jul2011 1234 A EIE 09Aug2011 5678 A EIE 02Sep2011 9876 A EIE 01Sep2011 4321 A EIE 16Sep2011 4321 C CSE 10Aug2011 6874 A ECE 04Jul2011 9848 P ECE 01Aug2011 9848 . ECE 14Jul2011 9703 P ECE 25Jul2011 8977 A ECE 23Jun2011 8977 P ECE 27Jul2011 8790 P ECE 29Jul2011 7864 A MEC 05Sep2011 7864 P MEC 16Sep2011 7864 P ECE 14Jul2011 ; run; data want(drop=_: name value rc); if _n_ eq 1 then do; declare hash ha(hashexp:16,ordered:'y'); ha.definekey('id'); ha.definedata('_TYPE','_DEP',' _DATE'); ha.definedone(); do until(last); set d1 end=last; ha.add(); end; end; set d2; length name value description $ 100; rc=ha.find(); if rc=0 then do; if type ne _type then do; name=catx(' ',name,'TYPE');value=catx(' ',value,_type);end; if dep ne _dep then do; name=catx(' ',name,'DEP');value=catx(' ',value,_dep);end; if date ne _date then do; name=catx(' ',name,'DATE');value=catx(' ',value,put(_date,date9.));end; if not missing(name) then description=catx(' ',name,'not matching has',value); else description='All Matched'; end; else description='Missing'; run;
Ksharp
消息编辑者为:xia keshan
消息编辑者为:xia keshan Optimize it!
Hi Ksharp
I am sorry to give you uniq data.
But D1 has multiple id's say as shown in example
data d1;
input ID _TYPE $ _DEP $ _DATE :date9.;
format _date date9.;
cards;
1234 A CIV 22Jul2011
1234 A MIC 09Jul2011
4321 A CSE 05Aug2011
6874 A ECE 05Jul2011
9848 M ECE 15Jul2011
8977 A ECE 22Jul2011
8977 A ECE 21Jul2011
8956 A EIE 28Sep2011
5656 A CSE 16Sep2011
3265 A CSE 22Sep2011
3265 B CSE 22Sep2011
3265 C CSE 22Sep2011
4456 A SDC 27Sep2011
6546 A MPC 27Jul2011
OK.
data d1; input _ID _TYPE $ _DEP $ _DATE :date9.; obs+1; format _date date9.; cards; 1234 A CIV 22Jul2011 1234 A MIC 09Jul2011 4321 A CSE 05Aug2011 6874 A ECE 05Jul2011 9848 M ECE 15Jul2011 8977 A ECE 22Jul2011 8977 A ECE 21Jul2011 8956 A EIE 28Sep2011 5656 A CSE 16Sep2011 3265 A CSE 22Sep2011 3265 B CSE 22Sep2011 3265 C CSE 22Sep2011 4456 A SDC 27Sep2011 6546 A MPC 27Jul2011 ; run; data d2; input ID TYPE $ DEP $ DATE :date9.; format date date9.; cards; 1234 A EEE 22Jul2011 1234 A EIE 09Aug2011 5678 A EIE 02Sep2011 9876 A EIE 01Sep2011 4321 A EIE 16Sep2011 4321 C CSE 10Aug2011 6874 A ECE 04Jul2011 9848 P ECE 01Aug2011 9848 . ECE 14Jul2011 9703 P ECE 25Jul2011 8977 A ECE 23Jun2011 8977 P ECE 27Jul2011 8790 P ECE 29Jul2011 7864 A MEC 05Sep2011 7864 P MEC 16Sep2011 7864 P ECE 14Jul2011 ; run; data want(drop=_: name value rc obs k found rx); if _n_ eq 1 then do; declare hash ha(hashexp:20,ordered:'Y'); declare hiter hi('ha'); ha.definekey('obs'); ha.definedata('obs','_id','_TYPE','_DEP',' _DATE'); ha.definedone(); do until(last); set d1 end=last; ha.add(); end; end; set d2; length name value description $ 100; rc=hi.first(); do while(rc=0); if id=_id then do; if type ne _type then do; name=catx(' ',name,'TYPE');value=catx(' ',value,_type);end; if dep ne _dep then do; name=catx(' ',name,'DEP');value=catx(' ',value,_dep);end; if date ne _date then do; name=catx(' ',name,'DATE');value=catx(' ',value,put(_date,date9.));end; if not missing(name) then description=catx(' ',name,'not matching has',value); else description='All Matched'; k=obs;found=1; end; else description='Missing'; rc=hi.next(); if found then do;rx=ha.remove(key:k); leave;end; end; run;
Ksharp
Hello Ksharp
I am extremely thank full for you time and helping out to solve the situation.
D1
1234 A CIV 22Jul2011 | Dep Not Matching has EEE |
1234 A MIC 09Jul2011 | Dep Not Matching has EIE |
4321 A CSE 05Aug2011 | Date Not Matching has 16Sep2011 |
6874 A ECE 05Jul2011 | Type Not Matching has C |
9848 M ECE 15Jul2011 | Missing |
8977 A ECE 22Jul2011 | Date Not Matching has 23Jun2011 |
8977 P EIE 27Jul2011 | Dep Not Matching has EIE |
8956 A EIE 28Sep2011 | Missing |
5656 A CSE 16Sep2011 | Missing |
3265 A CSE 22Sep2011 | Missing |
D2
1234 A EEE 22Jul2011 |
1234 A EIE 09Aug2011 |
5678 A EIE 02Sep2011 |
9876 A EIE 01Sep2011 |
4321 A CSE 16Sep2011 |
4321 C CSE 10Aug2011 |
6874 C ECE 05Jul2011 |
9848 P ECE 01Aug2011 |
9848 . ECE 14Jul2011 |
9703 P ECE 25Jul2011 |
8977 A ECE 23Jun2011 |
8977 C CSE 04Jun2011 |
8977 P ECE 27Jul2011 |
8790 P ECE 29Jul2011 |
7864 A MEC 05Sep2011 |
7864 P MEC 16Sep2011 |
7864 P ECE 14Jul2011 |
in the above situation
for ID number 1234 there are two observation in D1 and D2 .
I want the following conditions to be satisfied.
1. ID=_ID and Type=_Type and Date= _Date and Dep ^= _Dep Then Description = DEP not matching has EEE ( Same for the second observation DEP not matching has EIE)
D1
1234 A CIV 22Jul2011 Dep Not Matching has EEE |
1234 A MIC 09Jul2011 Dep Not Matching has EIE |
D2
1234 A EEE 22Jul2011 |
1234 A EIE 09Aug2011 |
for ID 4321 has one observation in D1 and two observations in D2
2.ID=_ID and Type=_Type and Date^= _Date and Dep = _Dep then Description = Date not matching has 16Sep2011 and for second observation in D2 with same ID second record and should not over write the first record as we have only 1 record in D1.
D1
4321 A CSE 05Aug2011 Date Not Matching has 16Sep2011 |
D2
4321 A CSE 16Sep2011 |
4321 C CSE 10Aug2011 |
for ID 6874 has one observation in D1 and one in D2.
3.ID=_ID and Type ^=_Type and Date= _Date and Dep = _Dep then Description = Type not matching has A.
D1
6874 A ECE 05Jul2011 Type Not Matching has C
D2
6874 C ECE 05Jul2011
One more Example with ID
D1
8977 A ECE 22Jul2011 | Date Not Matching has 23Jun2011 |
8977 P EIE 27Jul2011 | Dep Not Matching has EIE |
D2
8977 A ECE 23Jun2011 |
8977 C CSE 04Jun2011 |
8977 P ECE 27Jul2011 |
All others Missing
Thank You Very for your Help in Advance
I am confused. Is it right?
D1
8977 A ECE 22Jul2011 | Date Not Matching has 23Jun2011 |
8977 P EIE 27Jul2011 | Dep Not Matching has EIE |
D2
8977 A ECE 23Jun2011 |
8977 C CSE 04Jun2011 |
8977 P ECE 27Jul2011 |
If your data looks like : , What you got to do?
D1
8977 A ECE 22Jul2011 | |
8977 P EIE 27Jul2011 |
D2
8977 A ECE 23Jun2011 |
8977 C EIE 27Jun2011 |
8977 P ECE 30Jul2011 |
I am sorry my mistake for first question it should be ECE
2nd question
D1
8977 A ECE 22Jul2011 | |
8977 P EIE 27Jul2011 |
D2
8977 A ECE 23Jun2011 |
8977 C EIE 27Jun2011 |
8977 P ECE 30Jul2011 |
Instance 1:
1st observation in D1 dataset will check with all the records in D2.
So when ever 3 conditions are satisfied out of 4
.ID=_ID and Type =_Type and Date ^= _Date and Dep = _Dep
So 1st observation in D1 is matching with 1st observation in D2 except date so will register in D1 as date not matching
Instance 2:
Now 2nd Observation in D1 will Check again with 3 observations in D2
ID=_ID and Type =_Type and Date = _Date and Dep ^= _Dep
So will register in D1 description as Dep not matching.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.