BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
singhsahab
Lapis Lazuli | Level 10

Hello ,

I have 2 datasets , which contain the same variables and all variables have same data type and label in both

ex :

Current dataset:

 

Data test1;
input name $10 age sex;
cards;
a 5 m
b 7 f
c 6 f
;
run;

old generated dataset:

 

Data test2;
input name $10 age sex;
cards;
a 5 m
b 8 f
c 7 M
;
run;

 

I need output as mentioned below,a new variable needed which contain the details of column name if any data got change in comparison of old dataset and new dataset.


final output needed as :

name age sex variable_status
a        5     m
b        7      f        age
c        6      f        age,sex


thank you in advance !!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
Data test1;
input name $ age sex :$1.;
cards;
a 5 m
b 7 f
c 6 f
;
run;

Data test2;
input name $ age sex :$1.;
cards;
a 5 m
b 8 f
c 7 M
;
run;

proc sql;
create table want as
select a.*,catx(',',ifc(a.age ne b.age,'AGE',' '),ifc(a.sex ne b.sex,'SEX',' ')) as variable_status
from test1 a left join test2 b
on a.name=b.name;
quit;

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

First lets sort out your test data as it was not correct.  Then we use proc compare to get differences.  From that output you can then get the results you want:

data test1;
  input name $ age sex $;
cards;
a 5 m
b 7 f
c 6 f
;
run;

data test2;
  input name $ age sex $;
cards;
a 5 m
b 8 f
c 7 M
;
run;

proc compare base=test1 comp=test2 out=want outnoequal outbase outcomp outdif noprint;
  id name;
run;

data want;
  set want;
  length result $100;
  if _type_="DIF" then do;
    if name ne lag(name) then result=catx(",",result,"name");
    if age ne lag(age) then result=catx(",",result,"age");
    if sex ne lag(sex) then result=catx(",",result,"sex");
    output;
  end;
run;
Kurt_Bremser
Super User

Since you want the orginal (base values) preserved, use this:

Data test1;
input name $ age sex :$1.;
cards;
a 5 m
b 7 f
c 6 f
;
run;

Data test2;
input name $ age sex :$1.;
cards;
a 5 m
b 8 f
c 7 M
;
run;

proc compare
  base=test1
  compare=test2
  out=comp
  outdif
  outbase
  noprint
;
id name;
run;

data want;
set comp (rename=(age=_age sex=_sex));
by name;
retain age sex " ";
if first.name /* BASE obs */
then do;
  age = _age;
  sex = _sex;
end;
if last.name /* DIF obs */
then do;
  variable_status = catx(',',ifc(_age ne 0,'age',''),ifc(_sex ne '.','sex',''));
  output;
end;
keep name age sex variable_status;
run;

proc print data=want noobs;
run;

Result:

                      variable_
name    age    sex     status

 a       5      m              
 b       7      f      age     
 c       6      f      age,sex 
novinosrin
Tourmaline | Level 20
Data test1;
input name $ age sex :$1.;
cards;
a 5 m
b 7 f
c 6 f
;
run;

Data test2;
input name $ age sex :$1.;
cards;
a 5 m
b 8 f
c 7 M
;
run;

proc sql;
create table want as
select a.*,catx(',',ifc(a.age ne b.age,'AGE',' '),ifc(a.sex ne b.sex,'SEX',' ')) as variable_status
from test1 a left join test2 b
on a.name=b.name;
quit;
novinosrin
Tourmaline | Level 20

data want;
if _n_=1 then do;
   dcl hash H (dataset:'test2(rename=(age=_age sex=_sex))') ;
   h.definekey  ("name") ;
   h.definedata ("_age", "_sex") ;
   h.definedone () ;
  _age=.;
	_sex=' ';
   end;
do until(lr);
set test1 end=lr;
rc=h.find();
variable_status = catx(',',ifc(age ne _age,'AGE',' '),ifc(sex ne _sex,'SEX',' '));
output;
end;
stop;
drop rc _:;
run;


novinosrin
Tourmaline | Level 20
/*If sorted by name*/
data want;
merge test1 test2(rename=(age=_age sex=_sex));
by name;
variable_status = catx(',',ifc(age ne _age,'AGE',' '),ifc(sex ne _sex,'SEX',' '));
drop _:;
run;
ballardw
Super User

@singhsahab wrote:

Hello ,

I have 2 datasets , which contain the same variables and all variables have same data type and label in both

ex :

Current dataset:

 

Data test1;
input name $10 age sex;
cards;
a 5 m
b 7 f
c 6 f
;
run;

old generated dataset:

 

Data test2;
input name $10 age sex;
cards;
a 5 m
b 8 f
c 7 M
;
run;

 

I need output as mentioned below,a new variable needed which contain the details of column name if any data got change in comparison of old dataset and new dataset.


final output needed as :

name age sex variable_status
a        5     m
b        7      f        age
c        6      f        age,sex


thank you in advance !!


Please explain exactly how you intend to use the variable_status variable. Many times placing multiple values into a single variable causes much more work down the line. Also, how many variables do you have? You likely will need to consider explicitly setting a length for the variable as if you have dozens of variable the combine lengths of the names plus the commas may exceed the default length for the likely functions to combine the variable names together. Since variable names can contain 32 characters then potentially as few as 10 variables will require: 10*32+9 or 329 characters to hold the result.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 768 views
  • 2 likes
  • 5 in conversation