Hello ,
I have 2 datasets , which contain the same variables and all variables have same data type and label in both
ex :
Current dataset:
Data test1;
input name $10 age sex;
cards;
a 5 m
b 7 f
c 6 f
;
run;
old generated dataset:
Data test2;
input name $10 age sex;
cards;
a 5 m
b 8 f
c 7 M
;
run;
I need output as mentioned below,a new variable needed which contain the details of column name if any data got change in comparison of old dataset and new dataset.
final output needed as :
name age sex variable_status
a 5 m
b 7 f age
c 6 f age,sex
thank you in advance !!
Data test1;
input name $ age sex :$1.;
cards;
a 5 m
b 7 f
c 6 f
;
run;
Data test2;
input name $ age sex :$1.;
cards;
a 5 m
b 8 f
c 7 M
;
run;
proc sql;
create table want as
select a.*,catx(',',ifc(a.age ne b.age,'AGE',' '),ifc(a.sex ne b.sex,'SEX',' ')) as variable_status
from test1 a left join test2 b
on a.name=b.name;
quit;
First lets sort out your test data as it was not correct. Then we use proc compare to get differences. From that output you can then get the results you want:
data test1; input name $ age sex $; cards; a 5 m b 7 f c 6 f ; run; data test2; input name $ age sex $; cards; a 5 m b 8 f c 7 M ; run; proc compare base=test1 comp=test2 out=want outnoequal outbase outcomp outdif noprint; id name; run; data want; set want; length result $100; if _type_="DIF" then do; if name ne lag(name) then result=catx(",",result,"name"); if age ne lag(age) then result=catx(",",result,"age"); if sex ne lag(sex) then result=catx(",",result,"sex"); output; end; run;
Since you want the orginal (base values) preserved, use this:
Data test1;
input name $ age sex :$1.;
cards;
a 5 m
b 7 f
c 6 f
;
run;
Data test2;
input name $ age sex :$1.;
cards;
a 5 m
b 8 f
c 7 M
;
run;
proc compare
base=test1
compare=test2
out=comp
outdif
outbase
noprint
;
id name;
run;
data want;
set comp (rename=(age=_age sex=_sex));
by name;
retain age sex " ";
if first.name /* BASE obs */
then do;
age = _age;
sex = _sex;
end;
if last.name /* DIF obs */
then do;
variable_status = catx(',',ifc(_age ne 0,'age',''),ifc(_sex ne '.','sex',''));
output;
end;
keep name age sex variable_status;
run;
proc print data=want noobs;
run;
Result:
variable_ name age sex status a 5 m b 7 f age c 6 f age,sex
Data test1;
input name $ age sex :$1.;
cards;
a 5 m
b 7 f
c 6 f
;
run;
Data test2;
input name $ age sex :$1.;
cards;
a 5 m
b 8 f
c 7 M
;
run;
proc sql;
create table want as
select a.*,catx(',',ifc(a.age ne b.age,'AGE',' '),ifc(a.sex ne b.sex,'SEX',' ')) as variable_status
from test1 a left join test2 b
on a.name=b.name;
quit;
data want;
if _n_=1 then do;
dcl hash H (dataset:'test2(rename=(age=_age sex=_sex))') ;
h.definekey ("name") ;
h.definedata ("_age", "_sex") ;
h.definedone () ;
_age=.;
_sex=' ';
end;
do until(lr);
set test1 end=lr;
rc=h.find();
variable_status = catx(',',ifc(age ne _age,'AGE',' '),ifc(sex ne _sex,'SEX',' '));
output;
end;
stop;
drop rc _:;
run;
/*If sorted by name*/
data want;
merge test1 test2(rename=(age=_age sex=_sex));
by name;
variable_status = catx(',',ifc(age ne _age,'AGE',' '),ifc(sex ne _sex,'SEX',' '));
drop _:;
run;
@singhsahab wrote:
Hello ,
I have 2 datasets , which contain the same variables and all variables have same data type and label in both
ex :
Current dataset:
Data test1;
input name $10 age sex;
cards;
a 5 m
b 7 f
c 6 f
;
run;old generated dataset:
Data test2;
input name $10 age sex;
cards;
a 5 m
b 8 f
c 7 M
;
run;
I need output as mentioned below,a new variable needed which contain the details of column name if any data got change in comparison of old dataset and new dataset.
final output needed as :name age sex variable_status
a 5 m
b 7 f age
c 6 f age,sex
thank you in advance !!
Please explain exactly how you intend to use the variable_status variable. Many times placing multiple values into a single variable causes much more work down the line. Also, how many variables do you have? You likely will need to consider explicitly setting a length for the variable as if you have dozens of variable the combine lengths of the names plus the commas may exceed the default length for the likely functions to combine the variable names together. Since variable names can contain 32 characters then potentially as few as 10 variables will require: 10*32+9 or 329 characters to hold the result.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.