Datasets to append can have different variables.
Sometimes, it can be useful to identify in the output dataset when values are missing before the variable was not in the original dataset.
data a;
a='A';
b='B';
x=1;
y=1;
run;
data b;
a='A';
c='C';
x=2;
z=2;
run;
How would you get a slash when the character variable was missing and a .A when the numeric variable was missing?
I used a set statement only.
The solution I've found is using arrays.
I'm just assuming that there is a better way of doing it.
@xxformat_com wrote:
I used a set statement only.
The solution I've found is using arrays.
I'm just assuming that there is a better way of doing it.
This can not result in what you posted.
Code I ran on SAS ODA:
data a;
a='A';
b='B';
x=1;
y=1;
run;
data b;
a='A';
c='C';
x=2;
z=2;
run;
data want;
set a b;
run;
proc print data=want;
run;
Result:
Beob. a b x y c z 1 A B 1 1 . 2 A 2 . C 2
As you can see, the order of the variables is different, and the display of the missing values is as it should be.
So you must have run different code, and additional things (e.g. formats assigned to the variables, or a custom setting of OPTIONS MISSING.
As already said, POST YOUR COMPLETE CODE, and add the result of PROC CONTENTS from your result dataset, and the setting of the MISSING option.
data a;
a='A';
b='B';
x=1;
y=1;
run;
data b;
a='A';
c='C';
x=2;
z=2;
run;
data z (drop=i j);
set a b;
array charvar {*} _character_;
array numvar {*} _numeric_;
do i=1 to dim(charvar);
if charvar{i} = ' ' then charvar{i}='/';
end;
do j=1 to dim(numvar);
if numvar{j} = . then numvar{j}=.A;
end;
run;
proc print data=z;
var a b c x y z;
run;
But I'm assuming there is an easier approach.
Actually, I added more data and an extra layer of difficulty after that in my code because any missing value of an existing variable should not be reported with / or .A
data a;
a='A';
b='B';
w=.;
x=1;
y=1;
run;
data b;
a='A';
c='C';
d=' ';
x=2;
z=2;
run;
At the end, my excel report changes the background of the cells only when the variables were not in the original datasets (my datasets are coming from .csv file I import) and I have an extra layer of probably unneeded complication using a loop for call define in proc report.
Set all of the variables to your special values BEFORE the SET. That way the values read will overwrite the special values. Use IF 0 THEN (an obviously false condition) to allow the data step compiler to see all the datasets so all of the variables are define so the ARRAY statement can find them.
But you need to use a separate SET statement for each dataset. Otherwise the SET operation will clear (set to missing) the variables that do not come from that particular dataset. When setting each dataset separately the SET statement will not do that since its won't know anything about the other dataset(s) that are not mentioned.
data a;
input a $ b $ x y ;
cards;
A1 B1 1 1
A2 . 3 .
;
data b;
input a $ c $ x z ;
cards;
A3 C1 2 2
. C2 . 4
;
data want;
if 0 then set a b ;
array _char_ _character_;
array _num_ _numeric_;
do over _char_ ;
_char_='/';
end;
do over _num_;
_num_=.A;
end;
if not eof then set a end=eof;
else set b ;
run;
Result
Obs a b x y c z 1 A1 B1 1 1 / A 2 A2 3 . / A 3 A3 / 2 A C1 2 4 / . A C2 4
If this is just about display/print then a format could do the job.
data a;
a='A';
b='B';
w=.;
x=1;
y=1;
v=.;
run;
data b;
a='A';
c='C';
d=' ';
x=2;
z=2;
v=.;
run;
proc format;
value numMiss
. = .A
;
value $charMiss
' ' = '/'
;
run;
data all;
set a b;
format _character_ $charMiss. _numeric_ numMiss.;
run;
proc print data=all;
run;
I've added a variable V to your sample data to demonstrate that a missing doesn't tell you if a source variable existed in both source tables or not. It can just be missing in both source tables.
This program preserves original missing values in A and B, but gives the new designated "missing" values when the variable lists are expanded to include all vars not originally in both datasets.
data notina(drop=_left--_right);
retain _Left .;
set a (obs=1);
retain _Right ' ';
set b;
array _n _numeric_;
array _c _character_;
do over _n; _n=.A; end;
do over _c; _c='/'; end;
run;
data notinb(drop=_left--_right);
retain _Left .;
set b (obs=1);
retain _Right ' ';
set a;
array _n _numeric_;
array _c _character_;
do over _n; _n=.A; end;
do over _c; _c='/'; end;
run;
data want;
set a (in=ina) b (in=inb);
retain _p 1;
if ina then set notina point=_p;
if inb then set notinb point=_p;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.