I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:
data a;
input (A1 A2 A3) ($);
datalines;
A B D
B B D
D A E
;
And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):
data b;
input (B1) ($);
datalines;
A
B
C
;
Such that the final output is
data a_2;
input (A1 A2 A3) ($);
datalines;
A B .
B B .
. A .
;
I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.
Yes, PROC FORMAT:
proc format;
value $lookup
'A', 'B', 'C' = 'Y'
other = 'N'
;
run;
data want;
set have;
if put(A1, $lookup.) = 'N' then A1 = '';
run;
Create a custom format that will map each variable back to its value if it's in your list and to . otherwise.
You can then apply it to all variables with an array or format statement as needed.
Are your variables character, numeric or a mixture?
You've shown character variables but are using the period, which is typically used to represent only numeric missing.
@DAppelbaum83 wrote:
I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:
data a;
input (A1 A2 A3) ($);
datalines;
A B D
B B D
D A E
;
And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):
data b;
input (B1) ($);
datalines;
A
B
C
;
Such that the final output is
data a_2;
input (A1 A2 A3) ($);
datalines;
A B .
B B .
. A .
;
I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.
They're all categorical / characters. What do you mean custom format? Is there a PROC for that?
Categorical doesn't mean character, so assuming all are characters something like this will work for you.
*create a format from the data set B;
data b_format;
set b end=last;
fmtname="$miss_fmt";
start=B1;
label="$8.";
*tells SAS the format is the label type;
HLO="F";
output;
if last then
do;
HLO='O';
start="**OTHER**";
label="";
output;
end;
run;
proc format cntlin=b_format;
run;
*apply the format to data;
proc print data=a;
format _character_ $miss_fmt.;
run;
@DAppelbaum83 wrote:
They're all categorical / characters. What do you mean custom format? Is there a PROC for that?
If you're not familiar with User Defined Formats, versus custom formats this is a good start.
PDF 001-30: PROC FORMAT - Not Just Another Pretty Face
Yes, PROC FORMAT:
proc format;
value $lookup
'A', 'B', 'C' = 'Y'
other = 'N'
;
run;
data want;
set have;
if put(A1, $lookup.) = 'N' then A1 = '';
run;
data a; input (A1 A2 A3) ($); datalines; A B D B B D D A E ; data b; input (B1) ($); datalines; A B C ; data want; if _n_=1 then do; if 0 then set b; declare hash h(dataset:'b'); h.definekey('B1'); h.definedone(); end; set a; array x{*} a: ; do i=1 to dim(x); if h.check(key:x{i}) ne 0 then call missing(x{i}); end; drop i b1; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.