I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:
data a;
input (A1 A2 A3) ($);
datalines;
A B D
B B D
D A E
;
And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):
data b;
input (B1) ($);
datalines;
A
B
C
;
Such that the final output is
data a_2;
input (A1 A2 A3) ($);
datalines;
A B .
B B .
. A .
;
I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.
Yes, PROC FORMAT:
proc format;
value $lookup
'A', 'B', 'C' = 'Y'
other = 'N'
;
run;
data want;
set have;
if put(A1, $lookup.) = 'N' then A1 = '';
run;
Create a custom format that will map each variable back to its value if it's in your list and to . otherwise.
You can then apply it to all variables with an array or format statement as needed.
Are your variables character, numeric or a mixture?
You've shown character variables but are using the period, which is typically used to represent only numeric missing.
@DAppelbaum83 wrote:
I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:
data a;
input (A1 A2 A3) ($);
datalines;
A B D
B B D
D A E
;
And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):
data b;
input (B1) ($);
datalines;
A
B
C
;
Such that the final output is
data a_2;
input (A1 A2 A3) ($);
datalines;
A B .
B B .
. A .
;
I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.
They're all categorical / characters. What do you mean custom format? Is there a PROC for that?
Categorical doesn't mean character, so assuming all are characters something like this will work for you.
*create a format from the data set B;
data b_format;
set b end=last;
fmtname="$miss_fmt";
start=B1;
label="$8.";
*tells SAS the format is the label type;
HLO="F";
output;
if last then
do;
HLO='O';
start="**OTHER**";
label="";
output;
end;
run;
proc format cntlin=b_format;
run;
*apply the format to data;
proc print data=a;
format _character_ $miss_fmt.;
run;
@DAppelbaum83 wrote:
They're all categorical / characters. What do you mean custom format? Is there a PROC for that?
If you're not familiar with User Defined Formats, versus custom formats this is a good start.
PDF 001-30: PROC FORMAT - Not Just Another Pretty Face
Yes, PROC FORMAT:
proc format;
value $lookup
'A', 'B', 'C' = 'Y'
other = 'N'
;
run;
data want;
set have;
if put(A1, $lookup.) = 'N' then A1 = '';
run;
data a;
input (A1 A2 A3) ($);
datalines;
A B D
B B D
D A E
;
data b;
input (B1) ($);
datalines;
A
B
C
;
data want;
if _n_=1 then do;
if 0 then set b;
declare hash h(dataset:'b');
h.definekey('B1');
h.definedone();
end;
set a;
array x{*} a: ;
do i=1 to dim(x);
if h.check(key:x{i}) ne 0 then call missing(x{i});
end;
drop i b1;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.