I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:
data a;
input (A1 A2 A3) ($);
datalines;
A B D
B B D
D A E
;
And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):
data b;
input (B1) ($);
datalines;
A
B
C
;
Such that the final output is
data a_2;
input (A1 A2 A3) ($);
datalines;
A B .
B B .
. A .
;
I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.
Yes, PROC FORMAT:
proc format;
value $lookup
'A', 'B', 'C' = 'Y'
other = 'N'
;
run;
data want;
set have;
if put(A1, $lookup.) = 'N' then A1 = '';
run;
Create a custom format that will map each variable back to its value if it's in your list and to . otherwise.
You can then apply it to all variables with an array or format statement as needed.
Are your variables character, numeric or a mixture?
You've shown character variables but are using the period, which is typically used to represent only numeric missing.
@DAppelbaum83 wrote:
I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:
data a;
input (A1 A2 A3) ($);
datalines;
A B D
B B D
D A E
;
And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):
data b;
input (B1) ($);
datalines;
A
B
C
;
Such that the final output is
data a_2;
input (A1 A2 A3) ($);
datalines;
A B .
B B .
. A .
;
I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.
They're all categorical / characters. What do you mean custom format? Is there a PROC for that?
Categorical doesn't mean character, so assuming all are characters something like this will work for you.
*create a format from the data set B;
data b_format;
set b end=last;
fmtname="$miss_fmt";
start=B1;
label="$8.";
*tells SAS the format is the label type;
HLO="F";
output;
if last then
do;
HLO='O';
start="**OTHER**";
label="";
output;
end;
run;
proc format cntlin=b_format;
run;
*apply the format to data;
proc print data=a;
format _character_ $miss_fmt.;
run;
@DAppelbaum83 wrote:
They're all categorical / characters. What do you mean custom format? Is there a PROC for that?
If you're not familiar with User Defined Formats, versus custom formats this is a good start.
PDF 001-30: PROC FORMAT - Not Just Another Pretty Face
Yes, PROC FORMAT:
proc format;
value $lookup
'A', 'B', 'C' = 'Y'
other = 'N'
;
run;
data want;
set have;
if put(A1, $lookup.) = 'N' then A1 = '';
run;
data a; input (A1 A2 A3) ($); datalines; A B D B B D D A E ; data b; input (B1) ($); datalines; A B C ; data want; if _n_=1 then do; if 0 then set b; declare hash h(dataset:'b'); h.definekey('B1'); h.definedone(); end; set a; array x{*} a: ; do i=1 to dim(x); if h.check(key:x{i}) ne 0 then call missing(x{i}); end; drop i b1; run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.