BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DAppelbaum83
Fluorite | Level 6

I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:

 

data a;

input (A1 A2 A3) ($);

datalines;

A  B  D

B  B  D

D  A  E

;

 

And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):

 

data b;

input (B1) ($);

datalines;

A  

B  

;

 

Such that the final output is 

 

data a_2;

input (A1 A2 A3) ($);

datalines;

A  B  .

B  B  .

.   A  .

;

 

I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

Yes, PROC FORMAT:

proc format;
  value $lookup
  'A', 'B', 'C' = 'Y'
  other = 'N'
 ;
run;

data want;
  set have;
  if put(A1, $lookup.) = 'N' then A1 = '';
run; 

View solution in original post

5 REPLIES 5
Reeza
Super User

Create a custom format that will map each variable back to its value if it's in your list and to . otherwise. 
You can then apply it to all variables with an array or format statement as needed.

 

Are your variables character, numeric or a mixture?

You've shown character variables but are using the period, which is typically used to represent only numeric missing. 

 


@DAppelbaum83 wrote:

I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:

 

data a;

input (A1 A2 A3) ($);

datalines;

A  B  D

B  B  D

D  A  E

;

 

And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):

 

data b;

input (B1) ($);

datalines;

A  

B  

;

 

Such that the final output is 

 

data a_2;

input (A1 A2 A3) ($);

datalines;

A  B  .

B  B  .

.   A  .

;

 

I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.


 

DAppelbaum83
Fluorite | Level 6

They're all categorical / characters. What do you mean custom format? Is there a PROC for that?

Reeza
Super User

Categorical doesn't mean character, so assuming all are characters something like this will work for you.

 

*create a format from the data set B;
data b_format;
	set b end=last;
	fmtname="$miss_fmt";
	start=B1;
	label="$8.";
	*tells SAS the format is the label type;
	HLO="F";
	output;

	if last then
		do;
			HLO='O';
			start="**OTHER**";
			label="";
			output;
		end;
run;

proc format cntlin=b_format;
run;

*apply the format to data;
proc print data=a;
	format _character_ $miss_fmt.;
run;

@DAppelbaum83 wrote:

They're all categorical / characters. What do you mean custom format? Is there a PROC for that?


If you're not familiar with User Defined Formats, versus custom formats this is a good start. 

PDF 001-30: PROC FORMAT - Not Just Another Pretty Face

 

 

SASKiwi
PROC Star

Yes, PROC FORMAT:

proc format;
  value $lookup
  'A', 'B', 'C' = 'Y'
  other = 'N'
 ;
run;

data want;
  set have;
  if put(A1, $lookup.) = 'N' then A1 = '';
run; 
Ksharp
Super User
 

data a;
input (A1 A2 A3) ($);
datalines;
A  B  D
B  B  D
D  A  E
;

 


 

data b;
input (B1) ($);
datalines;
A  
B  
C 
;

data want;
 if _n_=1 then do;
  if 0 then set b;
  declare hash h(dataset:'b');
  h.definekey('B1');
  h.definedone();
 end;
set a;
array x{*} a: ;
do i=1 to dim(x);
 if h.check(key:x{i}) ne 0 then call missing(x{i});
end;
drop i b1;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3674 views
  • 3 likes
  • 4 in conversation