BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DAppelbaum83
Fluorite | Level 6

I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:

 

data a;

input (A1 A2 A3) ($);

datalines;

A  B  D

B  B  D

D  A  E

;

 

And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):

 

data b;

input (B1) ($);

datalines;

A  

B  

;

 

Such that the final output is 

 

data a_2;

input (A1 A2 A3) ($);

datalines;

A  B  .

B  B  .

.   A  .

;

 

I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

Yes, PROC FORMAT:

proc format;
  value $lookup
  'A', 'B', 'C' = 'Y'
  other = 'N'
 ;
run;

data want;
  set have;
  if put(A1, $lookup.) = 'N' then A1 = '';
run; 

View solution in original post

5 REPLIES 5
Reeza
Super User

Create a custom format that will map each variable back to its value if it's in your list and to . otherwise. 
You can then apply it to all variables with an array or format statement as needed.

 

Are your variables character, numeric or a mixture?

You've shown character variables but are using the period, which is typically used to represent only numeric missing. 

 


@DAppelbaum83 wrote:

I am trying to set up a data step that expunges some 'incorrect' values from a dataset. For instance, I have a table:

 

data a;

input (A1 A2 A3) ($);

datalines;

A  B  D

B  B  D

D  A  E

;

 

And I am trying to remake the table but where all values are within this column/dataset (only A, B and C):

 

data b;

input (B1) ($);

datalines;

A  

B  

;

 

Such that the final output is 

 

data a_2;

input (A1 A2 A3) ($);

datalines;

A  B  .

B  B  .

.   A  .

;

 

I know I can check if a variable is 'in' a column with PROC SQL, but it seems like the code to do so would be a bit long - my actual dataset involves far more than 3 variables and 3 allowed values so I was curious if there was a simpler way to do it.


 

DAppelbaum83
Fluorite | Level 6

They're all categorical / characters. What do you mean custom format? Is there a PROC for that?

Reeza
Super User

Categorical doesn't mean character, so assuming all are characters something like this will work for you.

 

*create a format from the data set B;
data b_format;
	set b end=last;
	fmtname="$miss_fmt";
	start=B1;
	label="$8.";
	*tells SAS the format is the label type;
	HLO="F";
	output;

	if last then
		do;
			HLO='O';
			start="**OTHER**";
			label="";
			output;
		end;
run;

proc format cntlin=b_format;
run;

*apply the format to data;
proc print data=a;
	format _character_ $miss_fmt.;
run;

@DAppelbaum83 wrote:

They're all categorical / characters. What do you mean custom format? Is there a PROC for that?


If you're not familiar with User Defined Formats, versus custom formats this is a good start. 

PDF 001-30: PROC FORMAT - Not Just Another Pretty Face

 

 

SASKiwi
PROC Star

Yes, PROC FORMAT:

proc format;
  value $lookup
  'A', 'B', 'C' = 'Y'
  other = 'N'
 ;
run;

data want;
  set have;
  if put(A1, $lookup.) = 'N' then A1 = '';
run; 
Ksharp
Super User
 

data a;
input (A1 A2 A3) ($);
datalines;
A  B  D
B  B  D
D  A  E
;

 


 

data b;
input (B1) ($);
datalines;
A  
B  
C 
;

data want;
 if _n_=1 then do;
  if 0 then set b;
  declare hash h(dataset:'b');
  h.definekey('B1');
  h.definedone();
 end;
set a;
array x{*} a: ;
do i=1 to dim(x);
 if h.check(key:x{i}) ne 0 then call missing(x{i});
end;
drop i b1;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2472 views
  • 3 likes
  • 4 in conversation