BookmarkSubscribeRSS Feed
seohyeonjeong
Obsidian | Level 7

Hi everyone,

 

I want to find an easy way to remove rows with non-numeric values in a column.

I only have to write the code in PROC SQL and there are non-English characters in the values.

 

For example,

 

[Current data] - real codes have a lot of rows..

custNo, rate

1000    0.6            

1002  0.6    

1004  0.8

1005  確認要

1007  案内通り

 

[Result Data]

custNo, rate

1000    0.6            

1002  0.6    

1004  0.8

 

I want to delete rows with non-numeric values in the rate column,

cuz I have to convert values in the "rate" to numeric values for making a formula.

 

If I get an answer, I will appreciate it.

 

Thanks

 

4 REPLIES 4
PaigeMiller
Diamond | Level 26
proc sql;
    create table want as select * from have
    where not missing(input(rate,best10.));
quit;
--
Paige Miller
Amir
PROC Star

Hi @seohyeonjeong,

 

The below code uses the kcompress() function to remove any "-", "." and digits ("d") in the variable rate, and if there is nothing left then the record is output. This obviously assumes you don't have any invalid rates values such as "---", or ".....", or "-.-.-", etc.:

 

proc sql noprint;
	create table want as
	select *
	from have
	where kcompress(rate, '-.','d') is missing
	;
quit;

 

If you choose to use the input() function (as per @PaigeMiller's post) then I recommend you use the "?" modifier as per the documentation, for example:

 

proc sql;
    create table want as select * from have
    where not missing(input(c_height,? best10.));
quit;

 

 

 

Kind regards,

Amir.

 

Edit: Replaced references to compress() with kcompress(), as per post from @Patrick.

Patrick
Opal | Level 21

@Amir You need to use kcompress() and not compress() with MBCS data or the results can become "unexpected".

Tom
Super User Tom
Super User

Why remove them? Why not just convert the values into a number and that then values that are invalid as a number will have missing values and be ignored when calculating means and other statistics.

The ?? will suppress the errors when invalid strings are found. 32 is the maximum width the normal numeric informat supports. The INPUT() function does not care if the string being converted is shorter than the width used in the INFORMAT being used to convert it.

data want;
  set have;
  rate_n = input(rate,??32.);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 3706 views
  • 8 likes
  • 5 in conversation