- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi everyone,
I want to find an easy way to remove rows with non-numeric values in a column.
I only have to write the code in PROC SQL and there are non-English characters in the values.
For example,
[Current data] - real codes have a lot of rows..
custNo, rate
1000 0.6
1002 0.6
1004 0.8
1005 確認要
1007 案内通り
[Result Data]
custNo, rate
1000 0.6
1002 0.6
1004 0.8
I want to delete rows with non-numeric values in the rate column,
cuz I have to convert values in the "rate" to numeric values for making a formula.
If I get an answer, I will appreciate it.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sql;
create table want as select * from have
where not missing(input(rate,best10.));
quit;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @seohyeonjeong,
The below code uses the kcompress() function to remove any "-", "." and digits ("d") in the variable rate, and if there is nothing left then the record is output. This obviously assumes you don't have any invalid rates values such as "---", or ".....", or "-.-.-", etc.:
proc sql noprint;
create table want as
select *
from have
where kcompress(rate, '-.','d') is missing
;
quit;
If you choose to use the input() function (as per @PaigeMiller's post) then I recommend you use the "?" modifier as per the documentation, for example:
proc sql;
create table want as select * from have
where not missing(input(c_height,? best10.));
quit;
Kind regards,
Amir.
Edit: Replaced references to compress() with kcompress(), as per post from @Patrick.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Amir You need to use kcompress() and not compress() with MBCS data or the results can become "unexpected".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why remove them? Why not just convert the values into a number and that then values that are invalid as a number will have missing values and be ignored when calculating means and other statistics.
The ?? will suppress the errors when invalid strings are found. 32 is the maximum width the normal numeric informat supports. The INPUT() function does not care if the string being converted is shorter than the width used in the INFORMAT being used to convert it.
data want;
set have;
rate_n = input(rate,??32.);
run;