Solved: remove values that contain 21

Jahanzaib · Posted 05-24-2017 04:20 AM

I have a dataset with more than 10 million observations. In it, I have a variable named NAICS that takes many different values. I want to remove all the values that start with 21. The values have different numbers of digits, like 21, 2145, 210454, etc., so my intention is to remove every value that starts with 21. Some values of NAICS end with 21, and I don't want those to be affected. Thanks in advance.

PeterClemmensen · Posted 05-24-2017 04:28 AM

data have;
input NAICS;
datalines;
21234
2134664
42355
235353
214356
;

data want;
	set have;
	if substr(left(NAICS),1,2) NE '21';
run;

The DATA to DATA Step Macro
Blog: SASnrd

novinosrin · Posted 05-24-2017 04:27 AM

data want;
  set have;
  if NAICS =: '21' then delete;
run;

This assumes that NAICS is a character variable.
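To see how the colon modifier behaves, here is a minimal sketch using the example values from the question. It assumes NAICS is a character variable of length 8 (the length is an assumption). The `=:` operator compares only as many characters as the shorter operand has, so `NAICS =: '21'` is true exactly when the value begins with "21".

```sas
/* Example values from the question, read as character (length 8 is assumed) */
data have;
  length NAICS $8;
  input NAICS $;
  datalines;
21
2145
210454
42355
18212
321
;

/* =: truncates NAICS to the length of '21' (2 characters) before comparing,
   so only values that begin with 21 are deleted */
data want;
  set have;
  if NAICS =: '21' then delete;
run;

proc print data=want;
run;
```

Only 42355, 18212 and 321 should remain; values that merely end in 21 are kept.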

s_lassen · Posted 05-24-2017 04:28 AM

@Jahanzaib: Is your variable numeric or character? And what do you mean by "removing" a value? Do you mean setting the variable to missing, or deleting the whole observation? In your example, all the values that you want to get rid of start with the number 21. Does that mean that you do not want to get rid of values like 18212 or 321?

Jahanzaib · Posted 05-24-2017 04:34 AM

@s_lassen I mean deleting the whole observation. It's a character variable. Right, I don't want to delete values like 18212 or 321; delete only those that start with 21, not the others.

PeterClemmensen · Posted 05-24-2017 04:36 AM

I assumed a numeric variable, but my solution will work for a character variable as well.
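Why this works for a numeric NAICS: LEFT expects a character argument, so SAS automatically converts the number to text with the BEST12. format (right-aligned in 12 characters) and writes a conversion NOTE to the log; LEFT then moves the digits to the front so SUBSTR sees them. A minimal sketch that makes the conversion explicit with CATS instead, assuming the numeric sample data from the accepted solution (the data set names `have_num` and `want_num` are hypothetical). CATS converts numeric arguments itself and strips leading and trailing blanks, so the automatic-conversion note should not appear.

```sas
/* Numeric NAICS, same values as the accepted solution's sample data */
data have_num;
  input NAICS;
  datalines;
21234
2134664
42355
235353
214356
;

/* CATS turns the number into text with no leading or trailing blanks,
   so the first two characters are the first two digits */
data want_num;
  set have_num;
  if substr(cats(NAICS), 1, 2) ne '21';
run;
```

Here 42355 and 235353 should be kept and the three codes starting with 21 dropped.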

s_lassen · Posted 05-24-2017 04:38 AM

@Jahanzaib: Then the fastest and simplest is probably using a WHERE clause:

data want;
  set have;
  where NAICS not like '21%';
run;

This assumes that the values are left-aligned; otherwise, use:

where left(NAICS) not like '21%';
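LIKE is the SQL pattern-matching operator: `%` matches any run of characters (including none) and `_` matches exactly one character, so `'21%'` means "starts with 21". The same filter can be written in PROC SQL. A sketch, assuming character data like the question's (the sample values and $8 length are assumptions):

```sas
/* Hypothetical character data for illustration */
data have;
  length NAICS $8;
  input NAICS $;
  datalines;
21
2145
210454
42355
18212
321
;

/* % matches any sequence of characters, so '21%' keeps only codes
   that do NOT begin with 21 */
proc sql;
  create table want as
    select *
    from have
    where NAICS not like '21%';
quit;
```

This should keep 42355, 18212 and 321.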

RW9 · Posted 05-24-2017 04:51 AM

I would really advise you to try both methods on the 10 million records, as I suspect that this (from @s_lassen):

data want;
  set have (where=(substr(naics,1,2) ne '21'));
run;

Would be faster than the given solution from @PeterClemmensen:

data want;
  set have;
  if substr(left(naics),1,2) ne '21';
run;

I can't prove this offhand, but I vaguely remember something about the SET statement reading in the data for each row and the IF then deciding what to output, whereas the WHERE clause restricts what comes in, so slightly earlier in the process. If so, this fractional saving would add up over millions of records. But do test.
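A rough way to do that test: build a table of about 10 million character codes, turn on FULLSTIMER, and compare the log timings of the WHERE= option and the subsetting IF. The random six-digit codes, the seed and the table names (`big`, `want_where`, `want_if`) are assumptions for illustration; real timings depend on your data and system, so no numbers are claimed here.

```sas
options fullstimer;   /* write detailed real and CPU time for each step to the log */

/* Roughly 10 million random six-digit codes stored as character */
data big;
  length NAICS $6;
  call streaminit(2017);
  do i = 1 to 10000000;
    NAICS = put(100000 + floor(900000 * rand('uniform')), 6.);
    output;
  end;
  drop i;
run;

/* WHERE= skips failing rows before they reach the program data vector */
data want_where;
  set big (where=(substr(NAICS, 1, 2) ne '21'));
run;

/* Every row is read into the program data vector, then the IF discards some */
data want_if;
  set big;
  if substr(left(NAICS), 1, 2) ne '21';
run;
```

Both output tables should have the same number of rows; compare the "real time" and "user cpu time" lines the log prints for the last two steps.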

Jahanzaib · Posted 05-24-2017 05:00 AM

Thank you @RW9. How do I delete observations where the first two digits are 90 or greater?

novinosrin · Posted 05-24-2017 05:17 AM

From your responses, I assume you are not familiar with wildcard operators or colon modifiers. If you were, you would have chosen @s_lassen's answer rather than functions that make SAS do more work.

If you understand collating sequences, the solution below is easy:

data want;
  set have;
  if var >=: '90' then delete;
run;

Regards,

Naveen Srinivasan
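Because the digits sort in numeric order in the collating sequence, `>=: '90'` compares just the first two characters of the value against "90", which is true for any all-digit code beginning with 90 through 99. A small sketch with assumed test values (the variable is called NAICS here rather than `var`, and its $8 length is an assumption):

```sas
/* Assumed test values; NAICS is character */
data have;
  length NAICS $8;
  input NAICS $;
  datalines;
21
2145
8999
90
905
9512
18212
321
;

/* >=: compares only the first two characters (the length of '90'),
   so 90, 905 and 9512 are deleted while 8999 ('89') is kept */
data want;
  set have;
  if NAICS >=: '90' then delete;
run;
```

Here 21, 2145, 8999, 18212 and 321 should remain.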

novinosrin · Posted 05-24-2017 05:28 AM

Also, the same can be applied to a where clause:

data want;
  set have;
  where not (var >=: '90');
run;

RW9 · Posted 05-24-2017 05:40 AM

Well, 90-99 all start with 9, so you can just do:

data want;
  set have (where=(substr(naics,1,2) ne '21' and char(naics,1) ne '9'));
run;