BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jahanzaib
Quartz | Level 8

I have a dataset which has more than 10 million observations.  In this, I have variable name NAICS it has different values. I want to remove all those values that start with 21. It contains different no of digits like 21, 2145, 210454 etc. So My intention is to remove all values with start with 21.  There are some values of NAICS variable which end with 21 but I don't want them to be affected. Thanking in anticipation.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20
data have;
input NAICS;
datalines;
21234
2134664
42355
235353
214356
;

data want;
	set have;
	if substr(left(NAICS),1,2) NE '21';
run;

View solution in original post

11 REPLIES 11
novinosrin
Tourmaline | Level 20

data want;

set have;

if NAICS =:'21' then delete;

run;

 

Assuming that variable is a character variable

s_lassen
Meteorite | Level 14

@Jahanzaib: Is your variable numeric or character? And what do you mean by "removing" a value? Do you mean setting the variable to missing, or do you mean deleting the whole observation? In your example, all the values that you want to get rid of start with the number 21, does that mean that you do not want to get rid of values like 18212 or 321?

 

 

Jahanzaib
Quartz | Level 8
@s_lassen I mean deleting whole observation. It's a character variable. Right. I don't want to delete values like 18212 or 321. delete only those which start with 21 not the others.
PeterClemmensen
Tourmaline | Level 20

I assumed a numeric variable, but my solution will work for a character variable as well.

s_lassen
Meteorite | Level 14

@Jahanzaib: Then the fastest and simplest is probably using a WHERE clause:

data want;
  set have;
  where NAICS not like '21%';
run;

- this assuming that the values are left aligned, otherwise use

where left(NAICS) not like '21%';
PeterClemmensen
Tourmaline | Level 20
data have;
input NAICS;
datalines;
21234
2134664
42355
235353
214356
;

data want;
	set have;
	if substr(left(NAICS),1,2) NE '21';
run;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

I would really advise you to try both methods on the 10mill records as I suspect this (from @s_lassen😞

data want;
  set have (where=(substr(naics,1,2) ne '21'));
run;

Would be faster than the given solution from @PeterClemmensen:

data want;
  set have;
  if substr(left(naics),1,2) ne '21';
run;

I can't prove this offhand, but vaguely remember something aboutthe set reading in the data for each row and then outputting on the if, where the where clause restricts what is coming in, so slightly earlier in the process.  If so then this fractional saving would add up of millions of records.  But do test.

Jahanzaib
Quartz | Level 8
Thankyou @RW9 How to delete if first two digits are 90 or greater than 90?
novinosrin
Tourmaline | Level 20

From your responses, i assume you are not familiar with wild card operators or colon modifiers. If you have a grasping you would have chosen @s_lassen 's answer without having to deal with functions that makes SAS work more.

If you understand collating sequences , the below solution is easy

data want;

set have;

if var>=:'90' then delete;

run;

 

Regards,

Naveen Srinivasan

novinosrin
Tourmaline | Level 20

Also, the same  can be applied to a where clause:

data want;

set have;

where not (var>=:'90');

run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well 90-99 all consist of 9x, so you can just do:

data want;
  set have (where=(substr(naics,1,2) ne '21' and char(naics,1) ne '9'));
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1619 views
  • 5 likes
  • 5 in conversation