aminkarimid
Lapis Lazuli | Level 10

Hello everybody;

I have a variable that contains alphanumeric strings of specific lengths, for example:

 

Name variable:
asdf1

asdg2

zxcv4

asdh3

qwer2

rtyu4

xcvb4

 

Now, I want to delete the observations that have '4' in their name, for instance zxcv4, so that the result is:

Name variable:
asdf1

asdg2

asdh3

qwer2

 

And I need a list of the deleted observations from the dataset.

 

Here are the attributes of the TRD_STCK_CD variable, which is the name variable in my dataset.

 

Alphabetic List of Variables and Attributes

#  Variable     Type  Len  Format  Informat  Label
1  TRD_STCK_CD  Char  15   $15.    $15.      TRD_STCK_CD

 

How can I do that?

 

Thanks in advance!

 

1 ACCEPTED SOLUTION

Shmuel
Garnet | Level 18

You can also split your data into two datasets:

 

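data ok nok;
   set have;
   /* INDEX returns the position of '4' in the value, or 0 if it is absent */
   if index(variable,'4') then output nok;   /* removed observations */
   else output ok;                            /* kept observations */
run;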
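
As a sketch of how this could look for the dataset in the question: the variable name TRD_STCK_CD comes from the original post, while the dataset name HAVE and the PROC PRINT preview are assumptions for illustration. The NOK dataset itself is the list of removed observations; the preview is limited to 20 rows because the source data is large.

```sas
/* Assumes the input dataset is named HAVE, as in the reply above */
data ok nok;
   set have;
   /* rows whose code contains '4' go to NOK, all others to OK */
   if index(TRD_STCK_CD, '4') > 0 then output nok;
   else output ok;
run;

/* Preview the first 20 removed observations */
proc print data=nok(obs=20);
   var TRD_STCK_CD;
run;
```

A single pass over the data produces both the cleaned dataset and the list of what was removed, which matters when the input has hundreds of millions of rows.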


6 REPLIES
Reeza
Super User

FIND() searches for a character or substring; FINDW() searches for a whole word.

There are also INDEX() and INDEXW() functions that operate similarly.
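
A minimal sketch of the two character-search functions applied to one of the sample names from the question. The variable names pos_find and pos_index are made up for illustration. Both functions return the position of the first match, or 0 when there is none, so either can drive an IF condition.

```sas
data _null_;
   name = 'zxcv4';
   pos_find  = find(name, '4');    /* position of '4', 0 if not found */
   pos_index = index(name, '4');   /* same search using INDEX */
   put pos_find= pos_index=;       /* writes both positions to the log */
run;
```

For this value both functions should report position 5 in the log.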

 

Rather than 'delete' you can control where an observation is output to using an OUTPUT statement.

 

Here's an example using SASHELP.CLASS

 

data females males;
set sashelp.class;

if sex='F' then output females;
else if sex='M' then output males;

run;
aminkarimid
Lapis Lazuli | Level 10
Hello @Reeza;
I do not understand why you recommend the OUTPUT statement.
I want to remove these data from my dataset as part of the data cleaning
for my research.
I am using big data (more than 300 million).
Thanks.
Reeza
Super User

Because you also said:

 

And I need a list of the deleted observations from the dataset.

Shmuel
Garnet | Level 18

@aminkarimid, to remove an observation from a dataset, you define either which observations to delete or which to output.

 

So you can use either

data want; 
 set have;
      if index(variable,'4') > 0 then delete;
run;

or:

data want;
 set have;
      if index(variable,'4') = 0 then output;
run;

You will get the same result in both cases.

You can use functions other than INDEX, such as FIND, as @Reeza mentioned.

 

Jagadishkatam
Amethyst | Level 16

I believe you already got some responses; just to add to those:

If you want to remove the data, try the Perl regular expression functions, which are an alternative to the ordinary search functions.

 

data have;
input Name_variable $20.;
/* subsetting IF: keep only names that do not end in '4' ($ anchors the match at the end) */
if prxmatch('m/4$/i',strip(Name_variable))=0;
cards;
asdf1
asdg2
zxcv4
asdh3
qwer2
rtyu4
xcvb4
;
run;
Thanks,
Jag
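
Note that the pattern 4$ only matches a '4' at the end of the value. The sketch below contrasts it with an unanchored pattern that matches a '4' anywhere, which is closer to the wording of the original question. The value 'as4df' and the variables ends_with_4 and contains_4 are made up for illustration.

```sas
data _null_;
   length name $15;
   do name = 'zxcv4', 'as4df', 'asdf1';
      /* STRIP removes trailing blanks so that $ sees the last real character */
      ends_with_4 = prxmatch('/4$/', strip(name)) > 0;
      /* no anchor: true when '4' occurs at any position */
      contains_4  = prxmatch('/4/', name) > 0;
      put name= ends_with_4= contains_4=;
   end;
run;
```

Here 'as4df' is flagged only by the unanchored pattern, so the choice between the two depends on whether a '4' in the middle of a code should also be removed.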
