Justin9
Obsidian | Level 7

I have a dataset which has 100,000 accounts. There are three variables: account_number, type and name. I have two questions.

 

Question 1

-In the 'name' variable, some observations contain punctuation or special characters, e.g. "Mr James Thomas (ES1)". Is there a way to use proc freq to count how many observations contain punctuation/special characters (brackets, commas, etc.)? I would like to know how many accounts I would exclude from my original dataset in order to create the dataset that I want.

 

Question 2

-In my final dataset, I would like to exclude accounts where the 'name' variable contains punctuation/special characters, e.g. brackets as in "Mr James Thomas (ES1)", because I only want to keep observations made up of letters, e.g. "Mr Brian Wilson". What would be the correct code in my where statement to keep accounts where the 'name' variable does not contain any punctuation/special characters?

data final_dataset;
   set test;
   where type=1 and name not contains [code to ensure observations in 'name' variable do not have punctuation or special characters] ; /*What code to keep names that do not have any punctuation/special characters e.g. Mr James Wilson (ES1) should be excluded, as it has brackets?*/
run;

 

3 REPLIES
Justin9
Obsidian | Level 7

I don't think the post helps with my two questions. I would like to do a proc freq to see how many accounts I would have to exclude (if a name has any punctuation/special character, it will have to be excluded), and I don't know what could be used for my second question in the where statement I wrote to only keep accounts that do not have any special characters/punctuation in the name.

 

If anyone has the exact code I could use to answer my two questions, that would be greatly appreciated!

PaigeMiller
Diamond | Level 26

@Justin9 wrote:

I don't think the post helps with my two questions. I would like to do a proc freq to see how many accounts I would have to exclude (if a name has any punctuation/special character, it will have to be excluded)


This method identifies the observations you want and the ones you don't want; you can create a flag variable with values of 0 or 1. Then run PROC FREQ on the flag variable.
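A minimal sketch of the flag approach, not a definitive solution. It assumes the input dataset is called test, with a character variable name and a numeric variable type as in the question. The dataset name flagged and the variable bad_name are made up for illustration. It also assumes that "special character" means anything other than a letter or a blank, so digits are flagged as well.

```sas
/* Step 1: build a 0/1 flag for names that contain anything other than   */
/* letters and blanks. "flagged" and "bad_name" are hypothetical names.  */
data flagged;
   set test;
   /* COMPRESS with the 'a' modifier removes all letters, plus the blank */
   /* listed in the second argument. Whatever is left over is a digit,   */
   /* punctuation mark, or other special character.                      */
   bad_name = not missing(compress(name, ' ', 'a'));
run;

/* Step 2 (Question 1): count how many accounts would be excluded.       */
/* bad_name = 1 means the name has at least one non-letter character.    */
proc freq data=flagged;
   tables bad_name;
run;

/* Step 3 (Question 2): keep only type 1 accounts with clean names.      */
data final_dataset;
   set flagged;
   where type = 1 and bad_name = 0;
run;
```

With this rule, "Mr James Thomas (ES1)" gets bad_name = 1 because of the brackets and the digit, and "Mr Brian Wilson" gets bad_name = 0. Names such as O'Brien or Smith-Jones would also be flagged; if those should be kept, the allowed characters can be added to the second argument, e.g. compress(name, " '-", 'a'). A completely blank name is treated as clean here. The same expression should also work directly in a where statement, e.g. where type = 1 and missing(compress(name, ' ', 'a')), if the flag variable is not needed.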

--
Paige Miller