I have a dataset which has 100,000 accounts. There are three variables: account_number, type and name. I have two questions.
Question 1
-In the 'name' variable, some observations contain punctuation or special characters, e.g. "Mr James Thomas (ES1)". Is there a PROC FREQ I could use to count how many observations contain punctuation or special characters (brackets, commas, etc.)? I would like to know how many accounts I would exclude from my original dataset to create the dataset I want.
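A minimal sketch for seeing *which* special characters occur and in how many accounts, assuming the input data set is TEST, NAME is character, and ACCOUNT_NUMBER uniquely identifies an account (all assumptions). COMPRESS with a blank as the character list and the 'a' modifier removes letters and blanks, so whatever is left is treated as "special". Note that this includes digits, such as the 1 in "ES1".

```sas
/* One row per (account, special character) found in NAME. */
data special_chars(keep=account_number char);
   set test;
   length char $1;
   /* Remove letters and blanks; what remains are punctuation, digits, etc. */
   others = compress(name, ' ', 'a');
   /* LENGTHN returns 0 for a blank string, so clean names output nothing. */
   do i = 1 to lengthn(others);
      char = substr(others, i, 1);
      output;
   end;
run;

/* Count each character at most once per account. */
proc sort data=special_chars nodupkey;
   by account_number char;
run;

/* Number of accounts containing each special character. */
proc freq data=special_chars order=freq;
   tables char / nocum;
run;
```

Because one account can contain several characters (e.g. "(" and ")"), these counts overlap and do not add up to the number of accounts to exclude; a single 0/1 flag per account is better for that total.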
Question 2
-In my final dataset, I would like to exclude accounts whose 'name' contains punctuation or special characters, e.g. brackets as in "Mr James Thomas (ES1)", because I only want to keep names made up of letters, e.g. "Mr Brian Wilson". What would be the correct code in my WHERE statement to keep only accounts where the 'name' variable does not contain any punctuation or special characters?
data final_dataset;
set test;
/* COMPRESS(name, ' ', 'a') removes letters and blanks, so any leftover character is punctuation, a digit or another special character. */
/* e.g. "Mr James Thomas (ES1)" leaves "(1)" and is excluded; "Mr Brian Wilson" leaves nothing and is kept. If TYPE is character, use type='1'. */
where type=1 and compress(name, ' ', 'a') = ' ';
run;
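An alternative sketch for the same WHERE condition uses a Perl regular expression instead of COMPRESS. It assumes, as above, that TYPE is numeric and that "allowed" means only the letters A-Z/a-z and spaces; accented letters, hyphens and apostrophes (e.g. "O'Brien", "Smith-Jones") would therefore be excluded, so widen the character class if those names should be kept.

```sas
data final_dataset;
   set test;
   /* [^A-Za-z ] matches any character that is not a letter or a space.  */
   /* PRXMATCH returns 0 when there is no such character, so NOT keeps   */
   /* only names made up of letters and spaces.                          */
   where type = 1 and not prxmatch('/[^A-Za-z ]/', name);
run;
```

Both versions keep names that are entirely blank; add `and not missing(name)` if those should be dropped too.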
I don't think the post helps with my two questions. I would like to do a proc freq to see how many accounts I would have to exclude (if a name has any punctuation/special character, it will have to be excluded), and I don't know what could be used for my second question in the where statement I wrote to only keep accounts that do not have any special characters/punctuation in the name.
If anyone has the exact code I could use to answer my two questions, that would be greatly appreciated!
@Justin9 wrote:
I don't think the post helps with my two questions. I would like to do a proc freq to see how many accounts I would have to exclude (if a name has any punctuation/special character, it will have to be excluded)
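This method identifies the observations you want and the ones you don't want, so you can create a flag variable that is 1 when the name contains punctuation/special characters and 0 otherwise. Then run PROC FREQ on the flag variable to see how many accounts would be excluded.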
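A minimal sketch of that flag-and-count approach, assuming the data set is TEST, TYPE is numeric, and HAS_SPECIAL is a hypothetical name for the new flag variable. The flag uses the same "letters and blanks only" rule, so digits also count as special characters.

```sas
data test_flagged;
   set test;
   /* 1 if anything other than letters/blanks remains after COMPRESS, else 0. */
   has_special = (compress(name, ' ', 'a') ne ' ');
run;

proc freq data=test_flagged;
   /* Optional: restrict to the accounts the final data set is built from. */
   where type = 1;
   /* The count for has_special=1 is the number of accounts to exclude. */
   tables has_special;
run;
```

Keeping the flag in the data set also makes it easy to print a sample of the excluded names (e.g. PROC PRINT with `where has_special = 1;`) and confirm the rule behaves as expected before building the final data set.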