Hi there,
I need help to cleaning and subsetting some large public data. I am trying to keep only certain variables, keep only certain variables that have a certain answer according to the code book, and delete observations if there is a blank/ (.). The only thing that successfully ran was dropping all the other variables, but there are 0 obs. This is what I tried that did not work:
Data want;
set have;
if somale= 1 2 3 4;
This isn't correct SAS syntax: if somale= 1 2 3 4;
If you only what to keep observations where the variable has values 1, 2, 3, or 4 then try this:
data want;
set have;
if somale in (1,2,3,4);
run;
When you have code that throws errors, and that code would have thrown many, you should copy the code and the errors from the log (HINT: READ the LOG), then on the forum open a text box and paste the code for the entire data step or procedure and all the notes, warnings and errors.
That code is not even going to run. Check the SAS log to see why.
5 if somale= 1 2 3 4; - 388 200 ERROR 388-185: Expecting an arithmetic operator. ERROR 200-322: The symbol is not recognized and will be ignored. 6 7 if sofemale= 1 2 3 4; - 388 200 8 if _race= 1 2 3 4 5 6 7 8; - 388 76 9 if sleptim1= 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24; - 388 76 10 if menthlth= 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 88; - 388 76 ERROR 388-185: Expecting an arithmetic operator. ERROR 200-322: The symbol is not recognized and will be ignored. ERROR 76-322: Syntax error, statement will be ignored. 11 keep somale sofemale _race sleptim1 menthlth; 12 if somale sofemale _race sleptim1 menthlth = . then delete; -------- 388 76 ERROR 388-185: Expecting an arithmetic operator. ERROR 76-322: Syntax error, statement will be ignored. 13 14 run;
Is sofemale a character variable? If so perhaps you meant:
if sofemale= '1 2 3 4';
Or is it a numeric variable? If so perhaps you meant:
if sofemale in (1 2 3 4);
okay, now that I have gotten more sleep, I realized where I was making some mistakes although I still haven't figured out how to fix them. Somale, sofemale, and _race are all categorical variables with 1-4 for each indicating a different characteristic while sleptim1 and menthlth are numerical (hours in a day and days in a month, respectively). I tried a combination of all of the below ideas but still get 0 obs. any ideas?
SAS has two types of variables. Fixed length character strings and floating point numbers.
Your "categorical" variables could be of either type. Check the definition of the variables (for example by using PROC CONTENTS) to see which variables are numeric an which are character.
For your "categorical" variables, or really any variable that can only have a small number of distinct values, use PROC FREQ to see what values exist in your dataset. You can add the MISSPRINT or MISSING option to include any missing values in the report.
proc freq data=HAVE ;
tables Somale sofemale _race / missing;
run;
For "continuous" variables, or really any numeric variable that might have a large number of distinct value, use PROC MEANS or PROC UNIVARIATE to look at the range of values.
proc means data=HAVE n nmiss min max mean ;
var sleptim1 menthlth;
run;
hey all,
Thanks for the help. I finally figured out I had an inherent problem by not merging both genders together before trying to clean the data.
@almay2121 wrote:
okay, now that I have gotten more sleep, I realized where I was making some mistakes although I still haven't figured out how to fix them. Somale, sofemale, and _race are all categorical variables with 1-4 for each indicating a different characteristic while sleptim1 and menthlth are numerical (hours in a day and days in a month, respectively). I tried a combination of all of the below ideas but still get 0 obs. any ideas?
Hint: provide a small example data set and indicate which records are to be kept or removed.
If you are only interesting in removing records where one or more of those variables has a "blank" you could try
Where not missing(somale) and not missing(sofemale) and not missing(_race) and not missing (sleptim1) and not missing(menthlth);
The missing function works with both character and numeric variables and returns a value of 1(for true) when the variable is missing any value. So the above should return records where none have a missing or blank value for the variable. HOWEVER, if those variables have lots of blanks it may be that you have no data where all of the values are present.
However I might suggest running Proc freq on those variables to see what values actually occur in your data. I seldom trust codebook without verifying them as odd things happen some times. I would verify the actual values before attempting to remove anything.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.