BookmarkSubscribeRSS Feed
almay2121
Calcite | Level 5

Hi there, 

 

I need help to cleaning and subsetting some large public data. I am trying to keep only certain variables, keep only certain variables that have a certain answer according to the code book, and delete observations if there is a blank/ (.). The only thing that successfully ran was dropping all the other variables, but there are 0 obs. This is what I tried that did not work:

 

Data want;

set have;

if somale= 1 2 3 4;

if sofemale= 1 2 3 4;
if _race= 1 2 3 4 5 6 7 8;
if sleptim1= 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24;
if menthlth= 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 88;
keep somale sofemale _race sleptim1 menthlth;
if somale sofemale _race sleptim1 menthlth = . then delete; 
 
run;

 

7 REPLIES 7
SASKiwi
PROC Star

This isn't correct SAS syntax: if somale= 1 2 3 4;

 

If you only what to keep observations where the variable has values 1, 2, 3, or 4 then try this:

data want;
  set have;
  if somale in (1,2,3,4);
run;
ballardw
Super User

When you have code that throws errors, and that code would have thrown many, you should copy the code and the errors from the log (HINT: READ the LOG), then on the forum open a text box and paste the code for the entire data step or procedure and all the notes, warnings and errors.

 

 

Tom
Super User Tom
Super User

That code is not even going to run.  Check the SAS log to see why.

5     if somale= 1 2 3 4;
                   -
                   388
                   200
ERROR 388-185: Expecting an arithmetic operator.

ERROR 200-322: The symbol is not recognized and will be ignored.

6
7     if sofemale= 1 2 3 4;
                     -
                     388
                     200
8     if _race= 1 2 3 4 5 6 7 8;
                  -
                  388
                  76
9     if sleptim1= 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24;
                     -
                     388
                     76
10    if menthlth= 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 88;
                     -
                     388
                     76
ERROR 388-185: Expecting an arithmetic operator.

ERROR 200-322: The symbol is not recognized and will be ignored.

ERROR 76-322: Syntax error, statement will be ignored.

11    keep somale sofemale _race sleptim1 menthlth;
12    if somale sofemale _race sleptim1 menthlth = . then delete;
                --------
                388
                76
ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

13
14    run;

Is sofemale a character variable?  If so perhaps you meant:

if sofemale= '1 2 3 4';

Or is it a numeric variable?  If so perhaps you meant:

if sofemale in (1 2 3 4);
almay2121
Calcite | Level 5

okay, now that I have gotten more sleep, I realized where I was making some mistakes although I still haven't figured out how to fix them. Somale, sofemale, and _race are all categorical variables with 1-4 for each indicating a different characteristic while sleptim1 and menthlth are numerical (hours in a day and days in a month, respectively). I tried a combination of all of the below ideas but still get 0 obs. any ideas? 

Tom
Super User Tom
Super User

SAS has two types of variables.  Fixed length character strings and floating point numbers.

Your "categorical" variables could be of either type.  Check the definition of the variables (for example by using PROC CONTENTS) to see which variables are numeric an which are character.

 

For your "categorical" variables, or really any variable that can only have a small number of distinct values, use PROC FREQ to see what values exist in your dataset.  You can add the MISSPRINT or MISSING option to include any missing values in the report.

proc freq data=HAVE ;
  tables Somale sofemale  _race / missing;
run;

For "continuous" variables, or really any numeric variable that might have a large number of distinct value, use PROC MEANS or PROC UNIVARIATE to look at the range of values.

proc means data=HAVE n nmiss min max mean ;
  var sleptim1 menthlth;
run;
almay2121
Calcite | Level 5

hey all,

Thanks for the help. I finally figured out I had an inherent problem by not merging both genders together before trying to clean the data. 

ballardw
Super User

@almay2121 wrote:

okay, now that I have gotten more sleep, I realized where I was making some mistakes although I still haven't figured out how to fix them. Somale, sofemale, and _race are all categorical variables with 1-4 for each indicating a different characteristic while sleptim1 and menthlth are numerical (hours in a day and days in a month, respectively). I tried a combination of all of the below ideas but still get 0 obs. any ideas? 


Hint: provide a small example data set and indicate which records are to be kept or removed.

 

If you are only interesting in removing records where one or more of those variables has a "blank" you could try

Where not missing(somale) and not missing(sofemale) and not missing(_race) and not missing (sleptim1) and not missing(menthlth);

The missing function works with both character and numeric variables and returns a value of 1(for true) when the variable is missing any value. So the above should return records where none have a missing or blank value for the variable. HOWEVER, if those variables have lots of blanks it may be that you have no data where all of the values are present.

 

However I might suggest running Proc freq on those variables to see what values actually occur in your data. I seldom trust codebook without verifying them as odd things happen some times. I would verify the actual values before attempting to remove anything.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 750 views
  • 0 likes
  • 4 in conversation