tainaj
Obsidian | Level 7

Hey everyone!

I need to isolate observations from my dataset since I only need to look at specific data, but I am getting some errors. A little information: for the countrycit_num variable, I only need the U.S.A. observations, and from the occuSelfDetail variable I only need 29-1000 and 31-1000. Please let me know if I need to provide more details.

My SAS Statement

proc import out=work.IAT2020 (keep=year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14 occuSelfDetail politicalid_7)
datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"
dbms=sav replace;
run;

data IAT2020;
set work.IAT2020;
/*IF and ELSE for countrycit_num*/
if countrycit_num eq "U.S.A.";
run;

data IAT2020;
if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practioners';else Occupation=0;
if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else Occupation=0;
run;

/*Create Labels for variables*/
label birthSex='Gender'
raceomb_002='Race'
D_biep_White_Good_all='Overall IAT D Score'
edu_14='Level of Education'
politicalid_7='Political Ideology Spectrum';
run;

 

This is my log:

NOTE: Invalid numeric data, 'U.S.A.' , at line 123 column 22.
WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.
year=2020 birthyear=2005 birthSex=1 raceomb002=Multiracial D_biep_White_Good_all=0.898
countrycit_num=U.S.A. edu_14=some high school occuSelfDetail=  politicalid_7=neutral _ERROR_=1 _N_=20
NOTE: There were 1757576 observations read from the data set WORK.IAT2020.
NOTE: The data set WORK.IAT2020 has 561674 observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           1.30 seconds
      cpu time            0.59 seconds

125  data IAT2020;
126  if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practioners';else
126! Occupation=0;
127  if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else
127! Occupation=0;
128  run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
      126:102   127:100
NOTE: Variable occuSelfDetail is uninitialized.
NOTE: The data set WORK.IAT2020 has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

129  /*Create Labels for variables*/
130  label birthSex='Gender'
     -----
     180
ERROR 180-322: Statement is not valid or it is used out of proper order.

131  raceomb_002='Race'
132  D_biep_White_Good_all='Overall IAT D Score'
133  edu_14='Level of Education'
134  politicalid_7='Political Ideology Spectrum';

135  run;

ACCEPTED SOLUTION
Reeza
Super User
proc import out=work.IAT2020_raw 
datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"   
dbms=sav replace;   
run;  

data IAT2020;  
set work.IAT2020_raw;  

where occuSelfDetail in ( '29-1000', '31-1000') &  countrycit_num eq "U.S.A.";

label birthSex='Gender'   
raceomb_002='Race'   
D_biep_White_Good_all='Overall IAT D Score'   
edu_14='Level of Education'   
politicalid_7='Political Ideology Spectrum';   

keep year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14 occuSelfDetail politicalid_7;
run;

13 REPLIES
AMSAS
SAS Super FREQ

Your log and code don't appear to match.

In your code, the IF statement is followed by a comment and then the LABEL statement:

 

if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else Occupation=0;  
/*Create Labels for variables*/  
label birthSex='Gender'  

In the log, you appear to have a run statement between the if and label statements (log line 128):

 

 

127  if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else 
127! Occupation=0; 
128  run; 

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 
      126:102   127:100 
NOTE: Variable occuSelfDetail is uninitialized. 
NOTE: The data set WORK.IAT2020 has 1 observations and 2 variables. 
NOTE: DATA statement used (Total process time): 
      real time           0.02 seconds 
      cpu time            0.01 seconds 

129  /*Create Labels for variables*/ 
130  label birthSex='Gender' 
     ----- 
     180 
ERROR 180-322: Statement is not valid or it is used out of proper order. 

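The earlier NOTE and WARNING point to a type mismatch rather than to a bad row in the data file "C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav". The "line 123 column 22" in the NOTE refers to the submitted program as numbered in the log (most likely the IF statement that compares countrycit_num with "U.S.A."), not to a row of the file. SAS tried to convert 'U.S.A.' to a number at that point, which suggests that PROC IMPORT created countrycit_num as a numeric variable and that it only displays as U.S.A. through a format.

NOTE: Invalid numeric data, 'U.S.A.' , at line 123 column 22.
WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.
year=2020 birthyear=2005 birthSex=1 raceomb002=Multiracial D_biep_White_Good_all=0.898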

 

tainaj
Obsidian | Level 7

Thank you for pointing out the differences! I just updated my original post with the code I actually ran. I am running SAS through VMware Horizon Client and was having trouble copying it from there. Also, I am unsure why it is recognizing the U.S.A. observation as numeric data especially since I don't see that error when I only run the Proc Import statement (see log below):

 

136  proc import out=work.IAT2020 (keep=year birthyear birthSex raceomb002 D_biep_White_Good_all
136! countrycit_num edu_14 occuSelfDetail politicalid_7)
137  datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"
138  dbms=sav replace;
139  run;

NOTE:    Variable Name Change.  D_biep.White_Good_all -> D_biep_White_Good_all
NOTE:    Variable Name Change.  D_biep.White_Good_36 -> D_biep_White_Good_36
NOTE:    Variable Name Change.  D_biep.White_Good_47 -> D_biep_White_Good_47
NOTE: One or more variables were converted because the data type is not supported by the V9 engine.
      For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 1757576 observations and 505 variables.
NOTE: WORK.IAT2020 data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           20.99 seconds
      cpu time            20.92 seconds

 

Is it only because of how I wrote the IF statement or do you think there is something wrong with the Proc Import statement as well? Thank you in advance!

ballardw
Super User

One serious caution:

If you use this code:

data IAT2020;  
set work.IAT2020;  
/*IF and ELSE for countrycit_num*/   
if countrycit_num eq "U.S.A.";  
run; 

Unless you have a USER library (by that name), that step completely replaced your work.IAT2020 data set if it ran successfully, and removed all records where countrycit_num is not "U.S.A.". So you will likely need to re-import your data if you want to work with anything else.

It is seldom a good idea to have the output data set and the input data set the same, especially when recoding variables or subsetting the data because you destroy the source data.
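
As a rough illustration of that advice, the subset can be written to a differently named data set so the imported one is only read. This is just a sketch: IAT2020_usa is a made-up output name, and the filter assumes countrycit_num is numeric with 1 = USA, which is what the codebook turns out to say later in this thread.

```sas
/* Sketch only: IAT2020_usa is a hypothetical output name.            */
/* Assumes countrycit_num is numeric with 1 = USA (see the end of     */
/* the thread); adjust the test if your codebook says otherwise.      */
data work.IAT2020_usa;
  set work.IAT2020;            /* the imported data set is only read */
  where countrycit_num = 1;    /* keep the USA records               */
run;
```

If the filter turns out to be wrong, work.IAT2020 is still intact and the step can simply be rerun without repeating the import.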

 

Note that this code:

if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practioners';else Occupation=0;   

attempts to set the value of Occupation as character when true and numeric when false. See the highlighted text. This doesn't really match your description of "I only need 29-1000 and 31-1000 from the occuSelfDetail variable." Please describe what you thought you were attempting with this.

 

 

tainaj
Obsidian | Level 7
Thank you for your response! I initially created a library to import the data, but I stopped doing this since I am using my university's virtual lab, which deletes everything you do once you log out or the session times out (this usually takes hours of inactivity). Basically, I have to re-download the dataset each time anyway, and it was a hassle to keep re-creating a library folder as well, if that makes sense. Also, I only want to use the dataset with the subsetted and recoded data, which is why I made this decision.

Thank you for pointing out the IF/ELSE statement for occuSelfDetail! My dataset is massive, and there are over 100,000 observations under each variable. I only want to look at the observations for participants who are from the USA and have occupations that are classified as 29-1000 (this represents Diagnosing and Treating Practitioners) or 31-1000 (this represents Nursing and Home Health Assistant). For what you highlighted, I was not sure how to isolate only those two values (29-1000 and 31-1000) from that variable. It didn't seem right to me either, but I cannot figure out how it is supposed to be written.
Please let me know if you require more details or clarification!
Reeza
Super User

Use IN

 

where occuSelfDetail in ( '29-1000', '31-1000') &  countrycit_num eq "U.S.A.";
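
In case the IN operator is new: it is shorthand for a chain of equality tests joined with OR. A minimal sketch of the occupation part written out the long way, assuming occuSelfDetail is a character variable; IAT2020_raw and IAT2020_occ are placeholder data set names.

```sas
/* Sketch only: IAT2020_raw and IAT2020_occ are placeholder names.   */
data work.IAT2020_occ;
  set work.IAT2020_raw;
  /* Same rows as: where occuSelfDetail in ('29-1000', '31-1000');  */
  where occuSelfDetail = '29-1000' or occuSelfDetail = '31-1000';
run;
```

The IN form is easier to extend when more occupation codes need to be added to the list.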

 

Reeza
Super User
proc import out=work.IAT2020 (keep=year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14 occuSelfDetail politicalid_7)
datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"
dbms=sav replace;
run;

data IAT2020;
set work.IAT2020;
/*IF and ELSE for countrycit_num*/
*IAT2020 data set now only has USA data in it;
if countrycit_num eq "U.S.A.";
run;

data IAT2020;
*NO SET STATEMENT - THIS DATA SET NOW REPLACES YOUR IAT2020 data step from above and the rest of your program will not work;
*These IF statements don't make sense. Occupation is treated as character in first portion and then numeric (Occupation =0).;
*Also are you sure you want it set to 0 for everything or should it be IF/ELSE IF instead of IF. Your second IF essentially will overwrite your first condition entirely;
if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practioners';else Occupation=0;
if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else Occupation=0;
run;

/*Create Labels for variables*/
*MUST be run within a proc or data step - this is in neither - this has no effect on anything;
*Maybe prior RUN is too early?;
label birthSex='Gender'
raceomb_002='Race'
D_biep_White_Good_all='Overall IAT D Score'
edu_14='Level of Education'
politicalid_7='Political Ideology Spectrum';
run;

Comments on the code you did post. 

 

tainaj
Obsidian | Level 7
Thank you so much for your detailed explanations! I honestly wasn't aware that I needed to re-write the set statement, but that makes sense! I will fix that. For the IF statements, I honestly do not want to set it to 0, and it is not necessary for me to create a different variable (i.e. Occupation, like I wrote in the statement). All I want is to keep only the 29-1000 and 31-1000 observations under the occuSelfDetail variable, but I am at a complete loss on how to do this. Do you know how this could be done? Please let me know if my explanation is not clear!

Also, thank you for pointing out what I need to add in front of the label statement!
Reeza
Super User
proc import out=work.IAT2020_raw 
datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"   
dbms=sav replace;   
run;  

data IAT2020;  
set work.IAT2020_raw;  

where occuSelfDetail in ( '29-1000', '31-1000') &  countrycit_num eq "U.S.A.";

label birthSex='Gender'   
raceomb_002='Race'   
D_biep_White_Good_all='Overall IAT D Score'   
edu_14='Level of Education'   
politicalid_7='Political Ideology Spectrum';   

keep year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14 occuSelfDetail politicalid_7;
run;
tainaj
Obsidian | Level 7

I tried this code, but I received this log statement. Do you know what this error in log means?

 

5
6    data IAT2020;
7    set work.IAT2020_raw;
8
9    where occuSelfDetail in ( '29-1000', '31-1000') &  countrycit_num eq "U.S.A.";
ERROR: WHERE clause operator requires compatible variables.
10
11   label birthSex='Gender'
12   raceomb_002='Race'
13   D_biep_White_Good_all='Overall IAT D Score'
14   edu_14='Level of Education'
15   politicalid_7='Political Ideology Spectrum';
16   keep year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14
16 ! occuSelfDetail politicalid_7;
17   run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.IAT2020 may be incomplete.  When this step was stopped there were 0
         observations and 9 variables.
NOTE: DATA statement used (Total process time):
      real time           0.09 seconds
      cpu time            0.04 seconds

Reeza
Super User
What's the format and type of the OCCUSELFDETAIL variable?
You can view that with a proc contents on the data set.
tainaj
Obsidian | Level 7
The type is Char and the format is $7.
Reeza
Super User
Is countrycit_num (note the "num" in the name) character or numeric, then?

Run a proc freq on both columns and make sure your string matches what's in the variables.
Check whether countrycit_num has a format applied (proc contents again).

The error is telling you that one of the variables is not character.
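
Those two checks could look roughly like this. It is a sketch that assumes the imported data set is work.IAT2020_raw, as in the code above. The FORMAT statement with no format name is there so that PROC FREQ shows the stored values rather than any labels attached during the import.

```sas
/* Type, length, and any attached format for the two filter variables */
proc contents data=work.IAT2020_raw (keep=countrycit_num occuSelfDetail);
run;

/* Stored values of both variables, including missing ones.          */
/* "format countrycit_num;" removes any format for this step only,   */
/* so the underlying codes are listed instead of their labels.       */
proc freq data=work.IAT2020_raw;
  tables countrycit_num occuSelfDetail / missing;
  format countrycit_num;
run;
```

If countrycit_num is listed as Num in the PROC CONTENTS output, the WHERE condition has to compare it with a number rather than a quoted string.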
tainaj
Obsidian | Level 7
Oh wow I messed up on that! I somehow misread the codebook for countrycit_num because it is numeric (i.e. 1= USA). The code you initially wrote worked with the correction. I cannot thank you enough!!
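
For anyone landing here later, the corrected step presumably looks something like the following. It is a sketch, not the exact code that was run: the test countrycit_num = 1 comes from the codebook remark above (1 = USA), and the label uses raceomb002 on the assumption that raceomb_002 in the posted code was a typo, since the KEEP list and the log both show raceomb002.

```sas
data IAT2020;
  set work.IAT2020_raw;

  /* occuSelfDetail is character ($7.); countrycit_num is numeric,  */
  /* with 1 = USA according to the codebook                          */
  where occuSelfDetail in ('29-1000', '31-1000') and countrycit_num = 1;

  label birthSex              = 'Gender'
        raceomb002            = 'Race'   /* assumed spelling, see above */
        D_biep_White_Good_all = 'Overall IAT D Score'
        edu_14                = 'Level of Education'
        politicalid_7         = 'Political Ideology Spectrum';

  keep year birthyear birthSex raceomb002 D_biep_White_Good_all
       countrycit_num edu_14 occuSelfDetail politicalid_7;
run;
```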
