Hey everyone!
I need to isolate observations from my dataset since I only need to look at specific data, but I am getting some errors. A little information: for the countrycit_num variable, I only need U.S.A observations, and I only need 29-1000 and 31-1000 from the occuSelfDetail variable. Please let me know if I need to provide more details.
My SAS Statement
proc import out=work.IAT2020 (keep=year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14 occuSelfDetail politicalid_7)
datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"
dbms=sav replace;
run;
data IAT2020;
set work.IAT2020;
/*IF and ELSE for countrycit_num*/
if countrycit_num eq "U.S.A.";
run;
data IAT2020;
if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practioners';else Occupation=0;
if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else Occupation=0;
run;
/*Create Labels for variables*/
label birthSex='Gender'
raceomb_002='Race'
D_biep_White_Good_all='Overall IAT D Score'
edu_14='Level of Education'
politicalid_7='Political Ideology Spectrum';
run;
This is my log:
NOTE: Invalid numeric data, 'U.S.A.' , at line 123 column 22.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
year=2020 birthyear=2005 birthSex=1 raceomb002=Multiracial D_biep_White_Good_all=0.898
countrycit_num=U.S.A. edu_14=some high school occuSelfDetail= politicalid_7=neutral _ERROR_=1 _N_=20
NOTE: There were 1757576 observations read from the data set WORK.IAT2020.
NOTE: The data set WORK.IAT2020 has 561674 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 1.30 seconds
cpu time 0.59 seconds
125 data IAT2020;
126 if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practioners';else
126! Occupation=0;
127 if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else
127! Occupation=0;
128 run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
126:102 127:100
NOTE: Variable occuSelfDetail is uninitialized.
NOTE: The data set WORK.IAT2020 has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
129 /*Create Labels for variables*/
130 label birthSex='Gender'
-----
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
131 raceomb_002='Race'
132 D_biep_White_Good_all='Overall IAT D Score'
133 edu_14='Level of Education'
134 politicalid_7='Political Ideology Spectrum';
135 run;
Your log and code don't appear to match.
In your code, the IF statement is followed by a comment and then the LABEL statement:
if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else Occupation=0;
/*Create Labels for variables*/
label birthSex='Gender'
In the log, you appear to have a run statement between the if and label statements (log line 128):
127 if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else
127! Occupation=0;
128 run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
126:102 127:100
NOTE: Variable occuSelfDetail is uninitialized.
NOTE: The data set WORK.IAT2020 has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
129 /*Create Labels for variables*/
130 label birthSex='Gender'
-----
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
The earlier NOTE and WARNING indicate an issue with the data file "C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav". Specifically, PROC IMPORT appears to think that a numeric variable contains "U.S.A."; look at row 123 in the data file, it's probably around there.
NOTE: Invalid numeric data, 'U.S.A.' , at line 123 column 22.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
year=2020 birthyear=2005 birthSex=1 raceomb002=Multiracial D_biep_White_Good_all=0.898
Thank you for pointing out the differences! I just updated my original post with the code I actually ran. I am running SAS through VMware Horizon Client and was having trouble copying it from there. Also, I am unsure why it is recognizing the U.S.A. observation as numeric data especially since I don't see that error when I only run the Proc Import statement (see log below):
136 proc import out=work.IAT2020 (keep=year birthyear birthSex raceomb002 D_biep_White_Good_all
136! countrycit_num edu_14 occuSelfDetail politicalid_7)
137 datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"
138 dbms=sav replace;
139 run;
NOTE: Variable Name Change. D_biep.White_Good_all -> D_biep_White_Good_all
NOTE: Variable Name Change. D_biep.White_Good_36 -> D_biep_White_Good_36
NOTE: Variable Name Change. D_biep.White_Good_47 -> D_biep_White_Good_47
NOTE: One or more variables were converted because the data type is not supported by the V9 engine.
For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 1757576 observations and 505 variables.
NOTE: WORK.IAT2020 data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 20.99 seconds
cpu time 20.92 seconds
Is it only because of how I wrote the IF statement or do you think there is something wrong with the Proc Import statement as well? Thank you in advance!
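One way to check how the two variables were actually imported is to look at their type and format. This is only a sketch, and it assumes the freshly imported data set is still WORK.IAT2020 as in the log above. If countrycit_num shows up as Num with a format attached, then "U.S.A." is just the formatted display of a numeric code (SPSS value labels typically import that way), which would explain why comparing it to a quoted string triggers the "Invalid numeric data" NOTE. As far as I can tell, "line 123 column 22" in that NOTE points to the program line in the log, not to a row of the data file.

```sas
/* Show the type (Num/Char), length, and format of the two variables. */
/* Assumes WORK.IAT2020 is the data set created by PROC IMPORT.       */
proc contents data=work.IAT2020 (keep=countrycit_num occuSelfDetail);
run;
```

The "Type" and "Format" columns of the output tell you whether a character comparison such as eq "U.S.A." is valid for that variable.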
One serious caution:
If you use this code:
data IAT2020;
set work.IAT2020;
/*IF and ELSE for countrycit_num*/
if countrycit_num eq "U.S.A.";
run;
Unless you have a USER library (by that name), that step, if it ran successfully, completely replaced your work.IAT2020 data set and removed all records where countrycit_num is not "U.S.A.". So you will likely need to re-import your data if you want to work with anything else.
It is seldom a good idea to have the output data set and the input data set the same, especially when recoding variables or subsetting the data because you destroy the source data.
Note that this code:
if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practioners';else Occupation=0;
attempts to set Occupation to a character value when the condition is true and to a numeric value (the Occupation=0 in the ELSE) when it is false. That really doesn't match your description of "I only need 29-1000 and 31-1000 from the occuSelfDetail variable." Please describe what you were attempting with this.
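If the goal was a descriptive Occupation variable, a minimal sketch might look like the following. It assumes occuSelfDetail is a character variable holding the codes shown, and the output name WORK.IAT2020_occ and the length of 40 are made up for illustration. Occupation is declared as character up front, every branch assigns a character value, and ELSE IF keeps the second test from overwriting the first.

```sas
data work.IAT2020_occ;          /* hypothetical output name, so the input is not overwritten */
  set work.IAT2020;
  length Occupation $40;        /* declare as character before any assignment */
  if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practitioners';
  else if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';
  else Occupation=' ';          /* character missing value instead of numeric 0 */
run;
```

Note that this only recodes; it does not drop any observations.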
Use IN
where occuSelfDetail in ( '29-1000', '31-1000') & countrycit_num eq "U.S.A.";
Comments on the code you did post.
proc import out=work.IAT2020 (keep=year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14 occuSelfDetail politicalid_7)
datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"
dbms=sav replace;
run;
data IAT2020;
set work.IAT2020;
/*IF and ELSE for countrycit_num*/
*IAT2020 data set now only has USA data in it;
if countrycit_num eq "U.S.A.";
run;
data IAT2020;
*NO SET STATEMENT - THIS DATA SET NOW REPLACES YOUR IAT2020 data set from above and the rest of your program will not work;
*These IF statements don't make sense. Occupation is treated as character in the first portion and then numeric (Occupation=0).;
*Also, are you sure you want it set to 0 for everything, or should it be IF/ELSE IF instead of IF? Your second IF essentially will overwrite your first condition entirely;
if occuSelfDetail eq '29-1000' then Occupation='Diagnosing and Treating Practioners';else Occupation=0;
if occuSelfDetail eq '31-1000' then Occupation='Nursing and Home Health Assistant';else Occupation=0;
run;
/*Create Labels for variables*/
*MUST be run within a proc or data step - this is in neither - this has no effect on anything;
*Maybe prior RUN is too early?;
label birthSex='Gender'
raceomb_002='Race'
D_biep_White_Good_all='Overall IAT D Score'
edu_14='Level of Education'
politicalid_7='Political Ideology Spectrum';
run;
proc import out=work.IAT2020_raw
datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2020.sav"
dbms=sav replace;
run;
data IAT2020;
set work.IAT2020_raw;
where occuSelfDetail in ( '29-1000', '31-1000') & countrycit_num eq "U.S.A.";
label birthSex='Gender'
raceomb_002='Race'
D_biep_White_Good_all='Overall IAT D Score'
edu_14='Level of Education'
politicalid_7='Political Ideology Spectrum';
keep year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14 occuSelfDetail politicalid_7;
run;
I tried this code, but I received the log below. Do you know what this error in the log means?
5
6 data IAT2020;
7 set work.IAT2020_raw;
8
9 where occuSelfDetail in ( '29-1000', '31-1000') & countrycit_num eq "U.S.A.";
ERROR: WHERE clause operator requires compatible variables.
10
11 label birthSex='Gender'
12 raceomb_002='Race'
13 D_biep_White_Good_all='Overall IAT D Score'
14 edu_14='Level of Education'
15 politicalid_7='Political Ideology Spectrum';
16 keep year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14
16 ! occuSelfDetail politicalid_7;
17 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.IAT2020 may be incomplete. When this step was stopped there were 0
observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.04 seconds
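That ERROR usually means the WHERE clause compares a numeric variable with a character constant (or the reverse). Given the earlier "Invalid numeric data, 'U.S.A.'" NOTE, countrycit_num is probably a numeric variable with a format attached, and occuSelfDetail may be too. A hedged sketch of one workaround is to compare formatted values with VVALUE in a subsetting IF, since VVALUE is a DATA step function and cannot be used in WHERE. The output name WORK.IAT2020_subset is made up, and the quoted strings are assumed to match the formatted values exactly, so check them with PROC FREQ first.

```sas
data work.IAT2020_subset;                 /* hypothetical output name */
  set work.IAT2020_raw;
  /* VVALUE returns the formatted value as text for numeric or character variables */
  if strip(vvalue(countrycit_num)) eq 'U.S.A.'
     and strip(vvalue(occuSelfDetail)) in ('29-1000', '31-1000');
  keep year birthyear birthSex raceomb002 D_biep_White_Good_all countrycit_num edu_14 occuSelfDetail politicalid_7;
run;
```

If you know the underlying numeric code for U.S.A., comparing against that number directly in a WHERE clause would be faster on a data set this size.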