I tried to subset my data using WHERE and IF statements, so I can run a Proc Means statement from it, but the log is saying that there are no observations in the data set.
I am sure that there are other issues with my SAS code, but I believe that the where statement might be the biggest issue. The data I am using is publicly available. It is large so I can't attach it to this message, but it could be downloaded here: https://osf.io/7h9ux/ (Name is Race IAT 2016). I attached the codebook to this message just in case. Please let me know if you need more information!
SAS code:
proc import out=work.IAT2016_raw
datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2016.sav"
dbms=sav replace;
run;
data IAT2016;
set work.IAT2016_raw (keep=year birthyear sex_5 raceomb_002 D_biep_White_Good_all countrycit occupation_self politicalid_7);
Age=year-birthyear;
where countrycit eq 'US' & occupation_self in ('29-1000','31-1000') & sex_5 in (1,2);
if D_biep_White_Good_all ne .;
if politicalid_7 eq "." then delete;
if Age lt 16 then delete;
if raceomb_002 eq "." then delete;
label sex_5='Gender'
raceomb_002='Race'
D_biep_White_Good_all='Overall IAT D Score'
politicalid_7='Political Ideology Spectrum'
occupation_self='Occupation'
countrycit='Country';
run;
proc contents data=IAT2016;
run;
proc means data=IAT2016;
class occupation_self;
var D_biep_White_Good_all;
output out = Summary
mean =
n = / autoname;
run;
LOG:
43 proc import out=work.IAT2016_raw
44 datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2016.sav"
45 dbms=sav replace;
46 run;
NOTE: Variable Name Change. D_biep.White_Good_all -> D_biep_White_Good_all
NOTE: Variable Name Change. D_biep.White_Good_36 -> D_biep_White_Good_36
NOTE: Variable Name Change. D_biep.White_Good_47 -> D_biep_White_Good_47
NOTE: One or more variables were converted because the data type is not supported by the V9 engine.
For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 1051105 observations and 534 variables.
NOTE: WORK.IAT2016_RAW data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 1:05.95
cpu time 23.67 seconds
47 data IAT2016;
48 set work.IAT2016_raw (keep=year birthyear sex_5 raceomb_002 D_biep_White_Good_all countrycit
48 ! occupation_self politicalid_7);
49 Age=year-birthyear;
50 where countrycit eq 'US' & occupation_self in ('29-1000','31-1000') & sex_5 in (1,2);
51 if D_biep_White_Good_all ne .;
52 if politicalid_7 eq "." then delete;
53 if Age lt 16 then delete;
54 if raceomb_002 eq "." then delete;
55 label sex_5='Gender'
56 raceomb_002='Race'
57 D_biep_White_Good_all='Overall IAT D Score'
58 politicalid_7='Political Ideology Spectrum'
59 occupation_self='Occupation'
60 countrycit='Country';
61 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
52:21 54:19
NOTE: There were 0 observations read from the data set WORK.IAT2016_RAW.
WHERE (countrycit='US') and occupation_self in ('29-1000', '31-1000') and sex_5 in (1, 2);
NOTE: The data set WORK.IAT2016 has 0 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 35.62 seconds
cpu time 8.03 seconds
62 proc contents data=IAT2016;
63 run;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds
64 proc means data=IAT2016;
65 class occupation_self;
66 var D_biep_White_Good_all;
67 output out = Summary
68 mean =
69 n = / autoname;
70 run;
NOTE: No observations in data set WORK.IAT2016.
NOTE: The data set WORK.SUMMARY has 0 observations and 5 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.05 seconds
cpu time 0.04 seconds
So is 8,364 the sample size you expected?
The issue was the occupation code. You were comparing the formatted value to a list of the coded values because you used the PUT() function in the WHERE statement.
Some of us refuse to download Excel files because they are a security risk. We also encourage not to visit unknown web sites.
So I can't see your data.
I don't see any syntax errors in your WHERE statement
where countrycit eq 'US' & occupation_self in ('29-1000','31-1000') & sex_5 in (1,2);
this looks fine, so the problem is that you are not finding any matching records, and so the next step is for YOU to LOOK AT the data in data set work.IAT2016_raw with your own eyes, and see what the reason is for the mismatch.
I apologize as I did not think of this, and the file downloads into a zip file but I still wasn't able to upload that because of the size. I'll try to see if compressing it myself will work somehow. I have looked at the original data multiple times. I also ran proc contents to look at the type as well. I even tried using other similar variables as well too but I still came up with the same result. The data that I tried to subset shows no observations so I can't really match it up to the original. Hopefully, I will be able to post the data somehow as I am at a complete loss.
Show us a portion (say the first 20 rows, and only the variables in the WHERE statement) of the SAS data set (not the Excel file) by following these instructions.
UPDATE: Include the variables mentioned by @ballardw
Sorry for the delay! I had trouble with the steps. This is the output of the first 20 without the WHERE and IF statements. I created the Age variable which is why the year and birthyear are added here as well. I posted the contents too just in case!
Update: Also, I originally used raceomb_002 instead of raceomb in my original post by accident (both variables are in the dataset but there are no observations under raceomb_002). This did not change my output though when I changed it to the correct variable for Race.
So, in these 20 records, are the conditions of the where statement met so that the record will be in the output data set? I pose this as a question for you to figure out and answer, given what you have showed us. Explain your answer.
No not in the first 20, but I ran the frequency table that was written by @ballardw and I found the combinations in which the conditions of the WHERE (along with the IF) statement would be met. What I need are individuals that identify as Male and Female that are US Citizens (shows as 1 in the output), and are either Healthcare - Nursing and Home Health Assistants or Healthcare - Diagnosing and Treating Practitioners (MD, Dentist, etc.). I find this combination in the frequency table.
I'm going to try to run it with only the WHERE statements (so I will omit the IF statements) as I think that was the issue all along!
It still says no observations. I am completely out of ideas. Please let me know if anyone has an idea or might have a clue about what is going wrong!
Also, I can't compress the data further to upload here since the zip file is already at 157MB.
@tainaj wrote:
No not in the first 20, but I ran the frequency table that was written by @ballardw and I found the combinations in which the conditions of the WHERE (along with the IF) statement would be met. What I need are individuals that identify as Male and Female that are US Citizens (shows as 1 in the output), and are either Healthcare - Nursing and Home Health Assistants or Healthcare - Diagnosing and Treating Practitioners (MD, Dentist, etc.). I find this combination in the frequency table.
I'm going to try to run it with only the WHERE statements (so I will omit the IF statements) as I think that was the issue all along!
Show us this output from PROC FREQ (part of it, so we can see records that should pass the WHERE statement)
one of the reasons why we have very specific instructions that we would like you to follow to show us the data (and I guess we need more than 20 rows) is because of the formats applied to the variables in the WHERE statement. We still need you to provide the data (more than 20 rows, I guess) following the instructions given.
Here are the results from the frequency table! Maybe I didn't follow the steps correctly if it didn't show what you needed to see. I downloaded the data2datastep from that instruction page, and I ran it. Then, I ran my data with the variables you stated to show. I'll copy and paste what I did in a second!
Update: here is the log statement showing what I ran below
471 select memlabel
------
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
472 into :memlabel trimmed
473 from dictionary.tables
474 where libname="&lib"
WARNING: Apparent symbolic reference LIB not resolved.
475 and memname="&dsn"
WARNING: Apparent symbolic reference DSN not resolved.
476 ;
WARNING: Apparent symbolic reference FMT not resolved.
WARNING: Argument 2 to macro function %QSUBSTR is out of range.
WARNING: Argument 3 to macro function %QSUBSTR is out of range.
477 %if %qsubstr(%superq(fmt),1,1)=Y %then %do;
478 select strip(catx(' ',Name,format))
479 into :fmtlist separated by ' '
480 from dictionary.columns
481 where libname="&lib"
482 and memname="&dsn"
483 and format is not null
484 and format not like '$%'
485 ;
486 %end;
487 %else %let fmtlist=;
488 quit;
489
490 %put _local_;
491
492 data _null_;
493 file "&file" dsd;
WARNING: Apparent symbolic reference FILE not resolved.
WARNING: Apparent symbolic reference FILE not resolved.
494 if _n_ =1 then do;
495 %if %superq(memlabel)= %then %do;
ERROR: Nesting of %IF statements in open code is not supported. %IF ignored.
ERROR: Skipping to next %END statement.
496 put "data &outlib..&dsn;";
WARNING: Apparent symbolic reference OUTLIB not resolved.
WARNING: Apparent symbolic reference DSN not resolved.
497 %end;
498 %else %do;
ERROR: The %ELSE statement is not valid in open code.
WARNING: Apparent symbolic reference OUTLIB not resolved.
WARNING: Apparent symbolic reference DSN not resolved.
499 put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
WARNING: Apparent symbolic reference OUTLIB not resolved.
WARNING: Apparent symbolic reference DSN not resolved.
WARNING: Apparent symbolic reference MEMLABEL not resolved.
WARNING: Apparent symbolic reference OUTLIB not resolved.
WARNING: Apparent symbolic reference DSN not resolved.
ERROR: The %END statement is not valid in open code.
WARNING: Apparent symbolic reference INPUTLIST not resolved.
500 %end;
501 put @3 "infile datalines dsd truncover;";
502 put @3 "input %superq(inputlist);";
503 %if not (%superq(fmtlist)=) %then %do;
504 put @3 "format %superq(fmtlist);";
505 %end;
506 %if not (%superq(lbllist)=) %then %do;
WARNING: Apparent symbolic reference LBLLIST not resolved.
507 put @3 "label %superq(lbllist);";
508 %end;
509 put "datalines4;";
510 end;
511 set &lib..&dsn(obs=&obs) end=last;
-
22
200
WARNING: Apparent symbolic reference LIB not resolved.
WARNING: Apparent symbolic reference DSN not resolved.
511 set &lib..&dsn(obs=&obs) end=last;
-
22
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, ;, CUROBS, END,
INDSNAME, KEY, KEYRESET, KEYS, NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string,
a numeric constant, a datetime constant, a missing value, bitstring, INPUT, PUT.
ERROR: Undeclared array referenced: dsn.
WARNING: Apparent symbolic reference OBS not resolved.
511 set &lib..&dsn(obs=&obs) end=last;
---
22
ERROR 22-322: Syntax error, expecting one of the following: +, =.
ERROR: Variable dsn has not been declared as an array.
512 put &varlist @;
-
22
WARNING: Apparent symbolic reference VARLIST not resolved.
ERROR 22-322: Expecting a name.
513 if last then do;
514 put;
515 put ';;;;';
516 end;
517 else put;
518 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.14 seconds
cpu time 0.11 seconds
ERROR: No matching %MACRO statement for this %MEND statement.
519 %mend;
520 proc import out=work.IAT2016_raw
521 datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2016.sav"
522 dbms=sav replace;
523 run;
NOTE: Variable Name Change. D_biep.White_Good_all -> D_biep_White_Good_all
NOTE: Variable Name Change. D_biep.White_Good_36 -> D_biep_White_Good_36
NOTE: Variable Name Change. D_biep.White_Good_47 -> D_biep_White_Good_47
NOTE: One or more variables were converted because the data type is not supported by the V9 engine.
For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 1051105 observations and 534 variables.
NOTE: WORK.IAT2016_RAW data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 1:14.25
cpu time 27.12 seconds
524 data IAT2016;
525 set work.IAT2016_raw (keep=year birthyear sex_5 raceomb D_biep_White_Good_all countrycit
525! occupation_self politicalid_7);
526 Age=year-birthyear;
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
454843 at 526:9
NOTE: There were 1051105 observations read from the data set WORK.IAT2016_RAW.
NOTE: The data set WORK.IAT2016 has 1051105 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 39.02 seconds
cpu time 11.54 seconds
527 proc print data=IAT2016 (obs=20);
528 run;
NOTE: There were 20 observations read from the data set WORK.IAT2016.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.07 seconds
cpu time 0.03 seconds
529 proc import out=work.IAT2016_raw
530 datafile="C:\Users\tjoseph6\Documents\My SAS Files\IAT2016.sav"
531 dbms=sav replace;
532 run;
NOTE: Variable Name Change. D_biep.White_Good_all -> D_biep_White_Good_all
NOTE: Variable Name Change. D_biep.White_Good_36 -> D_biep_White_Good_36
NOTE: Variable Name Change. D_biep.White_Good_47 -> D_biep_White_Good_47
NOTE: One or more variables were converted because the data type is not supported by the V9 engine.
For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 1051105 observations and 534 variables.
NOTE: WORK.IAT2016_RAW data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 1:11.29
cpu time 27.11 seconds
We have all been there. A second pair of eyes usually helps...
This from your log indicates some misunderstanding of your data or code:
52 if politicalid_7 eq "." then delete; 53 if Age lt 16 then delete; 54 if raceomb_002 eq "." then delete;
The note associated with lines 52 and 54 are that the varaibles are numeric. So the . should not be in quotes.
Or save some possible headaches with variable types and use: If missing(variable) then .... The Missing function uses the correct interpretation for both numeric and character variables.
You can check combinations of values that actually appear in your data with Proc Freq fairly easily.
Proc freq data= work.IAT2016_raw; tables countrycit*occupation_sel*sex_5 / list missing; run;
This may make a long table but will show all the combinations, including missing values for the variables on your where statement. If you do not see the value combination you request on WHERE then that is your answer to the where issue.
@tainaj wrote:
Thank you for your reply! I removed the quotes, and I produced the table as well. I see the combinations that I want in the table, so I guess the WHERE statement wasn't the issue like I thought.
You have a very possible issue with the actual value of your occupation variable vs the formatted value. The proc contents shows that you have a custom format assigned. So while proc print, freq or opening the table may show values like '29-1000' the actual value used with IN as you have written the code would use the unformatted value.
Try this to use the formatted value.
& put( occupation_self,$ocuppaa. ) in ('29-1000','31-1000')
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.