Hi
this is my third time trying to post this question but adding the log keeps flagging and deleting the question due to spam... @PaigeMiller you last commented so i hope you don't mind me tagging you into this new post.
the code:
PROC IMPORT
DATAFILE = '/home/imdb-videogames.csv'
OUT = Videogame_data
DBMS = CSV
REPLACE;
RUN;
DATA Videogame_data;
SET Videogame_data;
IF NOT MISSING(rating) AND NOT MISSING(votes) AND NOT MISSING(certificate);
RUN;
PROC SORT DATA=Videogame_data OUT=clean_videogamedata NODUPKEY;
BY name;
RUN;
DATA Videogame_data;
SET clean_videogamedata;
RUN;
proc sql;
create table multi_genre_games as
select name,
case when action = 'True' then 'Action' else '' end ||
case when adventure = 'True' then ', Adventure' else '' end ||
case when comedy = 'True' then ', Comedy' else '' end ||
case when crime = 'True' then ', Crime' else '' end ||
case when family = 'True' then ', Family' else '' end ||
case when fantasy = 'True' then ', Fantasy' else '' end ||
case when mystery = 'True' then ', Mystery' else '' end ||
case when sci_fi = 'True' then ', Sci-Fi' else '' end ||
case when thriller = 'True' then ', Thriller' else '' end as genres
from clean_videogamedata
group by name
having countw(genres, ',') > 1;
quit;
proc print data=multi_genre_games;
var name genres;
title 'Games with multiple genres';
run;
proc sql;
select count(*) as num_multi_genre_games
from multi_genre_games;
quit;
proc sql;
select count(*) as num_single_genre_games
from (
select name,
case when action = 'True' then 1 else 0 end +
case when adventure = 'True' then 1 else 0 end +
case when comedy = 'True' then 1 else 0 end +
case when crime = 'True' then 1 else 0 end +
case when family = 'True' then 1 else 0 end +
case when fantasy = 'True' then 1 else 0 end +
case when mystery = 'True' then 1 else 0 end +
case when sci_fi = 'True' then 1 else 0 end +
case when thriller = 'True' then 1 else 0 end as num_genres
from clean_videogamedata
)
where num_genres = 1;
quit;
proc freq data=multi_genre_games;
tables genres / nocum nopercent out=multi_genre_counts;
run;
proc sql;
create table game_counts as
select 'Multi-Genre' as type, count(*) as count from multi_genre_games
union all
select 'Single-Genre' as type, count(*) as count from clean_videogamedata where name not in (select name from multi_genre_games);
quit;
proc print data=game_counts;
title 'Game Genre Distribution';
run;
title 'Game Genre Distribution';
proc gchart data=game_counts;
pie type / sumvar=count;
run;
QUIT;
the log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 PROC IMPORT 70 DATAFILE = '/home/imdb-videogames.csv' 71 OUT = Videogame_data 72 DBMS = CSV 73 REPLACE; 74 RUN; NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to WORK.PARMS.PARMS.SLIST. Name is not a valid SAS name. Problems were detected with provided names. See LOG. 75 /********************************************************************** 76 * PRODUCT: SAS 77 * VERSION: 9.4 78 * CREATOR: External File Interface 79 * DATE: 01MAY23 80 * DESC: Generated SAS Datastep Code 81 * TEMPLATE SOURCE: (None Specified.) 82 ***********************************************************************/ 83 data WORK.VIDEOGAME_DATA ; 84 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 85 infile '/home/imdb-videogames.csv' delimiter = ',' MISSOVER DSD lrecl=32767 85 ! firstobs=2 ; 86 informat VAR1 best32. ; 87 informat name $29. ; 88 informat url $53. ; 89 informat year best32. ; 90 informat certificate $1. ; 91 informat rating best32. ; 92 informat votes $8. ; 93 informat plot $239. ; 94 informat Action $4. ; 95 informat Adventure $5. ; 96 informat Comedy $5. ; 97 informat Crime $5. ; 98 informat Family $5. ; 99 informat Fantasy $5. ; 100 informat Mystery $5. ; 101 informat "Sci-Fi"N $5. ; 102 informat Thriller $5. ; 103 format VAR1 best12. ; 104 format name $29. ; 105 format url $53. ; 106 format year best12. ; 107 format certificate $1. ; 108 format rating best12. ; 109 format votes $8. ; 110 format plot $239. ; 111 format Action $4. ; 112 format Adventure $5. ; 113 format Comedy $5. ; 114 format Crime $5. ; 115 format Family $5. ; 116 format Fantasy $5. ; 117 format Mystery $5. ; 118 format "Sci-Fi"N $5. ; 119 format Thriller $5. ; 120 input 121 VAR1 122 name $ 123 url $ 124 year 125 certificate $ 126 rating 127 votes $ 128 plot $ 129 Action $ 130 Adventure $ 131 Comedy $ 132 Crime $ 133 Family $ 134 Fantasy $ 135 Mystery $ 136 "Sci-Fi"N $ 137 Thriller $ 138 ; 139 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 140 run; NOTE: The infile '/home/imdb-videogames.csv' is: Filename=/home/imdb-videogames.csv, Owner Name=u63300316,Group Name=oda, Access Permission=-rw-r--r--, Last Modified=01 April 2023 13:21:12, File Size (bytes)=5156401 NOTE: Invalid data for VAR1 in line 6550 1-93. NOTE: Invalid data for year in line 6550 106-110. NOTE: Invalid data for rating in line 6550 118-122. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 6550 The Gameplay is similar to that of Ultimate Marvel vs. Capcom 3. It ... See full synopsis »",True,F 101 alse,False,False,False,False,False,True,False 145 VAR1=. name=True url=False year=. certificate=F rating=. votes=False plot=False Action=True Adventure=False Comedy= Crime= Family= Fantasy= Mystery= 'Sci-Fi'n= Thriller= _ERROR_=1 _N_=6549 NOTE: 20804 records were read from the infile '/home/imdb-videogames.csv'. The minimum record length was 130. The maximum record length was 471. NOTE: The data set WORK.VIDEOGAME_DATA has 20804 observations and 17 variables. NOTE: DATA statement used (Total process time): real time 0.04 seconds user cpu time 0.04 seconds system cpu time 0.01 seconds memory 10895.71k OS Memory 33312.00k Timestamp 01/05/2023 11:27:25 AM Step Count 108 Switch Count 2 Page Faults 0 Page Reclaims 288 Page Swaps 0 Voluntary Context Switches 11 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 16392 Errors detected in submitted DATA step. Examine log. 20804 rows created in WORK.VIDEOGAME_DATA from /home/imdb-videogames.csv. ERROR: Import unsuccessful. See SAS Log for details. NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.17 seconds user cpu time 0.15 seconds system cpu time 0.02 seconds memory 10895.71k OS Memory 33568.00k Timestamp 01/05/2023 11:27:25 AM Step Count 108 Switch Count 13 Page Faults 0 Page Reclaims 5043 Page Swaps 0 Voluntary Context Switches 101 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 16464 141 142 DATA Videogame_data; 143 SET Videogame_data; 144 IF NOT MISSING(rating) AND NOT MISSING(votes) AND NOT MISSING(certificate); 145 RUN; NOTE: There were 20804 observations read from the data set WORK.VIDEOGAME_DATA. NOTE: The data set WORK.VIDEOGAME_DATA has 6599 observations and 17 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds user cpu time 0.00 seconds system cpu time 0.01 seconds memory 3567.15k OS Memory 29100.00k Timestamp 01/05/2023 11:27:25 AM Step Count 109 Switch Count 3 Page Faults 0 Page Reclaims 462 Page Swaps 0 Voluntary Context Switches 16 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 5384 146 147 PROC SORT DATA=Videogame_data OUT=clean_videogamedata NODUPKEY; 148 BY name; 149 RUN; NOTE: There were 6599 observations read from the data set WORK.VIDEOGAME_DATA. NOTE: 785 observations with duplicate key values were deleted. NOTE: The data set WORK.CLEAN_VIDEOGAMEDATA has 5814 observations and 17 variables. NOTE: PROCEDURE SORT used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.01 seconds memory 5769.84k OS Memory 30916.00k Timestamp 01/05/2023 11:27:25 AM Step Count 110 Switch Count 2 Page Faults 0 Page Reclaims 1068 Page Swaps 0 Voluntary Context Switches 11 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 4624 150 151 DATA Videogame_data; 152 SET clean_videogamedata; 153 RUN; NOTE: There were 5814 observations read from the data set WORK.CLEAN_VIDEOGAMEDATA. NOTE: The data set WORK.VIDEOGAME_DATA has 5814 observations and 17 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 3566.65k OS Memory 29100.00k Timestamp 01/05/2023 11:27:25 AM Step Count 111 Switch Count 2 Page Faults 0 Page Reclaims 502 Page Swaps 0 Voluntary Context Switches 10 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 4616 154 155 proc sql; 156 create table multi_genre_games as 157 select name, 158 case when action = 'True' then 'Action' else '' end || 159 case when adventure = 'True' then ', Adventure' else '' end || 160 case when comedy = 'True' then ', Comedy' else '' end || 161 case when crime = 'True' then ', Crime' else '' end || 162 case when family = 'True' then ', Family' else '' end || 163 case when fantasy = 'True' then ', Fantasy' else '' end || 164 case when mystery = 'True' then ', Mystery' else '' end || 165 case when sci_fi = 'True' then ', Sci-Fi' else '' end || 166 case when thriller = 'True' then ', Thriller' else '' end as genres 167 from clean_videogamedata 168 group by name 169 having countw(genres, ',') > 1; WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function. ERROR: The following columns were not found in the contributing tables: sci_fi. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 170 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 7092.43k OS Memory 32428.00k Timestamp 01/05/2023 11:27:25 AM Step Count 112 Switch Count 0 Page Faults 0 Page Reclaims 242 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 171 proc print data=multi_genre_games; ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist. 172 var name genres; 173 title 'Games with multiple genres'; 174 run; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE PRINT used (Total process time): real time 0.00 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 279.50k OS Memory 26276.00k Timestamp 01/05/2023 11:27:25 AM Step Count 113 Switch Count 1 Page Faults 0 Page Reclaims 16 Page Swaps 0 Voluntary Context Switches 6 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 175 proc sql; 176 select count(*) as num_multi_genre_games 177 from multi_genre_games; ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 178 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 306.71k OS Memory 26276.00k Timestamp 01/05/2023 11:27:25 AM Step Count 114 Switch Count 0 Page Faults 0 Page Reclaims 14 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 179 proc sql; 180 select count(*) as num_single_genre_games 181 from ( 182 select name, 183 case when action = 'True' then 1 else 0 end + 184 case when adventure = 'True' then 1 else 0 end + 185 case when comedy = 'True' then 1 else 0 end + 186 case when crime = 'True' then 1 else 0 end + 187 case when family = 'True' then 1 else 0 end + 188 case when fantasy = 'True' then 1 else 0 end + 189 case when mystery = 'True' then 1 else 0 end + 190 case when sci_fi = 'True' then 1 else 0 end + 191 case when thriller = 'True' then 1 else 0 end as num_genres 192 from clean_videogamedata 193 ) 194 where num_genres = 1; ERROR: The following columns were not found in the contributing tables: sci_fi. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 195 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 7072.12k OS Memory 32428.00k Timestamp 01/05/2023 11:27:25 AM Step Count 115 Switch Count 0 Page Faults 0 Page Reclaims 246 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 8 196 proc freq data=multi_genre_games; ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist. 197 tables genres / nocum nopercent out=multi_genre_counts; ERROR: No data set open to look up variables. 198 run; NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.MULTI_GENRE_COUNTS may be incomplete. When this step was stopped there were 0 observations and 0 variables. WARNING: Data set WORK.MULTI_GENRE_COUNTS was not replaced because this step was stopped. NOTE: PROCEDURE FREQ used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 422.90k OS Memory 26276.00k Timestamp 01/05/2023 11:27:25 AM Step Count 116 Switch Count 0 Page Faults 0 Page Reclaims 17 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 8 199 proc sql; 200 create table game_counts as 201 select 'Multi-Genre' as type, count(*) as count from multi_genre_games 202 union all 203 select 'Single-Genre' as type, count(*) as count from clean_videogamedata where name not in (select name from 203 ! multi_genre_games); ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist. ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 204 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.01 seconds system cpu time 0.01 seconds memory 1915.37k OS Memory 27304.00k Timestamp 01/05/2023 11:27:25 AM Step Count 117 Switch Count 0 Page Faults 0 Page Reclaims 240 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 8 205 proc print data=game_counts; ERROR: File WORK.GAME_COUNTS.DATA does not exist. 206 title 'Game Genre Distribution'; 207 run; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE PRINT used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 280.15k OS Memory 26276.00k Timestamp 01/05/2023 11:27:25 AM Step Count 118 Switch Count 1 Page Faults 0 Page Reclaims 16 Page Swaps 0 Voluntary Context Switches 6 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 208 title 'Game Genre Distribution'; 209 proc gchart data=game_counts; ERROR: File WORK.GAME_COUNTS.DATA does not exist. 210 pie type / sumvar=count; ERROR: No data set open to look up variables. ERROR: No data set open to look up variables. NOTE: The previous statement has been deleted. 211 run; 212 QUIT; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE GCHART used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 298.84k OS Memory 26276.00k Timestamp 01/05/2023 11:27:25 AM Step Count 119 Switch Count 0 Page Faults 0 Page Reclaims 14 Page Swaps 0 Voluntary Context Switches 0 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 0 213 214 215 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 225
"File does not exist' is not the error you need to fix. You need to fix the first error before checking if the later errors also need fixing.
The first error is that the variable SCI_FI does not exists. That is what is causing the dataset to not get created.
Add the validvarname=V7 option. Someone must have set your default to use validvarname=ANY so every time you start a new SAS session if goes back to that.
Also add the GUESSINGROWS=MAX; statement to your PROC IMPORT.
options validvarname=v7;
PROC IMPORT
DATAFILE = '/home/imdb-videogames.csv'
OUT = Videogame_data
DBMS = CSV
REPLACE
;
guessingrows=max;
RUN;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.