BookmarkSubscribeRSS Feed
jacko1801
Obsidian | Level 7

Hi

this is my third time trying to post this question but adding the log keeps flagging and deleting the question due to spam... @PaigeMiller you last commented so i hope you don't mind me tagging you into this new post. 

 

the code:

PROC IMPORT 
  DATAFILE = '/home/imdb-videogames.csv'
  OUT = Videogame_data
  DBMS = CSV 
  REPLACE;
RUN;

DATA Videogame_data;
  SET Videogame_data;
  IF NOT MISSING(rating) AND NOT MISSING(votes) AND NOT MISSING(certificate);
RUN;

PROC SORT DATA=Videogame_data OUT=clean_videogamedata NODUPKEY;
  BY name;
RUN;

DATA Videogame_data;
  SET clean_videogamedata;
RUN;

proc sql;
  create table multi_genre_games as
  select name, 
         case when action = 'True' then 'Action' else '' end ||
         case when adventure = 'True' then ', Adventure' else '' end ||
         case when comedy = 'True' then ', Comedy' else '' end ||
         case when crime = 'True' then ', Crime' else '' end ||
         case when family = 'True' then ', Family' else '' end ||
         case when fantasy = 'True' then ', Fantasy' else '' end ||
         case when mystery = 'True' then ', Mystery' else '' end ||
         case when sci_fi = 'True' then ', Sci-Fi' else '' end ||
         case when thriller = 'True' then ', Thriller' else '' end as genres
  from clean_videogamedata
  group by name
  having countw(genres, ',') > 1;
quit;
proc print data=multi_genre_games;
  var name genres;
  title 'Games with multiple genres';
run;
proc sql;
  select count(*) as num_multi_genre_games
  from multi_genre_games;
quit;
proc sql;
  select count(*) as num_single_genre_games
  from (
    select name,
           case when action = 'True' then 1 else 0 end +
           case when adventure = 'True' then 1 else 0 end +
           case when comedy = 'True' then 1 else 0 end +
           case when crime = 'True' then 1 else 0 end +
           case when family = 'True' then 1 else 0 end +
           case when fantasy = 'True' then 1 else 0 end +
           case when mystery = 'True' then 1 else 0 end +
           case when sci_fi = 'True' then 1 else 0 end +
           case when thriller = 'True' then 1 else 0 end as num_genres
    from clean_videogamedata
  )
  where num_genres = 1;
quit;
proc freq data=multi_genre_games;
  tables genres / nocum nopercent out=multi_genre_counts;
run;
proc sql;
  create table game_counts as
  select 'Multi-Genre' as type, count(*) as count from multi_genre_games
  union all
  select 'Single-Genre' as type, count(*) as count from clean_videogamedata where name not in (select name from multi_genre_games);
quit;
proc print data=game_counts;
  title 'Game Genre Distribution';
run;
title 'Game Genre Distribution';
proc gchart data=game_counts;
  pie type / sumvar=count;
run;
QUIT;

 

the log:

1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         PROC IMPORT
 70           DATAFILE = '/home/imdb-videogames.csv'
 71           OUT = Videogame_data
 72           DBMS = CSV
 73           REPLACE;
 74         RUN;
 
 NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to 
 WORK.PARMS.PARMS.SLIST.
 Name  is not a valid SAS name.
 Problems were detected with provided names.  See LOG. 
 75          /**********************************************************************
 76          *   PRODUCT:   SAS
 77          *   VERSION:   9.4
 78          *   CREATOR:   External File Interface
 79          *   DATE:      01MAY23
 80          *   DESC:      Generated SAS Datastep Code
 81          *   TEMPLATE SOURCE:  (None Specified.)
 82          ***********************************************************************/
 83             data WORK.VIDEOGAME_DATA    ;
 84             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
 85             infile '/home/imdb-videogames.csv' delimiter = ',' MISSOVER DSD lrecl=32767
 85       ! firstobs=2 ;
 86                informat VAR1 best32. ;
 87                informat name $29. ;
 88                informat url $53. ;
 89                informat year best32. ;
 90                informat certificate $1. ;
 91                informat rating best32. ;
 92                informat votes $8. ;
 93                informat plot $239. ;
 94                informat Action $4. ;
 95                informat Adventure $5. ;
 96                informat Comedy $5. ;
 97                informat Crime $5. ;
 98                informat Family $5. ;
 99                informat Fantasy $5. ;
 100               informat Mystery $5. ;
 101               informat "Sci-Fi"N $5. ;
 102               informat Thriller $5. ;
 103               format VAR1 best12. ;
 104               format name $29. ;
 105               format url $53. ;
 106               format year best12. ;
 107               format certificate $1. ;
 108               format rating best12. ;
 109               format votes $8. ;
 110               format plot $239. ;
 111               format Action $4. ;
 112               format Adventure $5. ;
 113               format Comedy $5. ;
 114               format Crime $5. ;
 115               format Family $5. ;
 116               format Fantasy $5. ;
 117               format Mystery $5. ;
 118               format "Sci-Fi"N $5. ;
 119               format Thriller $5. ;
 120            input
 121                        VAR1
 122                        name  $
 123                        url  $
 124                        year
 125                        certificate  $
 126                        rating
 127                        votes  $
 128                        plot  $
 129                        Action  $
 130                        Adventure  $
 131                        Comedy  $
 132                        Crime  $
 133                        Family  $
 134                        Fantasy  $
 135                        Mystery  $
 136                        "Sci-Fi"N  $
 137                        Thriller  $
 138            ;
 139            if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
 140            run;
 
 NOTE: The infile '/home/imdb-videogames.csv' is:
       Filename=/home/imdb-videogames.csv,
       Owner Name=u63300316,Group Name=oda,
       Access Permission=-rw-r--r--,
       Last Modified=01 April 2023 13:21:12,
       File Size (bytes)=5156401
 
 NOTE: Invalid data for VAR1 in line 6550 1-93.
 NOTE: Invalid data for year in line 6550 106-110.
 NOTE: Invalid data for rating in line 6550 118-122.
 RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                      
 6550      The Gameplay is similar to that of Ultimate Marvel vs. Capcom 3. It ... See full synopsis »",True,F
      101  alse,False,False,False,False,False,True,False 145
 VAR1=. name=True url=False year=. certificate=F rating=. votes=False plot=False Action=True Adventure=False Comedy=  Crime= 
 Family=  Fantasy=  Mystery=  'Sci-Fi'n=  Thriller=  _ERROR_=1 _N_=6549
 NOTE: 20804 records were read from the infile '/home/imdb-videogames.csv'.
       The minimum record length was 130.
       The maximum record length was 471.
 NOTE: The data set WORK.VIDEOGAME_DATA has 20804 observations and 17 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.04 seconds
       user cpu time       0.04 seconds
       system cpu time     0.01 seconds
       memory              10895.71k
       OS Memory           33312.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        108  Switch Count  2
       Page Faults                       0
       Page Reclaims                     288
       Page Swaps                        0
       Voluntary Context Switches        11
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           16392
       
 
 Errors detected in submitted DATA step. Examine log.
 20804 rows created in WORK.VIDEOGAME_DATA from /home/imdb-videogames.csv.
   
   
   
 ERROR: Import unsuccessful.  See SAS Log for details.
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE IMPORT used (Total process time):
       real time           0.17 seconds
       user cpu time       0.15 seconds
       system cpu time     0.02 seconds
       memory              10895.71k
       OS Memory           33568.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        108  Switch Count  13
       Page Faults                       0
       Page Reclaims                     5043
       Page Swaps                        0
       Voluntary Context Switches        101
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           16464
       
 141        
 
 
 142        DATA Videogame_data;
 143          SET Videogame_data;
 144          IF NOT MISSING(rating) AND NOT MISSING(votes) AND NOT MISSING(certificate);
 145        RUN;
 
 NOTE: There were 20804 observations read from the data set WORK.VIDEOGAME_DATA.
 NOTE: The data set WORK.VIDEOGAME_DATA has 6599 observations and 17 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.01 seconds
       user cpu time       0.00 seconds
       system cpu time     0.01 seconds
       memory              3567.15k
       OS Memory           29100.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        109  Switch Count  3
       Page Faults                       0
       Page Reclaims                     462
       Page Swaps                        0
       Voluntary Context Switches        16
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           5384
       
 
 146        
 147        PROC SORT DATA=Videogame_data OUT=clean_videogamedata NODUPKEY;
 148          BY name;
 149        RUN;
 
 NOTE: There were 6599 observations read from the data set WORK.VIDEOGAME_DATA.
 NOTE: 785 observations with duplicate key values were deleted.
 NOTE: The data set WORK.CLEAN_VIDEOGAMEDATA has 5814 observations and 17 variables.
 NOTE: PROCEDURE SORT used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.01 seconds
       memory              5769.84k
       OS Memory           30916.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        110  Switch Count  2
       Page Faults                       0
       Page Reclaims                     1068
       Page Swaps                        0
       Voluntary Context Switches        11
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           4624
       
 
 150        
 151        DATA Videogame_data;
 152          SET clean_videogamedata;
 153        RUN;
 
 NOTE: There were 5814 observations read from the data set WORK.CLEAN_VIDEOGAMEDATA.
 NOTE: The data set WORK.VIDEOGAME_DATA has 5814 observations and 17 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              3566.65k
       OS Memory           29100.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        111  Switch Count  2
       Page Faults                       0
       Page Reclaims                     502
       Page Swaps                        0
       Voluntary Context Switches        10
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           4616
       
 
 154        
 155        proc sql;
 156          create table multi_genre_games as
 157          select name,
 158                 case when action = 'True' then 'Action' else '' end ||
 159                 case when adventure = 'True' then ', Adventure' else '' end ||
 160                 case when comedy = 'True' then ', Comedy' else '' end ||
 161                 case when crime = 'True' then ', Crime' else '' end ||
 162                 case when family = 'True' then ', Family' else '' end ||
 163                 case when fantasy = 'True' then ', Fantasy' else '' end ||
 164                 case when mystery = 'True' then ', Mystery' else '' end ||
 165                 case when sci_fi = 'True' then ', Sci-Fi' else '' end ||
 166                 case when thriller = 'True' then ', Thriller' else '' end as genres
 167          from clean_videogamedata
 168          group by name
 169          having countw(genres, ',') > 1;
 WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING 
          clause of the associated table-expression referenced a summary function.
 ERROR: The following columns were not found in the contributing tables: sci_fi.
 NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 170        quit;
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              7092.43k
       OS Memory           32428.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        112  Switch Count  0
       Page Faults                       0
       Page Reclaims                     242
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 
 
 171        proc print data=multi_genre_games;
 ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
 172          var name genres;
 173          title 'Games with multiple genres';
 174        run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE PRINT used (Total process time):
       real time           0.00 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              279.50k
       OS Memory           26276.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        113  Switch Count  1
       Page Faults                       0
       Page Reclaims                     16
       Page Swaps                        0
       Voluntary Context Switches        6
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 
 
 175        proc sql;
 176          select count(*) as num_multi_genre_games
 177          from multi_genre_games;
 ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
 NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 178        quit;
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              306.71k
       OS Memory           26276.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        114  Switch Count  0
       Page Faults                       0
       Page Reclaims                     14
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 
 
 179        proc sql;
 180          select count(*) as num_single_genre_games
 181          from (
 182            select name,
 183                   case when action = 'True' then 1 else 0 end +
 184                   case when adventure = 'True' then 1 else 0 end +
 185                   case when comedy = 'True' then 1 else 0 end +
 186                   case when crime = 'True' then 1 else 0 end +
 187                   case when family = 'True' then 1 else 0 end +
 188                   case when fantasy = 'True' then 1 else 0 end +
 189                   case when mystery = 'True' then 1 else 0 end +
 190                   case when sci_fi = 'True' then 1 else 0 end +
 191                   case when thriller = 'True' then 1 else 0 end as num_genres
 192            from clean_videogamedata
 193          )
 194          where num_genres = 1;
 ERROR: The following columns were not found in the contributing tables: sci_fi.
 NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 195        quit;
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              7072.12k
       OS Memory           32428.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        115  Switch Count  0
       Page Faults                       0
       Page Reclaims                     246
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 
 
 196        proc freq data=multi_genre_games;
 ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
 197          tables genres / nocum nopercent out=multi_genre_counts;
 ERROR: No data set open to look up variables.
 198        run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.MULTI_GENRE_COUNTS may be incomplete.  When this step was stopped there were 0 observations and 0 
          variables.
 WARNING: Data set WORK.MULTI_GENRE_COUNTS was not replaced because this step was stopped.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              422.90k
       OS Memory           26276.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        116  Switch Count  0
       Page Faults                       0
       Page Reclaims                     17
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 
 
 199        proc sql;
 200          create table game_counts as
 201          select 'Multi-Genre' as type, count(*) as count from multi_genre_games
 202          union all
 203          select 'Single-Genre' as type, count(*) as count from clean_videogamedata where name not in (select name from
 203      ! multi_genre_games);
 ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
 ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
 NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
 204        quit;
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.00 seconds
       user cpu time       0.01 seconds
       system cpu time     0.01 seconds
       memory              1915.37k
       OS Memory           27304.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        117  Switch Count  0
       Page Faults                       0
       Page Reclaims                     240
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 
 
 205        proc print data=game_counts;
 ERROR: File WORK.GAME_COUNTS.DATA does not exist.
 206          title 'Game Genre Distribution';
 207        run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE PRINT used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              280.15k
       OS Memory           26276.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        118  Switch Count  1
       Page Faults                       0
       Page Reclaims                     16
       Page Swaps                        0
       Voluntary Context Switches        6
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 208        title 'Game Genre Distribution';
 
 
 209        proc gchart data=game_counts;
 ERROR: File WORK.GAME_COUNTS.DATA does not exist.
 210          pie type / sumvar=count;
 ERROR: No data set open to look up variables.
 ERROR: No data set open to look up variables.
 NOTE: The previous statement has been deleted.
 211        run;
 
 212        QUIT;
 
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE GCHART used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              298.84k
       OS Memory           26276.00k
       Timestamp           01/05/2023 11:27:25 AM
       Step Count                        119  Switch Count  0
       Page Faults                       0
       Page Reclaims                     14
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           0
       
 213        
 214        
 215        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 225        
 
1 REPLY 1
Tom
Super User Tom
Super User

"File does not exist' is not the error you need to fix.  You need to fix the first error before checking if the later errors also need fixing.

 

The first error is that the variable SCI_FI does not exists.  That is what is causing the dataset to not get created.

 

Add the validvarname=V7 option.  Someone must have set your default to use validvarname=ANY so every time you start a new SAS session if goes back to that.

Also add the GUESSINGROWS=MAX; statement to your PROC IMPORT.

options validvarname=v7;
PROC IMPORT 
  DATAFILE = '/home/imdb-videogames.csv'
  OUT = Videogame_data
  DBMS = CSV 
  REPLACE
;
  guessingrows=max;
RUN;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 300 views
  • 1 like
  • 2 in conversation