BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jacko1801
Obsidian | Level 7

Hi

this is my third time trying to post this question but adding the log keeps flagging and deleting the question due to spam... @PaigeMiller you last commented so i hope you don't mind me tagging you into this new post. 

 

the code:

PROC IMPORT 
  DATAFILE = '/home/imdb-videogames.csv'
  OUT = Videogame_data
  DBMS = CSV 
  REPLACE;
RUN;

DATA Videogame_data;
  SET Videogame_data;
  IF NOT MISSING(rating) AND NOT MISSING(votes) AND NOT MISSING(certificate);
RUN;

PROC SORT DATA=Videogame_data OUT=clean_videogamedata NODUPKEY;
  BY name;
RUN;

DATA Videogame_data;
  SET clean_videogamedata;
RUN;

proc sql;
  create table multi_genre_games as
  select name, 
         case when action = 'True' then 'Action' else '' end ||
         case when adventure = 'True' then ', Adventure' else '' end ||
         case when comedy = 'True' then ', Comedy' else '' end ||
         case when crime = 'True' then ', Crime' else '' end ||
         case when family = 'True' then ', Family' else '' end ||
         case when fantasy = 'True' then ', Fantasy' else '' end ||
         case when mystery = 'True' then ', Mystery' else '' end ||
         case when sci_fi = 'True' then ', Sci-Fi' else '' end ||
         case when thriller = 'True' then ', Thriller' else '' end as genres
  from clean_videogamedata
  group by name
  having countw(genres, ',') > 1;
quit;
proc print data=multi_genre_games;
  var name genres;
  title 'Games with multiple genres';
run;
proc sql;
  select count(*) as num_multi_genre_games
  from multi_genre_games;
quit;
proc sql;
  select count(*) as num_single_genre_games
  from (
    select name,
           case when action = 'True' then 1 else 0 end +
           case when adventure = 'True' then 1 else 0 end +
           case when comedy = 'True' then 1 else 0 end +
           case when crime = 'True' then 1 else 0 end +
           case when family = 'True' then 1 else 0 end +
           case when fantasy = 'True' then 1 else 0 end +
           case when mystery = 'True' then 1 else 0 end +
           case when sci_fi = 'True' then 1 else 0 end +
           case when thriller = 'True' then 1 else 0 end as num_genres
    from clean_videogamedata
  )
  where num_genres = 1;
quit;
proc freq data=multi_genre_games;
  tables genres / nocum nopercent out=multi_genre_counts;
run;
proc sql;
  create table game_counts as
  select 'Multi-Genre' as type, count(*) as count from multi_genre_games
  union all
  select 'Single-Genre' as type, count(*) as count from clean_videogamedata where name not in (select name from multi_genre_games);
quit;
proc print data=game_counts;
  title 'Game Genre Distribution';
run;
title 'Game Genre Distribution';
proc gchart data=game_counts;
  pie type / sumvar=count;
run;
QUIT;
LOG
 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 PROC IMPORT
70 DATAFILE = '/home/u63300316/Business Analytics/cw2/B.A.G6/imdb-videogames.csv'
71 OUT = Videogame_data
72 DBMS = CSV
73 REPLACE;
74 RUN;
 
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
Name is not a valid SAS name.
Problems were detected with provided names. See LOG.
75 /**********************************************************************
76 * PRODUCT: SAS
77 * VERSION: 9.4
78 * CREATOR: External File Interface
79 * DATE: 01MAY23
80 * DESC: Generated SAS Datastep Code
81 * TEMPLATE SOURCE: (None Specified.)
82 ***********************************************************************/
83 data WORK.VIDEOGAME_DATA ;
84 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
85 infile '/home/u63300316/Business Analytics/cw2/B.A.G6/imdb-videogames.csv' delimiter = ',' MISSOVER DSD lrecl=32767
85 ! firstobs=2 ;
86 informat VAR1 best32. ;
87 informat name $29. ;
88 informat url $53. ;
89 informat year best32. ;
90 informat certificate $1. ;
91 informat rating best32. ;
92 informat votes $8. ;
93 informat plot $239. ;
94 informat Action $4. ;
95 informat Adventure $5. ;
96 informat Comedy $5. ;
97 informat Crime $5. ;
98 informat Family $5. ;
99 informat Fantasy $5. ;
100 informat Mystery $5. ;
101 informat "Sci-Fi"N $5. ;
102 informat Thriller $5. ;
103 format VAR1 best12. ;
104 format name $29. ;
105 format url $53. ;
106 format year best12. ;
107 format certificate $1. ;
108 format rating best12. ;
109 format votes $8. ;
110 format plot $239. ;
111 format Action $4. ;
112 format Adventure $5. ;
113 format Comedy $5. ;
114 format Crime $5. ;
115 format Family $5. ;
116 format Fantasy $5. ;
117 format Mystery $5. ;
118 format "Sci-Fi"N $5. ;
119 format Thriller $5. ;
120 input
121 VAR1
122 name $
123 url $
124 year
125 certificate $
126 rating
127 votes $
128 plot $
129 Action $
130 Adventure $
131 Comedy $
132 Crime $
133 Family $
134 Fantasy $
135 Mystery $
136 "Sci-Fi"N $
137 Thriller $
138 ;
139 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
140 run;
 
NOTE: The infile '/home/u63300316/Business Analytics/cw2/B.A.G6/imdb-videogames.csv' is:
Filename=/home/u63300316/Business Analytics/cw2/B.A.G6/imdb-videogames.csv,
Owner Name=u63300316,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=01 April 2023 13:21:12,
File Size (bytes)=5156401
 
NOTE: Invalid data for VAR1 in line 6550 1-93.
NOTE: Invalid data for year in line 6550 106-110.
NOTE: Invalid data for rating in line 6550 118-122.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
6550 The Gameplay is similar to that of Ultimate Marvel vs. Capcom 3. It ... See full synopsis »",True,F
101 alse,False,False,False,False,False,True,False 145
VAR1=. name=True url=False year=. certificate=F rating=. votes=False plot=False Action=True Adventure=False Comedy= Crime=
Family= Fantasy= Mystery= 'Sci-Fi'n= Thriller= _ERROR_=1 _N_=6549
NOTE: 20804 records were read from the infile '/home/u63300316/Business Analytics/cw2/B.A.G6/imdb-videogames.csv'.
The minimum record length was 130.
The maximum record length was 471.
NOTE: The data set WORK.VIDEOGAME_DATA has 20804 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
user cpu time 0.04 seconds
system cpu time 0.01 seconds
memory 10895.71k
OS Memory 33312.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 108 Switch Count 2
Page Faults 0
Page Reclaims 288
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 16392
 
 
Errors detected in submitted DATA step. Examine log.
20804 rows created in WORK.VIDEOGAME_DATA from /home/u63300316/Business Analytics/cw2/B.A.G6/imdb-videogames.csv.
 
 
 
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.17 seconds
user cpu time 0.15 seconds
system cpu time 0.02 seconds
memory 10895.71k
OS Memory 33568.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 108 Switch Count 13
Page Faults 0
Page Reclaims 5043
Page Swaps 0
Voluntary Context Switches 101
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 16464
 
141
 
 
142 DATA Videogame_data;
143 SET Videogame_data;
144 IF NOT MISSING(rating) AND NOT MISSING(votes) AND NOT MISSING(certificate);
145 RUN;
 
NOTE: There were 20804 observations read from the data set WORK.VIDEOGAME_DATA.
NOTE: The data set WORK.VIDEOGAME_DATA has 6599 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 3567.15k
OS Memory 29100.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 109 Switch Count 3
Page Faults 0
Page Reclaims 462
Page Swaps 0
Voluntary Context Switches 16
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 5384
 
 
146
147 PROC SORT DATA=Videogame_data OUT=clean_videogamedata NODUPKEY;
148 BY name;
149 RUN;
 
NOTE: There were 6599 observations read from the data set WORK.VIDEOGAME_DATA.
NOTE: 785 observations with duplicate key values were deleted.
NOTE: The data set WORK.CLEAN_VIDEOGAMEDATA has 5814 observations and 17 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 5769.84k
OS Memory 30916.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 110 Switch Count 2
Page Faults 0
Page Reclaims 1068
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 4624
 
 
150
151 DATA Videogame_data;
152 SET clean_videogamedata;
153 RUN;
 
NOTE: There were 5814 observations read from the data set WORK.CLEAN_VIDEOGAMEDATA.
NOTE: The data set WORK.VIDEOGAME_DATA has 5814 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 3566.65k
OS Memory 29100.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 111 Switch Count 2
Page Faults 0
Page Reclaims 502
Page Swaps 0
Voluntary Context Switches 10
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 4616
 
 
154
155 proc sql;
156 create table multi_genre_games as
157 select name,
158 case when action = 'True' then 'Action' else '' end ||
159 case when adventure = 'True' then ', Adventure' else '' end ||
160 case when comedy = 'True' then ', Comedy' else '' end ||
161 case when crime = 'True' then ', Crime' else '' end ||
162 case when family = 'True' then ', Family' else '' end ||
163 case when fantasy = 'True' then ', Fantasy' else '' end ||
164 case when mystery = 'True' then ', Mystery' else '' end ||
165 case when sci_fi = 'True' then ', Sci-Fi' else '' end ||
166 case when thriller = 'True' then ', Thriller' else '' end as genres
167 from clean_videogamedata
168 group by name
169 having countw(genres, ',') > 1;
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING
clause of the associated table-expression referenced a summary function.
ERROR: The following columns were not found in the contributing tables: sci_fi.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
170 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 7092.43k
OS Memory 32428.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 112 Switch Count 0
Page Faults 0
Page Reclaims 242
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
 
 
 
171 proc print data=multi_genre_games;
ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
172 var name genres;
173 title 'Games with multiple genres';
174 run;
 
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 279.50k
OS Memory 26276.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 113 Switch Count 1
Page Faults 0
Page Reclaims 16
Page Swaps 0
Voluntary Context Switches 6
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
 
 
 
175 proc sql;
176 select count(*) as num_multi_genre_games
177 from multi_genre_games;
ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
178 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 306.71k
OS Memory 26276.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 114 Switch Count 0
Page Faults 0
Page Reclaims 14
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
 
 
 
179 proc sql;
180 select count(*) as num_single_genre_games
181 from (
182 select name,
183 case when action = 'True' then 1 else 0 end +
184 case when adventure = 'True' then 1 else 0 end +
185 case when comedy = 'True' then 1 else 0 end +
186 case when crime = 'True' then 1 else 0 end +
187 case when family = 'True' then 1 else 0 end +
188 case when fantasy = 'True' then 1 else 0 end +
189 case when mystery = 'True' then 1 else 0 end +
190 case when sci_fi = 'True' then 1 else 0 end +
191 case when thriller = 'True' then 1 else 0 end as num_genres
192 from clean_videogamedata
193 )
194 where num_genres = 1;
ERROR: The following columns were not found in the contributing tables: sci_fi.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
195 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 7072.12k
OS Memory 32428.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 115 Switch Count 0
Page Faults 0
Page Reclaims 246
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
 
 
 
196 proc freq data=multi_genre_games;
ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
197 tables genres / nocum nopercent out=multi_genre_counts;
ERROR: No data set open to look up variables.
198 run;
 
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.MULTI_GENRE_COUNTS may be incomplete. When this step was stopped there were 0 observations and 0
variables.
WARNING: Data set WORK.MULTI_GENRE_COUNTS was not replaced because this step was stopped.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 422.90k
OS Memory 26276.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 116 Switch Count 0
Page Faults 0
Page Reclaims 17
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
 
 
 
199 proc sql;
200 create table game_counts as
201 select 'Multi-Genre' as type, count(*) as count from multi_genre_games
202 union all
203 select 'Single-Genre' as type, count(*) as count from clean_videogamedata where name not in (select name from
203 ! multi_genre_games);
ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
ERROR: File WORK.MULTI_GENRE_GAMES.DATA does not exist.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
204 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 1915.37k
OS Memory 27304.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 117 Switch Count 0
Page Faults 0
Page Reclaims 240
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 8
 
 
 
205 proc print data=game_counts;
ERROR: File WORK.GAME_COUNTS.DATA does not exist.
206 title 'Game Genre Distribution';
207 run;
 
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 280.15k
OS Memory 26276.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 118 Switch Count 1
Page Faults 0
Page Reclaims 16
Page Swaps 0
Voluntary Context Switches 6
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
 
208 title 'Game Genre Distribution';
 
 
209 proc gchart data=game_counts;
ERROR: File WORK.GAME_COUNTS.DATA does not exist.
210 pie type / sumvar=count;
ERROR: No data set open to look up variables.
ERROR: No data set open to look up variables.
NOTE: The previous statement has been deleted.
211 run;
 
212 QUIT;
 
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE GCHART used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 298.84k
OS Memory 26276.00k
Timestamp 01/05/2023 11:27:25 AM
Step Count 119 Switch Count 0
Page Faults 0
Page Reclaims 14
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
 
213
214
215 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
225
1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

From now on, please use a code box (by clicking on the </> icon) for inserting the LOG into your message, like this:

 

167 from clean_videogamedata
168 group by name
169 having countw(genres, ',') > 1;
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING
clause of the associated table-expression referenced a summary function.
ERROR: The following columns were not found in the contributing tables: sci_fi.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
170 quit;

This PROC SQL cannot work because there is no variable named SCI_FI in the data set CLEAN_VIDEOGAMEDATA.

 

When PROC IMPORT created the first data set, there is a variable whose name is "Sci-Fi"N with a dash in it, surrounded by quotes and followed by the letter N. This is how you will have to refer to this variable, with the name in quotes, and a dash in the variable name and following the second quote you must have the letter N (or you can rename the variable to something easier to type). From the log for PROC IMPORT:

 

101 informat "Sci-Fi"N $5. ;

Admin Note:
Adding documentation links:

Rules for Variable Names (see Summary of Extended Rules for Naming SAS Variables)
Definition of SAS Name Literals 

--
Paige Miller

View solution in original post

2 REPLIES 2
PaigeMiller
Diamond | Level 26

From now on, please use a code box (by clicking on the </> icon) for inserting the LOG into your message, like this:

 

167 from clean_videogamedata
168 group by name
169 having countw(genres, ',') > 1;
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING
clause of the associated table-expression referenced a summary function.
ERROR: The following columns were not found in the contributing tables: sci_fi.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
170 quit;

This PROC SQL cannot work because there is no variable named SCI_FI in the data set CLEAN_VIDEOGAMEDATA.

 

When PROC IMPORT created the first data set, there is a variable whose name is "Sci-Fi"N with a dash in it, surrounded by quotes and followed by the letter N. This is how you will have to refer to this variable, with the name in quotes, and a dash in the variable name and following the second quote you must have the letter N (or you can rename the variable to something easier to type). From the log for PROC IMPORT:

 

101 informat "Sci-Fi"N $5. ;

Admin Note:
Adding documentation links:

Rules for Variable Names (see Summary of Extended Rules for Naming SAS Variables)
Definition of SAS Name Literals 

--
Paige Miller
jacko1801
Obsidian | Level 7

so by adding "OPTION Validvarname=v7;" to the top of the sas document fixed everything

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 530 views
  • 0 likes
  • 2 in conversation