Hi folks, I am running SAS 9.4 (TS1M7) 64 bit Windows.
It seems that SAS generates a phantom variable in the import file when I attempt to import a Google Spreadsheet. This variable is named "Var5" and seems to be generated between two numeric variables.
this is a public domain data set, so I can post the SAS code here:
FILENAME SchlDisc url
'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld...' debug;
run;
proc import datafile=SchlDisc out=SchlDisc
DBMS=CSV REPLACE;
GETNAMES=Yes;
DATAROW=2;
guessingrows=max;
run;
proc contents data=SchlDisc position;run;
Any ideas on what I can do to fix this? thanks!
it seems like this formatting problem is just an error in the log. The data appear correct, but having the log filled up with these notes is just kind of annoying. Thanks all for your help! Phil
Does just dropping it do what you want?
FILENAME SchlDisc url
'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld...' debug;
run;
proc import datafile=SchlDisc out=SchlDisc (drop = Var5)
DBMS=CSV REPLACE;
GETNAMES=Yes;
DATAROW=2;
guessingrows=max;
run;
proc contents data=SchlDisc position;run;
No, because it appears to contain character data. it's not just an empty variable. thanks though!
So what do you want to do with it then?
Doesn't happen when I run your code with a link that works:
10 FILENAME SchlDisc url 11 'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2 11 ! fyMDL23EVzXyQnDIUtaZjA9HazYLNld_2qCoQOrhvJe7sd-nbEKgZtw-oV/pub?output= 11 ! csv' 12 ; 13 14 proc import datafile=SchlDisc out=SchlDisc 15 DBMS=CSV REPLACE; 16 GETNAMES=Yes; 17 DATAROW=2; 18 guessingrows=max; 19 run; 20 /******************************************************************** 20 ! ** 21 * PRODUCT: SAS 22 * VERSION: 9.4 23 * CREATOR: External File Interface 24 * DATE: 25JUL23 25 * DESC: Generated SAS Datastep Code 26 * TEMPLATE SOURCE: (None Specified.) 27 ********************************************************************* 27 ! **/ 28 data WORK.SCHLDISC ; 29 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 30 infile SCHLDISC delimiter = ',' MISSOVER DSD firstobs=2 ; 31 informat YEAR best32. ; 32 informat COUNTY_DISTRICT_CODE best32. ; 33 informat DISTRICT_NAME $30. ; 34 informat SCHOOL_CODE best32. ; 35 informat DISTRICT_NAME $30. ; 36 informat Enrlmnt_K_12 best32. ; 37 informat Disc_IncdntS best32. ; 38 informat Disc_Incdnt_RATE best32. ; 39 informat Disc_ALCOHOL best32. ; 40 informat Disc_ALCOHOL_RATE best32. ; 41 informat Disc_DRUG best32. ; 42 informat Disc_DRUG_RATE best32. ; 43 informat Disc_OTHER best32. ; 44 informat Disc_OTHER_RATE best32. ; 45 informat Disc_TOBACCO best32. ; 46 informat Disc_TOBACCO_RATE best32. ; 47 informat Disc_VIOLENCE best32. ; 48 informat Disc_VIOLENCE_RATE best32. ; 49 informat Disc_WEAPON best32. ; 50 informat Disc_WEAPON_RATE best32. ; 51 informat Disc_INSCHL_SUSP best32. ; 52 informat Disc_INSCHL_SUSP_RATE best32. ; 53 informat Disc_OUTSCHL_SUSP best32. ; 54 informat Disc_OUTSCHL_SUSP_RATE best32. ; 55 informat Disc_Rmvl_EXPULSION best32. ; 56 informat Disc_EXPULSION_RATE best32. ; 57 informat Disc_10mDays best32. ; 58 informat Disc_10mDays_RATE best32. ; 59 format YEAR best12. ; 60 format COUNTY_DISTRICT_CODE best12. ; 61 format DISTRICT_NAME $30. ; 62 format SCHOOL_CODE best12. ; 63 format DISTRICT_NAME $30. ; 64 format Enrlmnt_K_12 best12. ; 65 format Disc_IncdntS best12. ; 66 format Disc_Incdnt_RATE best12. ; 67 format Disc_ALCOHOL best12. ; 68 format Disc_ALCOHOL_RATE best12. ; 69 format Disc_DRUG best12. ; 70 format Disc_DRUG_RATE best12. ; 71 format Disc_OTHER best12. ; 72 format Disc_OTHER_RATE best12. ; 73 format Disc_TOBACCO best12. ; 74 format Disc_TOBACCO_RATE best12. ; 75 format Disc_VIOLENCE best12. ; 76 format Disc_VIOLENCE_RATE best12. ; 77 format Disc_WEAPON best12. ; 78 format Disc_WEAPON_RATE best12. ; 79 format Disc_INSCHL_SUSP best12. ; 80 format Disc_INSCHL_SUSP_RATE best12. ; 81 format Disc_OUTSCHL_SUSP best12. ; 82 format Disc_OUTSCHL_SUSP_RATE best12. ; 83 format Disc_Rmvl_EXPULSION best12. ; 84 format Disc_EXPULSION_RATE best12. ; 85 format Disc_10mDays best12. ; 86 format Disc_10mDays_RATE best12. ; 87 input 88 YEAR 89 COUNTY_DISTRICT_CODE 90 DISTRICT_NAME $ 91 SCHOOL_CODE 92 DISTRICT_NAME $ 93 Enrlmnt_K_12 94 Disc_IncdntS 95 Disc_Incdnt_RATE 96 Disc_ALCOHOL 97 Disc_ALCOHOL_RATE 98 Disc_DRUG 99 Disc_DRUG_RATE 100 Disc_OTHER 101 Disc_OTHER_RATE 102 Disc_TOBACCO 103 Disc_TOBACCO_RATE 104 Disc_VIOLENCE 105 Disc_VIOLENCE_RATE 106 Disc_WEAPON 107 Disc_WEAPON_RATE 108 Disc_INSCHL_SUSP 109 Disc_INSCHL_SUSP_RATE 110 Disc_OUTSCHL_SUSP 111 Disc_OUTSCHL_SUSP_RATE 112 Disc_Rmvl_EXPULSION 113 Disc_EXPULSION_RATE 114 Disc_10mDays 115 Disc_10mDays_RATE 116 ; 117 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR 117! detection macro variable */ 118 run; NOTE: The infile SCHLDISC is: Filename=https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7A SuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld_2qCoQOrhvJe7sd-nbEKgZtw-oV/p ub?output=csv, Local Host Name=DESKTOP-C36D7SF, Local Host IP addr=fe80::61a9:7f4a:a569:a706%11, Service Hostname Name=sea30s10-in-f1.1e100.net, Service IP addr=142.251.33.97,Service Name=N/A, Service Portno=443,Lrecl=32767,Recfm=Variable NOTE: 59157 records were read from the infile SCHLDISC. The minimum record length was 58. The maximum record length was 140. NOTE: The data set WORK.SCHLDISC has 59157 observations and 27 variables. NOTE: DATA statement used (Total process time): real time 22.06 seconds cpu time 0.75 seconds 59157 rows created in WORK.SCHLDISC from SCHLDISC. NOTE: WORK.SCHLDISC data set was successfully created. NOTE: The data set WORK.SCHLDISC has 59157 observations and 27 variables. NOTE: PROCEDURE IMPORT used (Total process time): real time 56.92 seconds cpu time 15.03 seconds
Show your LOG with the generated data step as I have shown above.
Typically that "Var5" type of name occurs when one of two things happens. One is when there is no column name or header provided.
The other is when the name of the variable is exactly the same for two columns for the first 32 characters.
So perhaps when you ran your code the version of the file had one of those two problems.
Or show us the names of the two variables on either side of the Var5 as well. If it occurred between School_code and Enrlmnt_k_12 then look very closely if the column heading looks like District_name with other characters as there might be something odd there.
Note: County code, district code and any "code" value likely should not be considered numeric at all. I know that my school district 001 complains when someone uses 1.
Make sure that you post then entire link. As shown your link will not run unless taking a separate step to copy the url separately and replace your text, which returns a file not found message.
Thanks- that's a lot of progress. It seems that I no longer have the phantom var5, but the log still shows many problems trying to read in the school name (which is character and which is character for the first few rows.). The messages all look like this (I won't bore you with all of them, because the log is quite long).
NOTE: <<<
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
it seems like this formatting problem is just an error in the log. The data appear correct, but having the log filled up with these notes is just kind of annoying. Thanks all for your help! Phil
Normally error messages in the log are caused by text values in the file that are invalid for the numeric informat being used to read them. That is normally something that happens when the analysis of what type of values appear in the CSV file made the wrong GUESS at how to read it.
Are you sure you used the GUESSINGROWS = MAX statement?
To avoid guessing just write your own data step to read the delimited text file. Then you can set the names and types of the variables based on what you know they contain.
If you do have to guess you could try using a different method to make the guesses. You might try this macro instead: https://github.com/sasutils/macros/blob/master/csv2ds.sas
@PhilWood wrote:
thanks. i'll give it a look. Oddly there's a long character in the first row of the data, but yes, I did use guessingrows max.
Another possibility is that the file has embedded end of line characters in one or more of the field values.
That can make if hard for SAS to parse the lines properly.
@PhilWood wrote:
Thanks- that's a lot of progress. It seems that I no longer have the phantom var5, but the log still shows many problems trying to read in the school name (which is character and which is character for the first few rows.). The messages all look like this (I won't bore you with all of them, because the log is quite long).
NOTE: <<<
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
The information about the variables as in the actual data step would be more useful and that does not even look like a typical invalid data log entry. The DATA step would tell us what INFORMAT Proc Format assigned and the order it was reading. NOTHING when I read your file given in that link has a column heading of DQSTRING so it appears not to come from the problem file.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.