Hi folks, I am running SAS 9.4 (TS1M7) 64 bit Windows.
It seems that SAS generates a phantom variable in the import file when I attempt to import a Google Spreadsheet. This variable is named "Var5" and seems to be generated between two numeric variables.
this is a public domain data set, so I can post the SAS code here:
FILENAME SchlDisc url
'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld...' debug;
run;
proc import datafile=SchlDisc out=SchlDisc
DBMS=CSV REPLACE;
GETNAMES=Yes;
DATAROW=2;
guessingrows=max;
run;
proc contents data=SchlDisc position;run;
Any ideas on what I can do to fix this? thanks!
it seems like this formatting problem is just an error in the log. The data appear correct, but having the log filled up with these notes is just kind of annoying. Thanks all for your help! Phil
Does just dropping it do what you want?
FILENAME SchlDisc url
'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld...' debug;
run;
proc import datafile=SchlDisc out=SchlDisc (drop = Var5)
DBMS=CSV REPLACE;
GETNAMES=Yes;
DATAROW=2;
guessingrows=max;
run;
proc contents data=SchlDisc position;run;
No, because it appears to contain character data. it's not just an empty variable. thanks though!
So what do you want to do with it then?
Doesn't happen when I run your code with a link that works:
10 FILENAME SchlDisc url
11 'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2
11 ! fyMDL23EVzXyQnDIUtaZjA9HazYLNld_2qCoQOrhvJe7sd-nbEKgZtw-oV/pub?output=
11 ! csv'
12 ;
13
14 proc import datafile=SchlDisc out=SchlDisc
15 DBMS=CSV REPLACE;
16 GETNAMES=Yes;
17 DATAROW=2;
18 guessingrows=max;
19 run;
20 /********************************************************************
20 ! **
21 * PRODUCT: SAS
22 * VERSION: 9.4
23 * CREATOR: External File Interface
24 * DATE: 25JUL23
25 * DESC: Generated SAS Datastep Code
26 * TEMPLATE SOURCE: (None Specified.)
27 *********************************************************************
27 ! **/
28 data WORK.SCHLDISC ;
29 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
30 infile SCHLDISC delimiter = ',' MISSOVER DSD firstobs=2 ;
31 informat YEAR best32. ;
32 informat COUNTY_DISTRICT_CODE best32. ;
33 informat DISTRICT_NAME $30. ;
34 informat SCHOOL_CODE best32. ;
35 informat DISTRICT_NAME $30. ;
36 informat Enrlmnt_K_12 best32. ;
37 informat Disc_IncdntS best32. ;
38 informat Disc_Incdnt_RATE best32. ;
39 informat Disc_ALCOHOL best32. ;
40 informat Disc_ALCOHOL_RATE best32. ;
41 informat Disc_DRUG best32. ;
42 informat Disc_DRUG_RATE best32. ;
43 informat Disc_OTHER best32. ;
44 informat Disc_OTHER_RATE best32. ;
45 informat Disc_TOBACCO best32. ;
46 informat Disc_TOBACCO_RATE best32. ;
47 informat Disc_VIOLENCE best32. ;
48 informat Disc_VIOLENCE_RATE best32. ;
49 informat Disc_WEAPON best32. ;
50 informat Disc_WEAPON_RATE best32. ;
51 informat Disc_INSCHL_SUSP best32. ;
52 informat Disc_INSCHL_SUSP_RATE best32. ;
53 informat Disc_OUTSCHL_SUSP best32. ;
54 informat Disc_OUTSCHL_SUSP_RATE best32. ;
55 informat Disc_Rmvl_EXPULSION best32. ;
56 informat Disc_EXPULSION_RATE best32. ;
57 informat Disc_10mDays best32. ;
58 informat Disc_10mDays_RATE best32. ;
59 format YEAR best12. ;
60 format COUNTY_DISTRICT_CODE best12. ;
61 format DISTRICT_NAME $30. ;
62 format SCHOOL_CODE best12. ;
63 format DISTRICT_NAME $30. ;
64 format Enrlmnt_K_12 best12. ;
65 format Disc_IncdntS best12. ;
66 format Disc_Incdnt_RATE best12. ;
67 format Disc_ALCOHOL best12. ;
68 format Disc_ALCOHOL_RATE best12. ;
69 format Disc_DRUG best12. ;
70 format Disc_DRUG_RATE best12. ;
71 format Disc_OTHER best12. ;
72 format Disc_OTHER_RATE best12. ;
73 format Disc_TOBACCO best12. ;
74 format Disc_TOBACCO_RATE best12. ;
75 format Disc_VIOLENCE best12. ;
76 format Disc_VIOLENCE_RATE best12. ;
77 format Disc_WEAPON best12. ;
78 format Disc_WEAPON_RATE best12. ;
79 format Disc_INSCHL_SUSP best12. ;
80 format Disc_INSCHL_SUSP_RATE best12. ;
81 format Disc_OUTSCHL_SUSP best12. ;
82 format Disc_OUTSCHL_SUSP_RATE best12. ;
83 format Disc_Rmvl_EXPULSION best12. ;
84 format Disc_EXPULSION_RATE best12. ;
85 format Disc_10mDays best12. ;
86 format Disc_10mDays_RATE best12. ;
87 input
88 YEAR
89 COUNTY_DISTRICT_CODE
90 DISTRICT_NAME $
91 SCHOOL_CODE
92 DISTRICT_NAME $
93 Enrlmnt_K_12
94 Disc_IncdntS
95 Disc_Incdnt_RATE
96 Disc_ALCOHOL
97 Disc_ALCOHOL_RATE
98 Disc_DRUG
99 Disc_DRUG_RATE
100 Disc_OTHER
101 Disc_OTHER_RATE
102 Disc_TOBACCO
103 Disc_TOBACCO_RATE
104 Disc_VIOLENCE
105 Disc_VIOLENCE_RATE
106 Disc_WEAPON
107 Disc_WEAPON_RATE
108 Disc_INSCHL_SUSP
109 Disc_INSCHL_SUSP_RATE
110 Disc_OUTSCHL_SUSP
111 Disc_OUTSCHL_SUSP_RATE
112 Disc_Rmvl_EXPULSION
113 Disc_EXPULSION_RATE
114 Disc_10mDays
115 Disc_10mDays_RATE
116 ;
117 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR
117! detection macro variable */
118 run;
NOTE: The infile SCHLDISC is:
Filename=https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7A
SuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld_2qCoQOrhvJe7sd-nbEKgZtw-oV/p
ub?output=csv,
Local Host Name=DESKTOP-C36D7SF,
Local Host IP addr=fe80::61a9:7f4a:a569:a706%11,
Service Hostname Name=sea30s10-in-f1.1e100.net,
Service IP addr=142.251.33.97,Service Name=N/A,
Service Portno=443,Lrecl=32767,Recfm=Variable
NOTE: 59157 records were read from the infile SCHLDISC.
The minimum record length was 58.
The maximum record length was 140.
NOTE: The data set WORK.SCHLDISC has 59157 observations and 27 variables.
NOTE: DATA statement used (Total process time):
real time 22.06 seconds
cpu time 0.75 seconds
59157 rows created in WORK.SCHLDISC from SCHLDISC.
NOTE: WORK.SCHLDISC data set was successfully created.
NOTE: The data set WORK.SCHLDISC has 59157 observations and 27 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 56.92 seconds
cpu time 15.03 seconds
Show your LOG with the generated data step as I have shown above.
Typically that "Var5" type of name occurs when one of two things happens. One is when there is no column name or header provided.
The other is when the name of the variable is exactly the same for two columns for the first 32 characters.
So perhaps when you ran your code the version of the file had one of those two problems.
Or show us the names of the two variables on either side of the Var5 as well. If it occurred between School_code and Enrlmnt_k_12 then look very closely if the column heading looks like District_name with other characters as there might be something odd there.
Note: County code, district code and any "code" value likely should not be considered numeric at all. I know that my school district 001 complains when someone uses 1.
Make sure that you post then entire link. As shown your link will not run unless taking a separate step to copy the url separately and replace your text, which returns a file not found message.
Thanks- that's a lot of progress. It seems that I no longer have the phantom var5, but the log still shows many problems trying to read in the school name (which is character and which is character for the first few rows.). The messages all look like this (I won't bore you with all of them, because the log is quite long).
NOTE: <<<
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
it seems like this formatting problem is just an error in the log. The data appear correct, but having the log filled up with these notes is just kind of annoying. Thanks all for your help! Phil
Normally error messages in the log are caused by text values in the file that are invalid for the numeric informat being used to read them. That is normally something that happens when the analysis of what type of values appear in the CSV file made the wrong GUESS at how to read it.
Are you sure you used the GUESSINGROWS = MAX statement?
To avoid guessing just write your own data step to read the delimited text file. Then you can set the names and types of the variables based on what you know they contain.
If you do have to guess you could try using a different method to make the guesses. You might try this macro instead: https://github.com/sasutils/macros/blob/master/csv2ds.sas
@PhilWood wrote:
thanks. i'll give it a look. Oddly there's a long character in the first row of the data, but yes, I did use guessingrows max.
Another possibility is that the file has embedded end of line characters in one or more of the field values.
That can make if hard for SAS to parse the lines properly.
@PhilWood wrote:
Thanks- that's a lot of progress. It seems that I no longer have the phantom var5, but the log still shows many problems trying to read in the school name (which is character and which is character for the first few rows.). The messages all look like this (I won't bore you with all of them, because the log is quite long).
NOTE: <<<
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
The information about the variables as in the actual data step would be more useful and that does not even look like a typical invalid data log entry. The DATA step would tell us what INFORMAT Proc Format assigned and the order it was reading. NOTHING when I read your file given in that link has a column heading of DQSTRING so it appears not to come from the problem file.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.