BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
PhilWood
Obsidian | Level 7

Hi folks, I am running SAS  9.4 (TS1M7) 64 bit Windows.

It seems that SAS generates a phantom variable in the import file when I attempt to import a Google Spreadsheet. This variable is named "Var5" and seems to be generated between two numeric variables. 

this is a public domain data set, so I can post the SAS code here:

FILENAME SchlDisc url
'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld...' debug;
run;
proc import datafile=SchlDisc out=SchlDisc
DBMS=CSV REPLACE;
GETNAMES=Yes;
DATAROW=2;
guessingrows=max;
run;
proc contents data=SchlDisc position;run;

Any ideas on what I can do to fix this? thanks! 

1 ACCEPTED SOLUTION

Accepted Solutions
PhilWood
Obsidian | Level 7

it seems like this formatting problem is just an error in the log. The data appear correct, but having the log filled up with these notes is just kind of annoying. Thanks all for your help! Phil

View solution in original post

10 REPLIES 10
SASKiwi
PROC Star

Does just dropping it do what you want?

FILENAME SchlDisc url
'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld...' debug;
run;
proc import datafile=SchlDisc out=SchlDisc (drop = Var5)
DBMS=CSV REPLACE;
GETNAMES=Yes;
DATAROW=2;
guessingrows=max;
run;
proc contents data=SchlDisc position;run;
PhilWood
Obsidian | Level 7

No, because it appears to contain character data. it's not just an empty variable. thanks though!

SASKiwi
PROC Star

So what do you want to do with it then?

ballardw
Super User

Doesn't happen when I run your code with a link that works:

 

10   FILENAME SchlDisc url
11   'https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7ASuBZP2c_2
11 ! fyMDL23EVzXyQnDIUtaZjA9HazYLNld_2qCoQOrhvJe7sd-nbEKgZtw-oV/pub?output=
11 ! csv'
12   ;
13
14   proc import datafile=SchlDisc out=SchlDisc
15   DBMS=CSV REPLACE;
16   GETNAMES=Yes;
17   DATAROW=2;
18   guessingrows=max;
19   run;

20    /********************************************************************
20 ! **
21    *   PRODUCT:   SAS
22    *   VERSION:   9.4
23    *   CREATOR:   External File Interface
24    *   DATE:      25JUL23
25    *   DESC:      Generated SAS Datastep Code
26    *   TEMPLATE SOURCE:  (None Specified.)
27    *********************************************************************
27 ! **/
28       data WORK.SCHLDISC    ;
29       %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
30       infile SCHLDISC delimiter = ',' MISSOVER DSD  firstobs=2 ;
31          informat YEAR best32. ;
32          informat COUNTY_DISTRICT_CODE best32. ;
33          informat DISTRICT_NAME $30. ;
34          informat SCHOOL_CODE best32. ;
35          informat DISTRICT_NAME $30. ;
36          informat Enrlmnt_K_12 best32. ;
37          informat Disc_IncdntS best32. ;
38          informat Disc_Incdnt_RATE best32. ;
39          informat Disc_ALCOHOL best32. ;
40          informat Disc_ALCOHOL_RATE best32. ;
41          informat Disc_DRUG best32. ;
42          informat Disc_DRUG_RATE best32. ;
43          informat Disc_OTHER best32. ;
44          informat Disc_OTHER_RATE best32. ;
45          informat Disc_TOBACCO best32. ;
46          informat Disc_TOBACCO_RATE best32. ;
47          informat Disc_VIOLENCE best32. ;
48          informat Disc_VIOLENCE_RATE best32. ;
49          informat Disc_WEAPON best32. ;
50          informat Disc_WEAPON_RATE best32. ;
51          informat Disc_INSCHL_SUSP best32. ;
52          informat Disc_INSCHL_SUSP_RATE best32. ;
53          informat Disc_OUTSCHL_SUSP best32. ;
54          informat Disc_OUTSCHL_SUSP_RATE best32. ;
55          informat Disc_Rmvl_EXPULSION best32. ;
56          informat Disc_EXPULSION_RATE best32. ;
57          informat Disc_10mDays best32. ;
58          informat Disc_10mDays_RATE best32. ;
59          format YEAR best12. ;
60          format COUNTY_DISTRICT_CODE best12. ;
61          format DISTRICT_NAME $30. ;
62          format SCHOOL_CODE best12. ;
63          format DISTRICT_NAME $30. ;
64          format Enrlmnt_K_12 best12. ;
65          format Disc_IncdntS best12. ;
66          format Disc_Incdnt_RATE best12. ;
67          format Disc_ALCOHOL best12. ;
68          format Disc_ALCOHOL_RATE best12. ;
69          format Disc_DRUG best12. ;
70          format Disc_DRUG_RATE best12. ;
71          format Disc_OTHER best12. ;
72          format Disc_OTHER_RATE best12. ;
73          format Disc_TOBACCO best12. ;
74          format Disc_TOBACCO_RATE best12. ;
75          format Disc_VIOLENCE best12. ;
76          format Disc_VIOLENCE_RATE best12. ;
77          format Disc_WEAPON best12. ;
78          format Disc_WEAPON_RATE best12. ;
79          format Disc_INSCHL_SUSP best12. ;
80          format Disc_INSCHL_SUSP_RATE best12. ;
81          format Disc_OUTSCHL_SUSP best12. ;
82          format Disc_OUTSCHL_SUSP_RATE best12. ;
83          format Disc_Rmvl_EXPULSION best12. ;
84          format Disc_EXPULSION_RATE best12. ;
85          format Disc_10mDays best12. ;
86          format Disc_10mDays_RATE best12. ;
87       input
88                   YEAR
89                   COUNTY_DISTRICT_CODE
90                   DISTRICT_NAME $
91                   SCHOOL_CODE
92                   DISTRICT_NAME $
93                   Enrlmnt_K_12
94                   Disc_IncdntS
95                   Disc_Incdnt_RATE
96                   Disc_ALCOHOL
97                   Disc_ALCOHOL_RATE
98                   Disc_DRUG
99                   Disc_DRUG_RATE
100                  Disc_OTHER
101                  Disc_OTHER_RATE
102                  Disc_TOBACCO
103                  Disc_TOBACCO_RATE
104                  Disc_VIOLENCE
105                  Disc_VIOLENCE_RATE
106                  Disc_WEAPON
107                  Disc_WEAPON_RATE
108                  Disc_INSCHL_SUSP
109                  Disc_INSCHL_SUSP_RATE
110                  Disc_OUTSCHL_SUSP
111                  Disc_OUTSCHL_SUSP_RATE
112                  Disc_Rmvl_EXPULSION
113                  Disc_EXPULSION_RATE
114                  Disc_10mDays
115                  Disc_10mDays_RATE
116      ;
117      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR
117! detection macro variable */
118      run;

NOTE: The infile SCHLDISC is:

      Filename=https://docs.google.com/spreadsheets/d/e/2PACX-1vR_v7KStLU7A
      SuBZP2c_2fyMDL23EVzXyQnDIUtaZjA9HazYLNld_2qCoQOrhvJe7sd-nbEKgZtw-oV/p
      ub?output=csv,
      Local Host Name=DESKTOP-C36D7SF,
      Local Host IP addr=fe80::61a9:7f4a:a569:a706%11,
      Service Hostname Name=sea30s10-in-f1.1e100.net,
      Service IP addr=142.251.33.97,Service Name=N/A,
      Service Portno=443,Lrecl=32767,Recfm=Variable

NOTE: 59157 records were read from the infile SCHLDISC.
      The minimum record length was 58.
      The maximum record length was 140.
NOTE: The data set WORK.SCHLDISC has 59157 observations and 27 variables.
NOTE: DATA statement used (Total process time):
      real time           22.06 seconds
      cpu time            0.75 seconds


59157 rows created in WORK.SCHLDISC from SCHLDISC.



NOTE: WORK.SCHLDISC data set was successfully created.
NOTE: The data set WORK.SCHLDISC has 59157 observations and 27 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           56.92 seconds
      cpu time            15.03 seconds



Show your LOG with the generated data step as I have shown above.

 

Typically that "Var5" type of name occurs when one of two things happens. One is when there is no column name or header provided.

The other is when the name of the variable is exactly the same for two columns for the first 32 characters.

So perhaps when you ran your code the version of the file had one of those two problems.

 

Or show us the names of the two variables on  either side of the Var5 as well. If it occurred between School_code and Enrlmnt_k_12 then look very closely if the column heading looks like District_name with other characters as there might be something odd there.

 

Note: County code, district code and any "code" value likely should not be considered numeric at all. I know that my school district 001 complains when someone uses 1.

 

Make sure that you post then entire link. As shown your link will not run unless taking a separate step to copy the url separately and replace your text, which returns a file not found message.

 

 

PhilWood
Obsidian | Level 7

Thanks- that's a lot of progress. It seems that I no longer have the phantom var5, but the log still shows many problems trying to read in the school name (which is character and which is character for the first few rows.). The messages all look like this (I won't bore you with all of them, because the log is quite long). 

NOTE: <<<
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=21ST CENTURY ALTERNATIVE
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=6TH GRADE CENTER
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7th and 8th Grade Center
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=
dqstring=7 & 8 GRADE CENTER
fmt=

 

PhilWood
Obsidian | Level 7

it seems like this formatting problem is just an error in the log. The data appear correct, but having the log filled up with these notes is just kind of annoying. Thanks all for your help! Phil

Tom
Super User Tom
Super User

Normally error messages in the log are caused by text values in the file that are invalid for the numeric informat being used to read them.  That is normally something that happens when the analysis of what type of values appear in the CSV file made the wrong GUESS at how to read it.

 

Are you sure you used the GUESSINGROWS = MAX statement?

 

To avoid guessing just write your own data step to read the delimited text file.  Then you can set the names and types of the variables based on what you know they contain. 

 

If you do have to guess you could try using a different method to make the guesses.  You might try this macro instead:  https://github.com/sasutils/macros/blob/master/csv2ds.sas

 

 

 

 

PhilWood
Obsidian | Level 7
thanks. i'll give it a look. Oddly there's a long character in the first row of the data, but yes, I did use guessingrows max.
Tom
Super User Tom
Super User

@PhilWood wrote:
thanks. i'll give it a look. Oddly there's a long character in the first row of the data, but yes, I did use guessingrows max.

Another possibility is that the file has embedded end of line characters in one or more of the field values.

https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhancements-to-INFILE-FILE-to-handle-delimited-...

 

That can make if hard for SAS to parse the lines properly.

ballardw
Super User

@PhilWood wrote:

Thanks- that's a lot of progress. It seems that I no longer have the phantom var5, but the log still shows many problems trying to read in the school name (which is character and which is character for the first few rows.). The messages all look like this (I won't bore you with all of them, because the log is quite long). 

NOTE: <<<
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring=1050 ADAIR CO. HIGH
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=
dqstring="VOY SPEARS, JR. ELEM."
fmt=

 


The information about the variables as in the actual data step would be more useful and that does not even look like a typical invalid data log entry. The DATA step would tell us what INFORMAT Proc Format assigned and the order it was reading. NOTHING when I read your file given in that link has a column heading of DQSTRING so it appears not to come from the problem file.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 939 views
  • 1 like
  • 4 in conversation