BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
_maldini_
Barite | Level 11

I had a quick conversation w/ someone who was suggesting using the GUESSINGROWS=1 statement during a PROC IMPORT to "trick" SAS into assigning the desired variable type. He suggested creating a new row in the data table, the first row, and then adding a character or numeric value, depending on the desired outcome. 

 

Is this a reasonable approach?

 

I can't quite figure out how to implement this. Any suggestions? Thanks!

 

PROC IMPORT DATAFILE="/folders/myfolders/...have.csv"
OUT=wantt
GUESSINGROWS=1;
DBMS=csv replace;
GETNAMES=yes;
RUN;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Order matters, guessingrows is a statement not an option on the proc line.

 

PROC IMPORT DATAFILE="/folders/myfolders/...have.csv"
OUT=wantt ;
GUESSINGROWS=1;
DBMS=csv replace;
GETNAMES=yes;

datarow=2;
RUN;

 

This is a reasonable approach IF all of your character variables have the longest value (number of characters) in the first row. Otherwise you may get some truncation to match the length of the example in the first row.

There could also be some issues with values that might be interpretted as dates if the format changes or isn't actually supposed to be a date.

View solution in original post

18 REPLIES 18
ballardw
Super User

Order matters, guessingrows is a statement not an option on the proc line.

 

PROC IMPORT DATAFILE="/folders/myfolders/...have.csv"
OUT=wantt ;
GUESSINGROWS=1;
DBMS=csv replace;
GETNAMES=yes;

datarow=2;
RUN;

 

This is a reasonable approach IF all of your character variables have the longest value (number of characters) in the first row. Otherwise you may get some truncation to match the length of the example in the first row.

There could also be some issues with values that might be interpretted as dates if the format changes or isn't actually supposed to be a date.

Tom
Super User Tom
Super User

I find that being picky about indentation style helps reduce this type of error.  When a single statement spans multiple lines place the ending semi-colon on a new line indented to the level of the original statement.  Like you are indenting a DO/END block.

PROC IMPORT DATAFILE="/folders/myfolders/...have.csv"
  OUT=wantt 
  DBMS=csv replace
;
  GUESSINGROWS=1;
  GETNAMES=yes;
  datarow=2; 
RUN; 

 

_maldini_
Barite | Level 11

 

@ballardw

@Tom

Thanks for your help!

 

I'm getting an error w/ the following code. Is it possible this statement doesn't work in SAS University Edition?

 

PROC IMPORT       
DATAFILE="/folders/myfolders/...guessing_rows.csv"
OUT=want 
DBMS=csv replace
;
GUESSINGROWS=1;
GETNAMES=yes;
DATAROW=2; 
RUN;

LOG:

NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
Number of names found is greater than number of variables found.
Number of names found is greater than number of variables found.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
 

The data is uploaded as a .xlsx file, but I used it as a .csv.

Reeza
Super User

Upload CSV with a .txt extension. 

Excel may hide the issue. 

 

Things to check for - variable names that have commas.

 

I don't recommend proc import for CSV - use a data step and explicitly assign formats/informats/lengths. It will prevent errors down the line. I do use proc import to get the code initially and then modify it. 

 

Also, this can be automated and is documented. Changing an excel or CSV file is not traceable. Each import will be the same, you still run the risk of Proc Import not liking something if you accidently make a mistake in one file. 

 

ballardw
Super User

@_maldini_ wrote:

 

@ballardw

@Tom

Thanks for your help!

 

I'm getting an error w/ the following code. Is it possible this statement doesn't work in SAS University Edition?

 

PROC IMPORT       
DATAFILE="/folders/myfolders/...guessing_rows.csv"
OUT=want 
DBMS=csv replace
;
GUESSINGROWS=1;
GETNAMES=yes;
DATAROW=2; 
RUN;

 

 

The data is uploaded as a .xlsx file, but I used it as a .csv.


The spreadsheet appears to have a few of the headers in 2 rows. If you did a save as csv from that file your data isn't on the correct line. Line 2 looks like this:

YO,,,,,,,,,,,,,,,,SIXDIG,SIXDIG,,,,,

So your second line doesn't really have data. Using guessingrows you have to indicate at least the first datarow. Which would be 3.

However your file has at least 2 other issues:

Your next to last row of data has invalid values for what are time values in the other rows. 101 is not going to be read as time when the rest are 12:00 AM type (when using proc import).

Second, and very typical from Excel, you have about 1800 rows of blank data. I suspect that may be because you deleted rows for the example but Excel "knows" those rows had data in exports them to the csv.

These options should allow reading the file as posted.

GUESSINGROWS=3;
GETNAMES=yes;
DATAROW=3; 

Hint. If you have a CSV file to post here for example data use a TXT extension.

 

_maldini_
Barite | Level 11

@ballardw

<Line 2 looks like this: YO,,,,,,,,,,,,,,,,SIXDIG,SIXDIG,,,,,>

 

This was intentional. It was my attempt to dictate that the variables w/ these character values (e.g. YO, SIXDIG) be read-in as character variables, despite the fact that their subsequent values are purely numeric.

 

<Using guessingrows you have to indicate at least the first datarow. >

 

I may not understand how to implement this GUESSINGROWS approach =;-)...I was trying to indicate the second row - the row w/ YO and SIXDIG - as the row that SAS should use to determine the data type.

 

<Your next to last row of data has invalid values for what are time values in the other rows. 101 is not going to be read as time when the rest are 12:00 AM type (when using proc import).>

 

Will the times be read in as times?! I thought that SAS would simply create character variables for these variables given that there is no AM/PM associated w/ the time in one observation. Also, in my larger dataset, there are some instances where there is no space between the last digit and the AM or PM (e.g. 12:00PM v. 12:00 PM). 

 

Couldn't this problem also be addressed w/ the GUESSINGROWS approach, assuming I implemented it correctly? If I put a time like, 12:00 PM, in the GUESSINGROWS row, wouldn't that dictate the variable type (Character) and maybe the informat?

 

Thanks again for your help!

Reeza
Super User

@_maldini_ wrote:

If I put a time like, 12:00 PM, in the GUESSINGROWS row, wouldn't that dictate the variable type (Character) and maybe the informat?

 

 


The only thing that explicitly dictates type and informat is specifying it, which you cannot do via proc import. 

_maldini_
Barite | Level 11

Maybe "dictate" is too strong a word. Wouldn't it "trick" SAS into setting the variable type as character, and possibly use a relevant informat when reading-in the data?

Tom
Super User Tom
Super User

I don't understand the question at this point.  Do you have a CSV file or an XLSX file.  The file you posted was an XLSX file.  You can read it using PROC IMPORT.

 

proc import datafile='guessing_rows.xlsx'
  dbms=xlsx out=mydata replace
;
  sheet='GUESSING_ROWS.CSV';
run;
_maldini_
Barite | Level 11

@Tom I uploaded an .xlsx file b/c it wouldn't let me upload a .csv file. I subsequently learned that I could simply change the extension to .txt. Sorry for the confusion. I uploaded a new file w/ a .txt extension.

Tom
Super User Tom
Super User

Since your data has column headers you need to code GUESSINGROWS=2 instead of GUESSINGROWS=1. 

I converted the first 10 observations of your XLSX file back into a CSV file and was able to get PROC IMPORT to read it.

filename csv 'guessing_rows.csv' ;
proc transpose data=mydata (obs=0) out=names;
  var _all_;
run;
data _null_ ;
  file csv dsd ;
  set names ;
  put _name_ @@ ;
run;
data _null_;
  file csv dsd mod ;
  set mydata (obs=10);
  put (_all_) (+0);
run;

proc import datafile=csv out=newdata replace 
  dbms=dlm 
;
  delimiter=',';
  guessingrows=2;
run;

If you want to use GUESSINGROWS=1 then you should also add GETNAMES=NO;

proc import datafile=csv out=new2 replace 
  dbms=dlm 
;
  delimiter=',';
  guessingrows=1;
  getnames=no;
run;

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Out of interest, why would you want to use proc import to import CSV data?  It is a guessing procedure and hence will always have some problems.  I am sure that if you are dealing with a data import, you will have an import agreement in place, detailing the structure and possibly contents of the datafile - so there is no confusion, e.g. changes - one either side.  That document would then be the basis for a very simple datastep import program which reads the data in.  Simple process, documented, validatable, replicatable, its what every data import should be 🐵

_maldini_
Barite | Level 11

<Out of interest, why would you want to use proc import to import CSV data?>

 

This question of GUESSINGROWS arises out of a broader question. Using PROC IMPORT v. a DATA step to read-in your data. As I understand it, and I am by now means an expert here, the risk of using PROC IMPORT is that SAS "guesses" incorrectly on one or more variables, and this causes problems down the line. The risk of using a DATA step is that it may be a lot of work if you have many variables for which to set attributes. My dataset has 500 variables...

 

The thought behind the GUESSINGROWS approach was that this might be a middle ground approach. Also, @Reeza offered, what sounds like, a good approach. Use PROC IMPORT and then just copy the syntax from the log and change the attritbutes on the variables that you want to change. 

 

I am open to any other ideas or suggestions!

 

Thanks for your thoughts and help here.

ballardw
Super User

I fairly often use Proc import on a CSV file to build a skeleton of code I can modify. If you look at the log you should see a complete data step generated by SAS to read the csv file.

I copy the code into the editor and adjust the properties for the items I need to use. I set character lengths, informats (sometimes specifying my own custom informats), insure numerics are numeric, dates are date values(with appropriate formats) and character variables are and add labels and often initial processins steps such as data quality (YarX should not be missing when VarY has value in range). Often I get to specify characterstics for a variable that was completely blank in the first data set due to infrequent assignment so that in later iterations it reads properly.

Then when I need to read another file of the same time I have a program ready to use and the variable types, lengths.

 

I will do the proc import even when I have a relatively proper document describing the data because it is easier to do search and replace to get desired variable names then typing from scratch.

 

None of my datasets with this have exceeded about 240 variables so far, but one project involveds going through this for in excess of 20 data layouts.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 4240 views
  • 14 likes
  • 5 in conversation