I had a quick conversation w/ someone who was suggesting using the GUESSINGROWS=1 statement during a PROC IMPORT to "trick" SAS into assigning the desired variable type. He suggested creating a new row in the data table, the first row, and then adding a character or numeric value, depending on the desired outcome.
Is this a reasonable approach?
I can't quite figure out how to implement this. Any suggestions? Thanks!
PROC IMPORT DATAFILE="/folders/myfolders/...have.csv"
 OUT=wantt
 GUESSINGROWS=1;
 DBMS=csv replace; 
 GETNAMES=yes;
 RUN;
Order matters, guessingrows is a statement not an option on the proc line.
PROC IMPORT DATAFILE="/folders/myfolders/...have.csv"
 OUT=wantt ;
 GUESSINGROWS=1;
 DBMS=csv replace; 
 GETNAMES=yes;
datarow=2; 
 RUN;
This is a reasonable approach IF all of your character variables have the longest value (number of characters) in the first row. Otherwise you may get some truncation to match the length of the example in the first row.
There could also be some issues with values that might be interpretted as dates if the format changes or isn't actually supposed to be a date.
Order matters, guessingrows is a statement not an option on the proc line.
PROC IMPORT DATAFILE="/folders/myfolders/...have.csv"
 OUT=wantt ;
 GUESSINGROWS=1;
 DBMS=csv replace; 
 GETNAMES=yes;
datarow=2; 
 RUN;
This is a reasonable approach IF all of your character variables have the longest value (number of characters) in the first row. Otherwise you may get some truncation to match the length of the example in the first row.
There could also be some issues with values that might be interpretted as dates if the format changes or isn't actually supposed to be a date.
I find that being picky about indentation style helps reduce this type of error. When a single statement spans multiple lines place the ending semi-colon on a new line indented to the level of the original statement. Like you are indenting a DO/END block.
PROC IMPORT DATAFILE="/folders/myfolders/...have.csv"
  OUT=wantt 
  DBMS=csv replace
;
  GUESSINGROWS=1;
  GETNAMES=yes;
  datarow=2; 
RUN; 
Thanks for your help!
I'm getting an error w/ the following code. Is it possible this statement doesn't work in SAS University Edition?
PROC IMPORT DATAFILE="/folders/myfolders/...guessing_rows.csv" OUT=want DBMS=csv replace ; GUESSINGROWS=1; GETNAMES=yes; DATAROW=2; RUN;
LOG:
The data is uploaded as a .xlsx file, but I used it as a .csv.
Upload CSV with a .txt extension.
Excel may hide the issue.
Things to check for - variable names that have commas.
I don't recommend proc import for CSV - use a data step and explicitly assign formats/informats/lengths. It will prevent errors down the line. I do use proc import to get the code initially and then modify it.
Also, this can be automated and is documented. Changing an excel or CSV file is not traceable. Each import will be the same, you still run the risk of Proc Import not liking something if you accidently make a mistake in one file.
@_maldini_ wrote:
Thanks for your help!
I'm getting an error w/ the following code. Is it possible this statement doesn't work in SAS University Edition?
PROC IMPORT DATAFILE="/folders/myfolders/...guessing_rows.csv" OUT=want DBMS=csv replace ; GUESSINGROWS=1; GETNAMES=yes; DATAROW=2; RUN;
The data is uploaded as a .xlsx file, but I used it as a .csv.
The spreadsheet appears to have a few of the headers in 2 rows. If you did a save as csv from that file your data isn't on the correct line. Line 2 looks like this:
YO,,,,,,,,,,,,,,,,SIXDIG,SIXDIG,,,,,
So your second line doesn't really have data. Using guessingrows you have to indicate at least the first datarow. Which would be 3.
However your file has at least 2 other issues:
Your next to last row of data has invalid values for what are time values in the other rows. 101 is not going to be read as time when the rest are 12:00 AM type (when using proc import).
Second, and very typical from Excel, you have about 1800 rows of blank data. I suspect that may be because you deleted rows for the example but Excel "knows" those rows had data in exports them to the csv.
These options should allow reading the file as posted.
GUESSINGROWS=3;
GETNAMES=yes;
DATAROW=3; Hint. If you have a CSV file to post here for example data use a TXT extension.
<Line 2 looks like this: YO,,,,,,,,,,,,,,,,SIXDIG,SIXDIG,,,,,>
This was intentional. It was my attempt to dictate that the variables w/ these character values (e.g. YO, SIXDIG) be read-in as character variables, despite the fact that their subsequent values are purely numeric.
<Using guessingrows you have to indicate at least the first datarow. >
I may not understand how to implement this GUESSINGROWS approach =;-)...I was trying to indicate the second row - the row w/ YO and SIXDIG - as the row that SAS should use to determine the data type.
<Your next to last row of data has invalid values for what are time values in the other rows. 101 is not going to be read as time when the rest are 12:00 AM type (when using proc import).>
Will the times be read in as times?! I thought that SAS would simply create character variables for these variables given that there is no AM/PM associated w/ the time in one observation. Also, in my larger dataset, there are some instances where there is no space between the last digit and the AM or PM (e.g. 12:00PM v. 12:00 PM).
Couldn't this problem also be addressed w/ the GUESSINGROWS approach, assuming I implemented it correctly? If I put a time like, 12:00 PM, in the GUESSINGROWS row, wouldn't that dictate the variable type (Character) and maybe the informat?
Thanks again for your help!
@_maldini_ wrote:
If I put a time like, 12:00 PM, in the GUESSINGROWS row, wouldn't that dictate the variable type (Character) and maybe the informat?
The only thing that explicitly dictates type and informat is specifying it, which you cannot do via proc import.
Maybe "dictate" is too strong a word. Wouldn't it "trick" SAS into setting the variable type as character, and possibly use a relevant informat when reading-in the data?
I don't understand the question at this point. Do you have a CSV file or an XLSX file. The file you posted was an XLSX file. You can read it using PROC IMPORT.
proc import datafile='guessing_rows.xlsx'
  dbms=xlsx out=mydata replace
;
  sheet='GUESSING_ROWS.CSV';
run;
@Tom I uploaded an .xlsx file b/c it wouldn't let me upload a .csv file. I subsequently learned that I could simply change the extension to .txt. Sorry for the confusion. I uploaded a new file w/ a .txt extension.
Since your data has column headers you need to code GUESSINGROWS=2 instead of GUESSINGROWS=1.
I converted the first 10 observations of your XLSX file back into a CSV file and was able to get PROC IMPORT to read it.
filename csv 'guessing_rows.csv' ;
proc transpose data=mydata (obs=0) out=names;
  var _all_;
run;
data _null_ ;
  file csv dsd ;
  set names ;
  put _name_ @@ ;
run;
data _null_;
  file csv dsd mod ;
  set mydata (obs=10);
  put (_all_) (+0);
run;
proc import datafile=csv out=newdata replace 
  dbms=dlm 
;
  delimiter=',';
  guessingrows=2;
run;If you want to use GUESSINGROWS=1 then you should also add GETNAMES=NO;
proc import datafile=csv out=new2 replace 
  dbms=dlm 
;
  delimiter=',';
  guessingrows=1;
  getnames=no;
run;
Out of interest, why would you want to use proc import to import CSV data? It is a guessing procedure and hence will always have some problems. I am sure that if you are dealing with a data import, you will have an import agreement in place, detailing the structure and possibly contents of the datafile - so there is no confusion, e.g. changes - one either side. That document would then be the basis for a very simple datastep import program which reads the data in. Simple process, documented, validatable, replicatable, its what every data import should be 🐵
<Out of interest, why would you want to use proc import to import CSV data?>
This question of GUESSINGRO
The thought behind the GUESSINGRO
I am open to any other ideas or suggestions!
Thanks for your thoughts and help here.
I fairly often use Proc import on a CSV file to build a skeleton of code I can modify. If you look at the log you should see a complete data step generated by SAS to read the csv file.
I copy the code into the editor and adjust the properties for the items I need to use. I set character lengths, informats (sometimes specifying my own custom informats), insure numerics are numeric, dates are date values(with appropriate formats) and character variables are and add labels and often initial processins steps such as data quality (YarX should not be missing when VarY has value in range). Often I get to specify characterstics for a variable that was completely blank in the first data set due to infrequent assignment so that in later iterations it reads properly.
Then when I need to read another file of the same time I have a program ready to use and the variable types, lengths.
I will do the proc import even when I have a relatively proper document describing the data because it is easier to do search and replace to get desired variable names then typing from scratch.
None of my datasets with this have exceeded about 240 variables so far, but one project involveds going through this for in excess of 20 data layouts.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
