Hi everyone,
I am impoting CSV file of 6000 variables with option "getname=yes" but it only names till 2500 then it start taking names as Var2756.. and so on. So is there any limit of SAS or i need to supply any additional option in that.
proc import datafile="test.csv" out=xyz DBMS=CSV replace ;GUESSINGROWS=60000;getnames=yes; run;
Please help it throws no error but it does not take variable name and my task is to transpose this with the help of variable i cant even proceed by ignoring var names.
The maximum length of a SAS character variable is 32767.
So I guess that SAS runs out of space trying to read the first line with the variable names internally into a string. Since it then detects 6000 columns in the following rows, it acts as if not enough names had been supplied, and creates default ones.
Since 6000 columns point to a catastrophic design failure as such, I suggest to start correcting there.
@tjain90 wrote:
Thanks Kurt but how i can handle this situation now??
Use a LART on the client. Or go through all the complex handling yourself (along the pointers I gave you).
A big help would be to create a table with the metadata for the 6000 columns. You did get a documentation for the file, didn't you? If not, request one that lists all columns with names, datatype and format in a computer-readable form.
First try assigning a FILENAME to point to the physical file and adding LRECL option there and then ask PROC IMPORT to read from that file. If you are lucky PROC IMPORT will honor the LRECL setting on the FILENAME statement and be able to see all of the variable names.
filename MYCSV 'path to my csv file' lrecl=1000000 ;
proc import datafile=mycsv ....Otherwise just read in the variable names youself and generate rename command for the ones that were wrong.
data names ;
   infile  'path to my csv file' lrecl=1000000  dsd obs=1 ;
   do varnum=1 by 1 ;
      length name $32 ;
      input name @;
      output;
   end;
run;Well, firstly why do you have 6000 variables? That sounds like a very bad data structure, how were you planning on working with these further in code, typing 6000 names in each time? 30 or 40 is about the most I would want to work with at any one point.
Secondly, proc import is a guessing procedure. Whenever you use it your leaving your data import up to luck. Take the output from teh proc import - its in the log - then manipulates the datastep which it creates to match the data you knw best - this is the only way to a) produce reproduceable results and b) get data exactly as you require it.
Now onto your point, its quite likel that proc import breaks if you have that amount of variables. Firstly you are reading in a text file, think of it this way, the first row which has variable names in it - 6000 * 8 characters=48k characters to read in, is your lrecl value setup to handle this kind of size? Liklihood is that string is far longer, so just an example. Again, goto a datastep and on the infile statement set your lrecl to 32767, the longest possible, if your data is longer than that then you have problems and need to read it by holding over the pointer. Again this is all down to having so many variables which is unworkable.
The maximum possible lrecl on UNIX (SAS 9.4) is 1G, so you can safely assign a large value to your infile.
But: how do you read an arbitrary number of column names? My guess is that proc import uses the automatic variable _infile_, and that will most probably be limited to 32767 (otherwise all the SAS string functions would cough up).
One could devise a way where the first line is read in an extra step with recfm=n (unstructured stream), character-by-character, and the names are written to a SAS dataset from which the import data step would be created with call execute. Note that one has to take care that no single statement must exceed the same 32767 character limit, so you will end up with really lots of code, as you will have a separate
input varname@;statement for each column.
Bottom line: have the infile structured in an intelligent way. Intelligent data drives intelligent programs.
Make lrecl as long as enough.
filename x 'test.csv' lrecl=30000000000;
proc import datafile=x out=xyz DBMS=CSV replace ;GUESSINGROWS=60000;getnames=yes; run;
This will not work:
lrecl=30000000000
"In SAS 9.4, the default value for LRECL= is 32,767.... The maximum length is 1G."
@tjain90 wrote:
Hi everyone,
I am impoting CSV file of 6000 variables with option "getname=yes" but it only names till 2500 then it start taking names as Var2756.. and so on. So is there any limit of SAS or i need to supply any additional option in that.
SAS has rules for variable names that often require changing a column label from a different data source. If you have many column names with text longer than 32 characters, including spaces and non-valid for SAS name characters then SAS attempts to create something but if the first 32 characters duplicate then rules default to naming them VARXXX because SAS does not have sufficient information to determine a reasonable variable name. And sometimes there is no column header at all and those will also get a VARxxx type name.
I suggest taking a long look at the log. Proc import will generate a data step with all of the variables it assigned. Copy that from the log and edit it and assign the names and types you want. It may be that for as many variable as you have that you have blocks that may have similar meaning so you might use Cityname1 - Cityname15 if you have a block of 15 city names in the data as one example so typing lots of names might not be as much work as you think.
You also want to verify that the TYPES of variables are as you expected.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
