BookmarkSubscribeRSS Feed
qwerasdfzcv
Calcite | Level 5

There are over 50.000 columns in csv file. I tried to import it through the code below, but some of the columns were not imported and written such as var1500, var 1501. How do I fix this?

 

proc import datafile="path/sample.csv" out=data1 dbms=csv replace;

      getnames=yes;

run;

2 REPLIES 2
Tom
Super User Tom
Super User

So your CSV files has 50,000 variables?  And all of them have unique names?  And you don't know what they are before you read the file?  How are you going to analyze 50,000 variables?

 

PROC IMPORT usually does not do well when the header line is longer than 32K.  Even if your variable names were one character each (which would cause a different problem) you would need 100K for the headers because of the commas.

 

Just read the file with a data step.  If you don't know the variable names read them separately and figure out what to do with them later.  Make sure to use a long enough line length for your data lines.

Do you know the types of the variables?  Hopefully there is some pattern in the names?

 

data wide_wider_widest ;
  infile "path/sample.csv" dsd truncover firstobs=2 lrecl=2000000 ;
  input var1-var50000;
run;
data headers ;
  infile "path/sample.csv" dsd obs=1 lrecl=2000000 ;
  length name $32 label $256 ;
  name = cats('var',_n_);
  input label @@;
run;

 

 

PS: Do not use PROC IMPORT to read a CSV, other than for some quick and dirty testing.  It has to GUESS what is in the file.

 

ballardw
Super User

@qwerasdfzcv wrote:

There are over 50.000 columns in csv file. I tried to import it through the code below, but some of the columns were not imported and written such as var1500, var 1501. How do I fix this?

 

proc import datafile="path/sample.csv" out=data1 dbms=csv replace;

      getnames=yes;

run;


When you get variable names like VAR1500 one of two things is likely to be questionable in your data.

First, if there is no actual column header then the variable will be assigned VAR followed by the column number.

Second, if the length of two or more column headings are the same for the first 32 characters then the "duplicate" gets assigned Var and column number.

Show us the log from running that code. I strongly suspect there are some interesting messages about several of your columns. Rerun the code if necessary. Then go to the LOG window, copy the text from the Proc Import to the last note or message. Paste the result into a code box opened on the forum with the </> to preserve formatting of the Log text.

 

Heaven help from dealing with source data with 50K variables.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 733 views
  • 0 likes
  • 3 in conversation