BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cassie-Andra
Calcite | Level 5

I have a SAS dataset containing 4203 variables that I'm trying to export to excel in a csv format. It exports without any issue, but when I open the csv file, it just drops the column names after about 1200 of those variables.  What is odd is that the data is still in the tables, but the variable/columns names are now blank.  I know that the length of the variable names is not the issue because the remaining variable names are named with the same naming convention as the first 1200 that did output with their column names.  Any ideas on why this is happening and how to include the variable names for the entire dataset?

 

Here is my SAS code:

 

%let Schooldata=[location];

proc export data=tempfile
outfile="&Schooldata\HOLD\School survey.csv"
dbms=csv label replace;
putnames=yes;
run;

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Reeza wrote:
Rather than a PROC EXPORT try a data step export where you're manually specifying the LRECL for one - I suspect you're running into lines that are larger than 32K characters most likely. I think you can get around that in a data step. I'm not sure why you can't build your conversion into #5 from a SAS data set rather than a re-import.....but that's a design issue.

It is starting with Proc Import for the same file format repeatedly, though apparently it changed.

Time to let @Cassie-Andra know that Proc Import writes a data step to the log when importing CSV.

 

So copy the Data step from the log and paste into the editor. Make the changes there for the proper variable type by changing informats (or Attrib statements).

 

Then you can read a different file by changing the infile statement and create a different output data set by changing the name.

 

When changes are made you can add new informat information for the new variables and modify the INPUT statement.

 

And since I saw the word "survey" you might investigate whether the Survey software will output files in SAS or SPSS files. I know of several packages that provide those options. Which can save you a ton of headaches.

 

 

View solution in original post

18 REPLIES 18
PaigeMiller
Diamond | Level 26

What version of Excel? 

 

Let's start with a few simple diagnostics: did you look at the CSV file with a text editor to make sure the csv file has the proper variable names, and the CSV file appears intact and whole and doesn't have problems? Are there ERRORs, WARNINGs or NOTEs in the SAS log for PROC EXPORT that indicate a problem? In Excel, if you click on a cell where the name should be, what is in there, and can you scroll up within that cell to find the name?

--
Paige Miller
Cassie-Andra
Calcite | Level 5

Thank you for the response!

 

I'm currently working with MS Excel for Office 365 MSO 32-bit.  

 

I did not look at the CSV file with a text editor.  I will have to research how to do that, but will take any recommendations.  I was able to confirm that the first ~1200 variables with names were correct.  The data seemed to be just fine.  The only issue was missing column names.  

 

Clicking on the cells didn't show anything, and there were no Errors or Warnings in the log but there was one NOTE I was curious about; it mentioned DROPOVER. 

 

NOTE: 3056 records were written to the file '[location] School survey.csv'.
The minimum record length was 4431.
The maximum record length was 32767.
NOTE: Some output data was ignored because the DROPOVER option was specified.
NOTE: There were 3055 observations read from the data set WORK.TEMPFILE.

Cassie-Andra
Calcite | Level 5

Hello,

 

I am trying to import a csv file with over 4200 variable names.  For some reason, after I import this csv file into SAS, over a thousand of the variable names are now changed to VAR[#] followed by their order number.  So instead of having all of the variable names to work with, I have over a thousand variables named VAR3037 - VAR4203.  Help!  How can I keep all these variable names that are in my csv file?  I'm trying to pull in the very first row in the csv file as the column names, but the data doesn't actually start until row 4.  Rows 2 and 3 contain the descriptions of the variables (and those descriptions can be rather line....if that matters).  

 

Here is my SAS code:

 

%let Schooldata=[location];


/* Importing the csv file 'School Survey' */
proc import datafile="&Schooldata\School Survey.csv"
out=tempfile1
dbms=csv
replace;
datarow=4;
guessingrows=max;
run;

 

Thank you!

PaigeMiller
Diamond | Level 26

Can someone merge this with @Cassie-Andra's other similar thread?

--
Paige Miller
ballardw
Super User

If you are IMPORTING the CSV file that you said in this post https://communities.sas.com/t5/SAS-Programming/variable-names-disappearing-after-exporting-to-Excel/... had "disappearing" variables names when exported then what you get is the expected result for columns that do not have any column heading. This seems very likely considering in the export you used:

outfile="&Schooldata\HOLD\School survey.csv"

and here you use:

 import datafile="&Schooldata\School Survey.csv"

If at some time you had a CSV file that you imported into SAS and then exported with the same name and REPLACE option, as shown in your Export code, then congratulations, you destroyed your original data and need to find a backup that still has the original column headings.

 

 

 

Another common cause of VARxx naming is if the column headings are identical for the first 32 characters then the following columns with the same start will get VARxx names because there is no way for the procedure to "guess" what variable name to use when you exceed the number of  characters that SAS will use for a variable.

If the length is less than 32 characters with duplicate names SAS will use suffixes for the following columns. I have had as many as 30 "total" columns some data source data. Proc Import turned those into Total, Total1, Total2, ...., Total30.

Proc import will not create two variables with the same name. Period.

 

Cassie-Andra
Calcite | Level 5

First of all, great eye for detail in cross referencing both my posts.  To make my code seem a bit 'cleaner', I took off some of my output name, but they are definitely named differently.  Here is more of my code (minus about 10000 lines in the datastep):

 

%let Schooldata=Location;
/* Importing the csv file 'School Survey' */
proc import datafile="&Schooldata\School Survey.csv"
out=tempfile1
dbms=csv
replace;
datarow=4;
guessingrows=max;
run;

/* Renaming the data variables and dropping unnecessary variables. */
data tempfile;
retain Recorded_Date Recorded_Time;
length Recorded_Date Recorded_Time 8;
set tempfile1;
Recorded_Date=datepart(recordeddate);
Recorded_Time=timepart(recordedtime);
format Recorded_Date mmddyy10. Recorded_Time time8.;
rename
Q3_4=Name
Q3_5=Address

.... (~4000 more variables);

run;

 

proc export data=tempfile
outfile="&Schooldata\HOLD\School survey temp.csv"
dbms=csv label replace;
putnames=yes;
run;

 

data import_information;
infile "&Schooldata\HOLD\School survey temp.csv"
delimiter="," MISSOVER DSD lrecl=32767 firstobs=2 ;
length
Recorded_Date 8
Recorded_Time 8
Name $ 150
Address $ 200

....

; run;

 

/* creating permanent dataset */

libname school "&Schooldata";
data school.report;
set import_information;
run;

 

Great information about the column headings.  I verified again that the longest length of a title is 29 characters, and they are all distinct names.  We have a specific system we followed to name these variables, and they are all unique.  This code worked just fine up until we recently added about a 1000 new columns to capture more data.  

Reeza
Super User
Do you really want that LABEL option in your EXPORT code? That prints out the labels not the variable names.
Tom
Super User Tom
Super User

Where did you learn that Q3_4=Name?  Do you have a dataset that describes the columns in the CSV file?  If so then use that data to generate the code to read the file instead of forcing SAS to GUESS how to read the file by using PROC IMPORT.

Tom
Super User Tom
Super User

4200 variables to going to be a nightmare to work with, but perhaps the issue is that your header row is not in the first line?  Does the data itself start in row 4?  Where are the column headers?  

 

Also PROC IMPORT does not like header rows that are too long.  You could try defining a fileref so you could set a logical record length,

%let Schooldata=[location];
filename csv "&Schooldata\School Survey.csv" lrecl=1000000 ;
proc import datafile=csv dbms=csv
  out=tempfile1 replace
;
  datarow=4;
  guessingrows=max;
run;

but I don't think it will work.

 

You could read in the headers separately and rename the variables.  Use the GETNAMES='NO' statement in the PROC IMPORT.  Since you have a text file there is no need to use PROC IMPORT to read the headers.  So assuming the headers are in line 3 the code to read the headers might look like this:

data headers;
  input "&Schooldata\School Survey.csv" dsd firstobs=3 obs=3 lrecl=1000000 ;
  length varnum 8 name $32 header $200 ;
  varnum+1;
  input header :$200. @@ ;
  name = compress(header,' ');
run;     

You can then use those generated names to try to rename the VAR1, ... variables.

filename rename temp;
data _null_;
  set headers end=eof;
  file rename lrecl=75;
  if _n_=1 then put 'rename ' @;
  put 'VAR' varnum '=' name @ ;
  if eof then put ';' ;
run;

proc datasets nolist lib=WORK ;
  modify tempfile1;
%include rename / source2;
   run;
quit;  
Cassie-Andra
Calcite | Level 5

Tom,

 

Thank you for the sample code!  I've never seen this strategy before, but am thoroughly impressed.  I was actually able to solve my problem by importing with an infile statement instead of using proc import, but I think out of curiosity I'm going to try this method as well just to see how it works. 

ballardw
Super User

@Cassie-Andra wrote:

I have a SAS dataset containing 4203 variables that I'm trying to export to excel in a csv format. It exports without any issue, but when I open the csv file, it just drops the column names after about 1200 of those variables.  What is odd is that the data is still in the tables, but the variable/columns names are now blank.  I know that the length of the variable names is not the issue because the remaining variable names are named with the same naming convention as the first 1200 that did output with their column names.  Any ideas on why this is happening and how to include the variable names for the entire dataset?

 

Here is my SAS code:

 

%let Schooldata=[location];

proc export data=tempfile
outfile="&Schooldata\HOLD\School survey.csv"
dbms=csv label replace;
putnames=yes;
run;

 

Thank you!


For heavens sake, why to you have 4200 variables? 

I think you running into an LRECL related issue.

 

Related:

https://communities.sas.com/t5/SAS-Programming/Column-header-truncated-when-exporting-to-csv-Export-...

Cassie-Andra
Calcite | Level 5

Thank you for linking the resource, but I think I'm still not sure how to solve my problem.  Can you assist me in how this would look for my import?

 

For a little more context in what I'm trying to do, this spreadsheet with all these variables is downloaded from a survey website.  It is all the survey responses beginning with the very first response.  I have little to no control over how this spreadsheet is downloaded from the website, but this data is downloaded multiple times a day due to the high usage of this survey.  I then need to take this spreadsheet, import it into SAS, restructure the data so that I can take my dataset from short and wide to long and 'thin'.  

 

When we only had about 3500 variables, my current process was as follows:

1 - download the csv from the survey website

2 - import it into SAS

3 - rename variables and create a few new ones

4 - output into csv again

5 - re-import back into SAS with an infile statement forcing the formats of all these variables.  

 

The advantage to doing it this way is that if a survey respondent doesn't fill out some information in a cell, for ex: a date of birth, I can then force that cell to be a date format instead of a blank character.  I need this because if I set these variables together and if their formats aren't the same, I get issues.  

 

When we only had about 3500 variables, this process worked just fine.  Now that we've had to add some options (thus adding on a lot more columns) to about 4200 variables, this process is no longer working for me because my variable titles are disappearing in Step 4.  This then hinders Step 5 and all the other SAS tasks that I need to do to work with this data.  

 

Cassie

Reeza
Super User
Rather than a PROC EXPORT try a data step export where you're manually specifying the LRECL for one - I suspect you're running into lines that are larger than 32K characters most likely. I think you can get around that in a data step. I'm not sure why you can't build your conversion into #5 from a SAS data set rather than a re-import.....but that's a design issue.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 2890 views
  • 4 likes
  • 7 in conversation