I am new to SAS and appreciate any help! I am using SAS Studio.
I have a dataset with three variables (ID number, exam score, and percentage points) and 619 observations.
I import my excel file to SAS and examine the dataset using the following SAS code:
/** Import grades file. **/
PROC IMPORT DATAFILE="/folders/myfolders/sasuser.v94/Exam 1 Scores.xlsx"
OUT=GRADES
DBMS=XLSX
REPLACE;
RUN;
Proc Print data=GRADES;
run;
I noticed that SAS added three new variables to the right of the data table labeled "D", "E", and "F" (see image below). I tried converting the excel file to a csv file and then uploading the csv file but the same thing happened.
Why is this happening and how do I fix it? Is it better to use excel files or csv files? Are there any tips or tricks for preparing files for import to SAS to make things go more smoothly?
Thanks!
One the many annoyances of Excel is what is a "used cell".
Enter a value in a column. Delete the value. The Column is treated by Excel as used. If you do a file save as to CSV then you will get a set of commas with no values because the column is used, so needs to be "saved" but has no values.
You will also get phantom rows of values when an entire row of values is deleted.
This is happening because the the Excel engine that SAS connects to is told those columns are used.
Doubly annoying, is often deleting those columns just promotes the replacement columns that shifted over to used.
The only way I have ever found a way to correct this is exporting the data to csv and then writing a data step to read what is wanted. If you look in the log after using Proc import with a CSV version you will see a data step program with those very missing variables because there were the exported commas mentioned. And proc import doesn't know that you only want some variables. However you can copy that data step from the log into the editor, clean up things like line numbers and remove all the references to the blank column variables, both in the informat and input sections. Re-run the modified code and only the variables you want should be in the output. Caveat: CSV has to read ALL the variables up to the rightmost one that you want in the output, no skipping columns entirely.
I have been set "data" that consisted of 5 or 6 columns of values and 100+ empty columns that were imported.
The KEEP statement is your friend.
Data want;
set have;
keep <names of the variables you want>.
run;
PS. Don't forget about possible hidden columns or rows. Those will also appear when imported.
There could be some hidden content in columns D, E and F. Try deleting the columns in Excel and re-save the workbook.
One the many annoyances of Excel is what is a "used cell".
Enter a value in a column. Delete the value. The Column is treated by Excel as used. If you do a file save as to CSV then you will get a set of commas with no values because the column is used, so needs to be "saved" but has no values.
You will also get phantom rows of values when an entire row of values is deleted.
This is happening because the the Excel engine that SAS connects to is told those columns are used.
Doubly annoying, is often deleting those columns just promotes the replacement columns that shifted over to used.
The only way I have ever found a way to correct this is exporting the data to csv and then writing a data step to read what is wanted. If you look in the log after using Proc import with a CSV version you will see a data step program with those very missing variables because there were the exported commas mentioned. And proc import doesn't know that you only want some variables. However you can copy that data step from the log into the editor, clean up things like line numbers and remove all the references to the blank column variables, both in the informat and input sections. Re-run the modified code and only the variables you want should be in the output. Caveat: CSV has to read ALL the variables up to the rightmost one that you want in the output, no skipping columns entirely.
I have been set "data" that consisted of 5 or 6 columns of values and 100+ empty columns that were imported.
The KEEP statement is your friend.
Data want;
set have;
keep <names of the variables you want>.
run;
PS. Don't forget about possible hidden columns or rows. Those will also appear when imported.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.