BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
afrocentrix
Fluorite | Level 6

Hi all, first time poster here. I did try to searching Google and this website in particular for something related to my issue which I imagine must be basic but I couldn't find anything. Essentially I have a dataset that I cleaned in Excel and uploaded to SAS. When I run Summary Statistics it says 4186 observations (as it should) but then it shows over a million missing values which are the blanks underneath the data. Is there some setting that I have wrong?

 

This is my summary statistics:

afrocentrix_0-1649217079554.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
afrocentrix
Fluorite | Level 6
Thank you both for your feedback! What I ended up doing based on your feedback is copying the data itself into a new worksheet. That helped eliminate the issue.

View solution in original post

5 REPLIES 5
afrocentrix
Fluorite | Level 6
Thanks for your feedback!
ballardw
Super User

You did nothing wrong. The designers of Excel did, at least in my opinion.

Excel files often have what I call "phantom" data. At some time the file likely had some values that were deleted. These could be entire rows or columns. However Excel considered these cells "used" and the engines used to import this data creates missing values in SAS.

 

I have received data with upwards of 100,000 rows of such garbage because of repeated manual edits of the files.

Options are either to Save the data as CSV, write a data step to read the data and either edit the CSV to remove the rows of commas that result or add a statement like this after the input.

If _infile_ =: ',,,,,' then delete;

Or take a pass at the data with a data step to keep records where the first few variables are all missing (use your variable names of course)

 

If you get a bunch of variables from reading the data as Excel with names like Var15 Var16 Var17 (or similar) that can be an indication of columns with missing values and after checking the data you can drop those variables.

data have;
   set have;
   if missing(var1) and Missing(var2) and missing(var3) then delete;
run;

If you want to test this follow these simple steps:

Create a new Excel file.

Enter data in the first column for 5 to 10 rows. First row should be column heading like a variable name.

Delete all the data except the first two rows.

Import that file.

See if the number of observations matches your initial rows with all but one missing values.

 

afrocentrix
Fluorite | Level 6
Thank you for your feedback!
afrocentrix
Fluorite | Level 6
Thank you both for your feedback! What I ended up doing based on your feedback is copying the data itself into a new worksheet. That helped eliminate the issue.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 614 views
  • 3 likes
  • 3 in conversation