## How do I correct for missing values?

Occasional Contributor
Posts: 13

# How do I correct for missing values?

[ Edited ]

So here's a data set:

This is a set of observations on 27 people. Several things were noted. The first variable is the hospital in which the data
was collected, the second variable is the gender of the person, then a measure of the amount of iron in the blood called
hemoglobin (Haemo), then packed cell volume (Pcv), white blood cell count (Wbc), the number of lymphocytes (Lympho), neutrophil (Neutro), and serum lead concentration (Lead).
 hosp gender haemo pcv wbc lympho neutro lead A M 13.4 39 4100 14 25 17 B f 14.6 46 5000 15 30 20 a F 13.5 42 4500 19 21 18 A M 15 46 4600 23 16 18 b m 14.6 44 5100 17 . 19 B m 14 44 4900 20 24 19 b f 16.4 49 4300 21 17 18 A f 14.8 44 4400 16 26 29 a F 15.2 46 4100 27 13 27 B M 15.5 48 8400 34 42 36 A M 15.2 47 . 26 27 22 B F 16.9 50 5100 28 17 23 a F 14.8 44 4700 24 20 23 A M 16.2 45 5600 26 25 19 b M 14.7 43 4000 23 13 17 A F 14.7 42 3400 9 22 13 B m 16.5 45 5400 18 32 17 B f 15.4 45 6900 28 36 24 B M 15.1 45 4600 . 29 17 a M 14.2 46 4200 14 25 28 b F 15.9 46 5200 8 34 16 B f 16 47 4700 25 14 18 A F 17.4 50 8600 37 . 17 A M 14.3 43 5500 20 31 19 B F 14.8 44 4200 15 24 19 a f 15.5 . 4900 17 27 16 a m 26.5 . 5200 19 32 21

1) the missing data is noted with a period (.). How do write a statement for this? I would import data as a .csv, after would I write the missing data statement, then run it? I also need to account for the upper case letters vs lower case in the first two columns. Would I write a if-else statement? or would I just need to proc format or proc label? and what order should it be in?

here's my start:

data Hospital;
infile 'D:\hospital.csv'
delimiter= ','
firstobs=2;
input hosp \$ gender \$ haemo pcv wbc lympho neutro lead;
run;

Posts: 1,346

## Re: How do I correct for missing values?

What else did the  log say, aside from 0 observations?

Super User
Posts: 23,778

## Re: How do I correct for missing values?

[ Edited ]

trash wrote:

1) the missing data is noted with a period (.). How do write a statement for this?

When you import the data, check your imported data set. Either use PROC PRINT to display it or open the data set and explore it. What you want to 'write a statement' for is not clear.

I also need to account for the upper case letters vs lower case in the first two columns. Would I write a if-else statement? or would I just need to proc format or proc label? (what even is the difference?) and what order should it be in?

Look at the UPCASE() and LOWCASE() function that will convert all to the same case. Yes, this matters if they are equivalent and you want to treat them the same.

http://video.sas.com/detail/videos/sas-analytics-u/video/4573023399001/creating-a-new-column-in-sas?...

I had trouble importing data (kept saying 0 observations) so I couldn't work much farther.

Please see this video on SAS UE on how to import your data:

http://video.sas.com/detail/videos/sas-analytics-u/video/4573016758001/creating-a-sas-table-from-a-c...

Without the log we have no idea what went wrong. Always read your log. If your code doesn't work and you need help from here make sure to include your code and log.

In general, it sounds like you're just starting with SAS. The first SAS programming course is free and there are many videos here:

http://video.sas.com/#category/videos/sas-analytics-u

And a lot of great tutorials are here:

https://stats.idre.ucla.edu/sas/modules/

Super User
Posts: 13,583

## Re: How do I correct for missing values?

It isn't clear if you successfully read the data or not. It might help to post the LOG with the code for the data step and any messages. Post into a code box opened with the forum {I} menu icon.

If the data is not sensitive, (containing personally identifiable info for example) you might copy a few lines from the CSV file using a text editor like Notepad or Word Pad (NOT Excel it reformats data) and post that into a code box as well.

Easiest way to address case issues like "m", "M", "f" and "F" in your gender field would be to add this like of code to the read program:

gender = upcase(gender); if you want consistent "M" and "F". If you want "m" and "f" use the lowcase function. You may want to do the same thing for your hospital variable unless "a" is a different hospital than "A". The way you are reading hospital will limit the value to 8 characters. If it should be longer you need to specify a longer length with one of an informat, length or attrib statement for hospital prior to the input statement.

In a csv file a missing value would most likely look like two commas together or two commas and a space: ,,    or , ,

and SAS will treat either of those a missing for numeric variable.

Occasional Contributor
Posts: 13

## Re: How do I correct for missing values?

when you say 8 characters is that a default? It doesn't need to be longer. A=a for this data set.

here is the data set again from a notepad:

A,M,13.4,39,4100,14,25,17
B,f,14.6,46,5000,15,30,20
a,F,13.5,42,4500,19,21,18
A,M,15,46,4600,23,16,18
b,m,14.6,44,5100,17,,19
B,m,14,44,4900,20,24,19
b,f,16.4,49,4300,21,17,18
A,f,14.8,44,4400,16,26,29
a,F,15.2,46,4100,27,13,27
B,M,15.5,48,8400,34,42,36
A,M,15.2,47,,26,27,22
B,F,16.9,50,5100,28,17,23
a,F,14.8,44,4700,24,20,23
A,M,16.2,45,5600,26,25,19
b,M,14.7,43,4000,23,13,17
A,F,14.7,42,3400,9,22,13
B,m,16.5,45,5400,18,32,17
B,f,15.4,45,6900,28,36,24
B,M,15.1,45,4600,,29,17
a,M,14.2,46,4200,14,25,28
b,F,15.9,46,5200,8,34,16
B,f,16,47,4700,25,14,18
A,F,17.4,50,8600,37,,17
A,M,14.3,43,5500,20,31,19
B,F,14.8,44,4200,15,24,19
a,f,15.5,,4900,17,27,16
a,m,26.5,,5200,19,32,21

I see what you now mean with the double commas, but I'm not sure how to handle that when it comes to importing the data.

PROC Star
Posts: 266

## Re: How do I correct for missing values?

The "double commas" you can account for by using the dsd option on your INFILE statement.

The easy way to account for lower/upper cases is to use the \$UPCASE. informat.

That way, your program would be something like:

```data Hospital;
infile 'D:\hospital.csv' delimiter= ',' dsd firstobs=2;
length hosp gender \$1;
informat hosp gender \$upcase.;
input hosp \$ gender \$ haemo pcv wbc lympho neutro lead;
run;```
Discussion stats
• 5 replies
• 343 views
• 6 likes
• 5 in conversation