Help using Base SAS procedures

How do I correct for missing values?

Reply
Occasional Contributor
Posts: 13

How do I correct for missing values?

[ Edited ]

So here's a data set: 

 
This is a set of observations on 27 people. Several things were noted. The first variable is the hospital in which the data
was collected, the second variable is the gender of the person, then a measure of the amount of iron in the blood called
hemoglobin (Haemo), then packed cell volume (Pcv), white blood cell count (Wbc), the number of lymphocytes (Lympho), neutrophil (Neutro), and serum lead concentration (Lead).
hospgenderhaemopcvwbclymphoneutrolead
AM13.4394100142517
Bf14.6465000153020
aF13.5424500192118
AM15464600231618
bm14.644510017 .19
Bm14444900202419
bf16.4494300211718
Af14.8444400162629
aF15.2464100271327
BM15.5488400344236
AM15.247 .262722
BF16.9505100281723
aF14.8444700242023
AM16.2455600262519
bM14.7434000231317
AF14.742340092213
Bm16.5455400183217
Bf15.4456900283624
BM15.1454600 .2917
aM14.2464200142528
bF15.946520083416
Bf16474700251418
AF17.450860037 .17
AM14.3435500203119
BF14.8444200152419
af15.5 .4900172716
am26.552001932

21


1) the missing data is noted with a period (.). How do write a statement for this? I would import data as a .csv, after would I write the missing data statement, then run it? I also need to account for the upper case letters vs lower case in the first two columns. Would I write a if-else statement? or would I just need to proc format or proc label? and what order should it be in?

here's my start:

data Hospital;
infile 'D:\hospital.csv'
delimiter= ','
firstobs=2;
input hosp $ gender $ haemo pcv wbc lympho neutro lead;
run;

 
Trusted Advisor
Posts: 1,346

Re: How do I correct for missing values?

What else did the  log say, aside from 0 observations?

Super User
Posts: 23,778

Re: How do I correct for missing values?

[ Edited ]

trash wrote:


1) the missing data is noted with a period (.). How do write a statement for this?


When you import the data, check your imported data set. Either use PROC PRINT to display it or open the data set and explore it. What you want to 'write a statement' for is not clear. 

 

 I also need to account for the upper case letters vs lower case in the first two columns. Would I write a if-else statement? or would I just need to proc format or proc label? (what even is the difference?) and what order should it be in?

Look at the UPCASE() and LOWCASE() function that will convert all to the same case. Yes, this matters if they are equivalent and you want to treat them the same. 

http://video.sas.com/detail/videos/sas-analytics-u/video/4573023399001/creating-a-new-column-in-sas?...

 


 
 

I had trouble importing data (kept saying 0 observations) so I couldn't work much farther. 

Please see this video on SAS UE on how to import your data:

http://video.sas.com/detail/videos/sas-analytics-u/video/4573016758001/creating-a-sas-table-from-a-c...

 

Without the log we have no idea what went wrong. Always read your log. If your code doesn't work and you need help from here make sure to include your code and log.

 

In general, it sounds like you're just starting with SAS. The first SAS programming course is free and there are many videos here:

http://video.sas.com/#category/videos/sas-analytics-u

 

And a lot of great tutorials are here:

https://stats.idre.ucla.edu/sas/modules/



Super User
Posts: 13,583

Re: How do I correct for missing values?

It isn't clear if you successfully read the data or not. It might help to post the LOG with the code for the data step and any messages. Post into a code box opened with the forum {I} menu icon.

 

If the data is not sensitive, (containing personally identifiable info for example) you might copy a few lines from the CSV file using a text editor like Notepad or Word Pad (NOT Excel it reformats data) and post that into a code box as well.

 

Easiest way to address case issues like "m", "M", "f" and "F" in your gender field would be to add this like of code to the read program:

gender = upcase(gender); if you want consistent "M" and "F". If you want "m" and "f" use the lowcase function. You may want to do the same thing for your hospital variable unless "a" is a different hospital than "A". The way you are reading hospital will limit the value to 8 characters. If it should be longer you need to specify a longer length with one of an informat, length or attrib statement for hospital prior to the input statement.

 

In a csv file a missing value would most likely look like two commas together or two commas and a space: ,,    or , ,

and SAS will treat either of those a missing for numeric variable.

Occasional Contributor
Posts: 13

Re: How do I correct for missing values?

when you say 8 characters is that a default? It doesn't need to be longer. A=a for this data set. 

here is the data set again from a notepad: 

A,M,13.4,39,4100,14,25,17
B,f,14.6,46,5000,15,30,20
a,F,13.5,42,4500,19,21,18
A,M,15,46,4600,23,16,18
b,m,14.6,44,5100,17,,19
B,m,14,44,4900,20,24,19
b,f,16.4,49,4300,21,17,18
A,f,14.8,44,4400,16,26,29
a,F,15.2,46,4100,27,13,27
B,M,15.5,48,8400,34,42,36
A,M,15.2,47,,26,27,22
B,F,16.9,50,5100,28,17,23
a,F,14.8,44,4700,24,20,23
A,M,16.2,45,5600,26,25,19
b,M,14.7,43,4000,23,13,17
A,F,14.7,42,3400,9,22,13
B,m,16.5,45,5400,18,32,17
B,f,15.4,45,6900,28,36,24
B,M,15.1,45,4600,,29,17
a,M,14.2,46,4200,14,25,28
b,F,15.9,46,5200,8,34,16
B,f,16,47,4700,25,14,18
A,F,17.4,50,8600,37,,17
A,M,14.3,43,5500,20,31,19
B,F,14.8,44,4200,15,24,19
a,f,15.5,,4900,17,27,16
a,m,26.5,,5200,19,32,21

 

I see what you now mean with the double commas, but I'm not sure how to handle that when it comes to importing the data. 

PROC Star
Posts: 266

Re: How do I correct for missing values?

@trash:

The "double commas" you can account for by using the dsd option on your INFILE statement.

The easy way to account for lower/upper cases is to use the $UPCASE. informat.

That way, your program would be something like:

data Hospital;
  infile 'D:\hospital.csv' delimiter= ',' dsd firstobs=2;
  length hosp gender $1;
  informat hosp gender $upcase.;
  input hosp $ gender $ haemo pcv wbc lympho neutro lead;
run;
Ask a Question
Discussion stats
  • 5 replies
  • 343 views
  • 6 likes
  • 5 in conversation