BookmarkSubscribeRSS Feed
trash
Fluorite | Level 6

So here's a data set: 

 
This is a set of observations on 27 people. Several things were noted. The first variable is the hospital in which the data
was collected, the second variable is the gender of the person, then a measure of the amount of iron in the blood called
hemoglobin (Haemo), then packed cell volume (Pcv), white blood cell count (Wbc), the number of lymphocytes (Lympho), neutrophil (Neutro), and serum lead concentration (Lead).
hospgenderhaemopcvwbclymphoneutrolead
AM13.4394100142517
Bf14.6465000153020
aF13.5424500192118
AM15464600231618
bm14.644510017 .19
Bm14444900202419
bf16.4494300211718
Af14.8444400162629
aF15.2464100271327
BM15.5488400344236
AM15.247 .262722
BF16.9505100281723
aF14.8444700242023
AM16.2455600262519
bM14.7434000231317
AF14.742340092213
Bm16.5455400183217
Bf15.4456900283624
BM15.1454600 .2917
aM14.2464200142528
bF15.946520083416
Bf16474700251418
AF17.450860037 .17
AM14.3435500203119
BF14.8444200152419
af15.5 .4900172716
am26.552001932

21


1) the missing data is noted with a period (.). How do write a statement for this? I would import data as a .csv, after would I write the missing data statement, then run it? I also need to account for the upper case letters vs lower case in the first two columns. Would I write a if-else statement? or would I just need to proc format or proc label? and what order should it be in?

here's my start:

data Hospital;
infile 'D:\hospital.csv'
delimiter= ','
firstobs=2;
input hosp $ gender $ haemo pcv wbc lympho neutro lead;
run;

 
5 REPLIES 5
mkeintz
PROC Star

What else did the  log say, aside from 0 observations?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

@trash wrote:


1) the missing data is noted with a period (.). How do write a statement for this?


When you import the data, check your imported data set. Either use PROC PRINT to display it or open the data set and explore it. What you want to 'write a statement' for is not clear. 

 

 I also need to account for the upper case letters vs lower case in the first two columns. Would I write a if-else statement? or would I just need to proc format or proc label? (what even is the difference?) and what order should it be in?

Look at the UPCASE() and LOWCASE() function that will convert all to the same case. Yes, this matters if they are equivalent and you want to treat them the same. 

http://video.sas.com/detail/videos/sas-analytics-u/video/4573023399001/creating-a-new-column-in-sas?...

 


 
 

I had trouble importing data (kept saying 0 observations) so I couldn't work much farther. 

Please see this video on SAS UE on how to import your data:

http://video.sas.com/detail/videos/sas-analytics-u/video/4573016758001/creating-a-sas-table-from-a-c...

 

Without the log we have no idea what went wrong. Always read your log. If your code doesn't work and you need help from here make sure to include your code and log.

 

In general, it sounds like you're just starting with SAS. The first SAS programming course is free and there are many videos here:

http://video.sas.com/#category/videos/sas-analytics-u

 

And a lot of great tutorials are here:

https://stats.idre.ucla.edu/sas/modules/



ballardw
Super User

It isn't clear if you successfully read the data or not. It might help to post the LOG with the code for the data step and any messages. Post into a code box opened with the forum {I} menu icon.

 

If the data is not sensitive, (containing personally identifiable info for example) you might copy a few lines from the CSV file using a text editor like Notepad or Word Pad (NOT Excel it reformats data) and post that into a code box as well.

 

Easiest way to address case issues like "m", "M", "f" and "F" in your gender field would be to add this like of code to the read program:

gender = upcase(gender); if you want consistent "M" and "F". If you want "m" and "f" use the lowcase function. You may want to do the same thing for your hospital variable unless "a" is a different hospital than "A". The way you are reading hospital will limit the value to 8 characters. If it should be longer you need to specify a longer length with one of an informat, length or attrib statement for hospital prior to the input statement.

 

In a csv file a missing value would most likely look like two commas together or two commas and a space: ,,    or , ,

and SAS will treat either of those a missing for numeric variable.

trash
Fluorite | Level 6

when you say 8 characters is that a default? It doesn't need to be longer. A=a for this data set. 

here is the data set again from a notepad: 

A,M,13.4,39,4100,14,25,17
B,f,14.6,46,5000,15,30,20
a,F,13.5,42,4500,19,21,18
A,M,15,46,4600,23,16,18
b,m,14.6,44,5100,17,,19
B,m,14,44,4900,20,24,19
b,f,16.4,49,4300,21,17,18
A,f,14.8,44,4400,16,26,29
a,F,15.2,46,4100,27,13,27
B,M,15.5,48,8400,34,42,36
A,M,15.2,47,,26,27,22
B,F,16.9,50,5100,28,17,23
a,F,14.8,44,4700,24,20,23
A,M,16.2,45,5600,26,25,19
b,M,14.7,43,4000,23,13,17
A,F,14.7,42,3400,9,22,13
B,m,16.5,45,5400,18,32,17
B,f,15.4,45,6900,28,36,24
B,M,15.1,45,4600,,29,17
a,M,14.2,46,4200,14,25,28
b,F,15.9,46,5200,8,34,16
B,f,16,47,4700,25,14,18
A,F,17.4,50,8600,37,,17
A,M,14.3,43,5500,20,31,19
B,F,14.8,44,4200,15,24,19
a,f,15.5,,4900,17,27,16
a,m,26.5,,5200,19,32,21

 

I see what you now mean with the double commas, but I'm not sure how to handle that when it comes to importing the data. 

s_lassen
Meteorite | Level 14

@trash:

The "double commas" you can account for by using the dsd option on your INFILE statement.

The easy way to account for lower/upper cases is to use the $UPCASE. informat.

That way, your program would be something like:

data Hospital;
  infile 'D:\hospital.csv' delimiter= ',' dsd firstobs=2;
  length hosp gender $1;
  informat hosp gender $upcase.;
  input hosp $ gender $ haemo pcv wbc lympho neutro lead;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1405 views
  • 6 likes
  • 5 in conversation