So here's a data set:
hosp | gender | haemo | pcv | wbc | lympho | neutro | lead |
A | M | 13.4 | 39 | 4100 | 14 | 25 | 17 |
B | f | 14.6 | 46 | 5000 | 15 | 30 | 20 |
a | F | 13.5 | 42 | 4500 | 19 | 21 | 18 |
A | M | 15 | 46 | 4600 | 23 | 16 | 18 |
b | m | 14.6 | 44 | 5100 | 17 | . | 19 |
B | m | 14 | 44 | 4900 | 20 | 24 | 19 |
b | f | 16.4 | 49 | 4300 | 21 | 17 | 18 |
A | f | 14.8 | 44 | 4400 | 16 | 26 | 29 |
a | F | 15.2 | 46 | 4100 | 27 | 13 | 27 |
B | M | 15.5 | 48 | 8400 | 34 | 42 | 36 |
A | M | 15.2 | 47 | . | 26 | 27 | 22 |
B | F | 16.9 | 50 | 5100 | 28 | 17 | 23 |
a | F | 14.8 | 44 | 4700 | 24 | 20 | 23 |
A | M | 16.2 | 45 | 5600 | 26 | 25 | 19 |
b | M | 14.7 | 43 | 4000 | 23 | 13 | 17 |
A | F | 14.7 | 42 | 3400 | 9 | 22 | 13 |
B | m | 16.5 | 45 | 5400 | 18 | 32 | 17 |
B | f | 15.4 | 45 | 6900 | 28 | 36 | 24 |
B | M | 15.1 | 45 | 4600 | . | 29 | 17 |
a | M | 14.2 | 46 | 4200 | 14 | 25 | 28 |
b | F | 15.9 | 46 | 5200 | 8 | 34 | 16 |
B | f | 16 | 47 | 4700 | 25 | 14 | 18 |
A | F | 17.4 | 50 | 8600 | 37 | . | 17 |
A | M | 14.3 | 43 | 5500 | 20 | 31 | 19 |
B | F | 14.8 | 44 | 4200 | 15 | 24 | 19 |
a | f | 15.5 | . | 4900 | 17 | 27 | 16 |
a | m | 26.5 | . | 5200 | 19 | 32 | 21 |
1) the missing data is noted with a period (.). How do write a statement for this? I would import data as a .csv, after would I write the missing data statement, then run it? I also need to account for the upper case letters vs lower case in the first two columns. Would I write a if-else statement? or would I just need to proc format or proc label? and what order should it be in?
here's my start:
data Hospital;
infile 'D:\hospital.csv'
delimiter= ','
firstobs=2;
input hosp $ gender $ haemo pcv wbc lympho neutro lead;
run;
What else did the log say, aside from 0 observations?
@trash wrote:
1) the missing data is noted with a period (.). How do write a statement for this?
When you import the data, check your imported data set. Either use PROC PRINT to display it or open the data set and explore it. What you want to 'write a statement' for is not clear.
I also need to account for the upper case letters vs lower case in the first two columns. Would I write a if-else statement? or would I just need to proc format or proc label? (what even is the difference?) and what order should it be in?
Look at the UPCASE() and LOWCASE() function that will convert all to the same case. Yes, this matters if they are equivalent and you want to treat them the same.
I had trouble importing data (kept saying 0 observations) so I couldn't work much farther.
Please see this video on SAS UE on how to import your data:
Without the log we have no idea what went wrong. Always read your log. If your code doesn't work and you need help from here make sure to include your code and log.
In general, it sounds like you're just starting with SAS. The first SAS programming course is free and there are many videos here:
http://video.sas.com/#category/videos/sas-analytics-u
And a lot of great tutorials are here:
https://stats.idre.ucla.edu/sas/modules/
It isn't clear if you successfully read the data or not. It might help to post the LOG with the code for the data step and any messages. Post into a code box opened with the forum {I} menu icon.
If the data is not sensitive, (containing personally identifiable info for example) you might copy a few lines from the CSV file using a text editor like Notepad or Word Pad (NOT Excel it reformats data) and post that into a code box as well.
Easiest way to address case issues like "m", "M", "f" and "F" in your gender field would be to add this like of code to the read program:
gender = upcase(gender); if you want consistent "M" and "F". If you want "m" and "f" use the lowcase function. You may want to do the same thing for your hospital variable unless "a" is a different hospital than "A". The way you are reading hospital will limit the value to 8 characters. If it should be longer you need to specify a longer length with one of an informat, length or attrib statement for hospital prior to the input statement.
In a csv file a missing value would most likely look like two commas together or two commas and a space: ,, or , ,
and SAS will treat either of those a missing for numeric variable.
when you say 8 characters is that a default? It doesn't need to be longer. A=a for this data set.
here is the data set again from a notepad:
A,M,13.4,39,4100,14,25,17
B,f,14.6,46,5000,15,30,20
a,F,13.5,42,4500,19,21,18
A,M,15,46,4600,23,16,18
b,m,14.6,44,5100,17,,19
B,m,14,44,4900,20,24,19
b,f,16.4,49,4300,21,17,18
A,f,14.8,44,4400,16,26,29
a,F,15.2,46,4100,27,13,27
B,M,15.5,48,8400,34,42,36
A,M,15.2,47,,26,27,22
B,F,16.9,50,5100,28,17,23
a,F,14.8,44,4700,24,20,23
A,M,16.2,45,5600,26,25,19
b,M,14.7,43,4000,23,13,17
A,F,14.7,42,3400,9,22,13
B,m,16.5,45,5400,18,32,17
B,f,15.4,45,6900,28,36,24
B,M,15.1,45,4600,,29,17
a,M,14.2,46,4200,14,25,28
b,F,15.9,46,5200,8,34,16
B,f,16,47,4700,25,14,18
A,F,17.4,50,8600,37,,17
A,M,14.3,43,5500,20,31,19
B,F,14.8,44,4200,15,24,19
a,f,15.5,,4900,17,27,16
a,m,26.5,,5200,19,32,21
I see what you now mean with the double commas, but I'm not sure how to handle that when it comes to importing the data.
The "double commas" you can account for by using the dsd option on your INFILE statement.
The easy way to account for lower/upper cases is to use the $UPCASE. informat.
That way, your program would be something like:
data Hospital; infile 'D:\hospital.csv' delimiter= ',' dsd firstobs=2; length hosp gender $1; informat hosp gender $upcase.; input hosp $ gender $ haemo pcv wbc lympho neutro lead; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.