Hi Everyone,
I'm new to SAS, and I am trying to enter a data set where some of the character data contain spaces, apostrophes, and a dash. I'm not sure how to do this, and I keep getting strange output. I have tried both the CHAR and VARYING informats, but I could be using them incorrectly. The code I've tried and the output I'm getting are below. I am required to do this using DATALINES.
DATA cancer;
INPUT type $char. new_cases yearly_deaths survival percent3.;
DATALINES;
breast 271270 42260 90%
lung 228150 142670 23%
prostate 164690 29430 98%
colorectal 145600 51020 64%
melanoma 96480 7230 92%
bladder 80470 17670 77%
non-hodgkin's lymphoma 74200 19970 71%
kidney 73820 14770 75%
endometrial 61880 12160 84%
leukemia 61780 22840 61%
pancreatic 56770 45750 9%
thyroid 52070 2170 99%
liver 42030 31780 18%
;
RUN;
PROC PRINT data=cancer;
RUN;
gives me
The SAS System |
Obs | type | new_cases | yearly_deaths | survival |
1 | breast | 271270 | 42260 | 0.90 |
2 | lung 22 | 8150 | 142670 | 0.23 |
3 | prostat | . | 164690 | 294.00 |
4 | colorec | . | 145600 | 510.00 |
5 | melanom | . | 96480 | 723.00 |
6 | bladder | 80470 | 17670 | 0.77 |
7 | non-hod | . | . | 742.00 |
8 | kidney | 73820 | 14770 | 0.75 |
9 | endomet | . | 61880 | 121.00 |
10 | leukemi | . | 61780 | 228.00 |
11 | pancrea | . | 56770 | 457.00 |
12 | thyroid | 52070 | 2170 | 0.99 |
13 | liver 4 | 2030 | 31780 | 0.18 |
I've also tried "$VARYING.", as well as entering DLM=',' and separating the data with commas, which made it worse.
If anyone knows what I should enter or has any advice on how to handle this, it would be greatly appreciated.
Thanks
Hi @Damon1, great effort. Your attempt shows you were on the right track, and you were close. Welcome to SAS Communities. Here is a little help from us. Have fun!
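data want;
  input @;                        /* load the line into _INFILE_ and hold it */
  _n_ = anydigit(_infile_) - 1;   /* number of characters before the first digit */
  INPUT type $varying32. _n_      /* read exactly _n_ characters into TYPE */
        new_cases yearly_deaths survival :percent3.;
  format survival percent.;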
DATALINES;
breast 271270 42260 90%
lung 228150 142670 23%
prostate 164690 29430 98%
colorectal 145600 51020 64%
melanoma 96480 7230 92%
bladder 80470 17670 77%
non-hodgkin's lymphoma 74200 19970 71%
kidney 73820 14770 75%
endometrial 61880 12160 84%
leukemia 61780 22840 61%
pancreatic 56770 45750 9%
thyroid 52070 2170 99%
liver 42030 31780 18%
;
RUN;
This worked perfectly 🙂 Thanks so much!!!
If the TYPE values never contain a digit, then you can find the position (FN) of the first numeric character, transcribe characters 1 through FN-1 into TYPE, and (starting at position FN) use an INPUT statement to get all the other variables:
DATA cancer (drop=fn);
input @;
fn=anydigit(_infile_);
length type $40;
type=substr(_infile_,1,fn-1);
INPUT @fn new_cases yearly_deaths survival percent3.;
DATALINES;
breast 271270 42260 90%
lung 228150 142670 23%
prostate 164690 29430 98%
colorectal 145600 51020 64%
melanoma 96480 7230 92%
bladder 80470 17670 77%
non-hodgkin's lymphoma 74200 19970 71%
kidney 73820 14770 75%
endometrial 61880 12160 84%
leukemia 61780 22840 61%
pancreatic 56770 45750 9%
thyroid 52070 2170 99%
liver 42030 31780 18%
;
RUN;
The "trick" here is the bare INPUT statement, which does nothing except transfer the input data line to the automatic variable _INFILE_. Then the ANYDIGIT function finds the position of the first digit. The SUBSTR function copies the first FN-1 characters to TYPE. Then the second INPUT statement, starting at position FN, reads the rest.
The trailing "@" in the first INPUT statement is essential. Otherwise the next INPUT would read from the next line, instead of from the current line (i.e. from the current _INFILE_ content).
I assume that TYPE is no more than 40 characters long.
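To see what the steps described above are doing, a small diagnostic sketch like the one below may help. It is not part of the solution itself; it only writes FN and TYPE to the log for a few of the lines from the question. The $40 length is the same assumption as above, and the step assumes every line contains at least one digit.

  /* Diagnostic sketch: log what ANYDIGIT and SUBSTR see on each line. */
  data _null_;
    length type $40;                    /* assumed maximum label length */
    input @;                            /* load the line into _INFILE_ and hold it */
    fn = anydigit(_infile_);            /* position of the first digit */
    type = substr(_infile_, 1, fn - 1); /* everything before that digit */
    put fn= type=;                      /* write both values to the log */
    datalines;
breast 271270 42260 90%
non-hodgkin's lymphoma 74200 19970 71%
pancreatic 56770 45750 9%
;
  run;

For "breast" the first digit is in column 8, and for "non-hodgkin's lymphoma" it is in column 24, so TYPE picks up the whole label, embedded space and apostrophe included. The trailing blank that SUBSTR copies is harmless, because SAS ignores trailing blanks in character values.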