Hi Everyone,
I'm new to SAS, and I am trying to enter a data set where some of the character data contain spaces, apostrophes, and a dash. I'm not sure how to do this, and I keep getting weird output. I have tried both the CHAR and VARYING informats, but I could be using them wrong. I have provided both the code that I've tried and the output I'm getting below. I am required to do this using DATALINES.
DATA cancer;
INPUT type $char. new_cases yearly_deaths survival percent3.;
DATALINES;
breast 271270 42260 90%
lung 228150 142670 23%
prostate 164690 29430 98%
colorectal 145600 51020 64%
melanoma 96480 7230 92%
bladder 80470 17670 77%
non-hodgkin's lymphoma 74200 19970 71%
kidney 73820 14770 75%
endometrial 61880 12160 84%
leukemia 61780 22840 61%
pancreatic 56770 45750 9%
thyroid 52070 2170 99%
liver 42030 31780 18%
;
RUN;
PROC PRINT data=cancer;
RUN;
gives me
The SAS System |
Obs | type | new_cases | yearly_deaths | survival |
1 | breast | 271270 | 42260 | 0.90 |
2 | lung 22 | 8150 | 142670 | 0.23 |
3 | prostat | . | 164690 | 294.00 |
4 | colorec | . | 145600 | 510.00 |
5 | melanom | . | 96480 | 723.00 |
6 | bladder | 80470 | 17670 | 0.77 |
7 | non-hod | . | . | 742.00 |
8 | kidney | 73820 | 14770 | 0.75 |
9 | endomet | . | 61880 | 121.00 |
10 | leukemi | . | 61780 | 228.00 |
11 | pancrea | . | 56770 | 457.00 |
12 | thyroid | 52070 | 2170 | 0.99 |
13 | liver 4 | 2030 | 31780 | 0.18 |
I've also tried putting "$VARYING." as well as entering DLM=',' and separating the data by commas, which made it worse.
If anyone knows what I should enter or has any advice on how to handle this, it would be greatly appreciated.
Thanks
Hi @Damon1, great effort. I'm impressed by how close your attempt came. Welcome to the SAS Communities. You were nearly there, and the fix is a piece of cake. Here's a little help from us. Have fun!
data want;
input @;
_n_=anydigit(_infile_)-1;
INPUT type $varying32. _n_
new_cases yearly_deaths survival :percent3.;
format survival percent.;
DATALINES;
breast 271270 42260 90%
lung 228150 142670 23%
prostate 164690 29430 98%
colorectal 145600 51020 64%
melanoma 96480 7230 92%
bladder 80470 17670 77%
non-hodgkin's lymphoma 74200 19970 71%
kidney 73820 14770 75%
endometrial 61880 12160 84%
leukemia 61780 22840 61%
pancreatic 56770 45750 9%
thyroid 52070 2170 99%
liver 42030 31780 18%
;
RUN;
This worked perfectly 🙂 Thanks so much!!!
If TYPE itself never contains a digit (so the first digit on each line starts NEW_CASES), then you can find the position (FN) of the first numeric character, transcribe characters 1 through FN-1 into TYPE, and (starting at position FN) use an INPUT statement to get all the other variables:
DATA cancer (drop=fn);
input @;
fn=anydigit(_infile_);
length type $40;
type=substr(_infile_,1,fn-1);
INPUT @fn new_cases yearly_deaths survival percent3.;
DATALINES;
breast 271270 42260 90%
lung 228150 142670 23%
prostate 164690 29430 98%
colorectal 145600 51020 64%
melanoma 96480 7230 92%
bladder 80470 17670 77%
non-hodgkin's lymphoma 74200 19970 71%
kidney 73820 14770 75%
endometrial 61880 12160 84%
leukemia 61780 22840 61%
pancreatic 56770 45750 9%
thyroid 52070 2170 99%
liver 42030 31780 18%
;
RUN;
The "trick" here is the bare INPUT statement, which does nothing except load the input data line into the automatic variable _INFILE_. Then the ANYDIGIT function finds the position of the first digit. The SUBSTR function copies the first FN-1 characters to TYPE. Then the second INPUT statement, starting at position FN, reads the rest.
The trailing "@" in the first INPUT statement is essential. Otherwise the next INPUT would read from the next line, instead of from the current line (i.e. from the current _INFILE_ content).
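To see what each step holds, here is a minimal sketch that writes FN and TYPE to the log instead of building a data set. It reuses two lines from the data above; the `data _null_` step and the PUT statement are my additions for illustration only.

```sas
data _null_;
  input @;                            /* load the line into _INFILE_ and hold it */
  fn = anydigit(_infile_);            /* column of the first digit on the line   */
  length type $40;
  type = substr(_infile_, 1, fn-1);   /* everything before that column           */
  put fn= type=;                      /* write both values to the log            */
  datalines;
breast 271270 42260 90%
non-hodgkin's lymphoma 74200 19970 71%
;
run;
```

The held line is released automatically when the step starts its next iteration, so no second INPUT statement is needed in this check.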
I assume that TYPE is no more than 40 characters long.
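If you are allowed to retype the data lines, a comma-delimited layout may be simpler, since blanks then stop acting as delimiters. This is only a sketch under that assumption: the data set name `cancer_csv`, the TRUNCOVER option, and the `percent4.` width are my choices, not something from the original post, and only three of the rows are shown.

```sas
data cancer_csv;
  infile datalines dlm=',' truncover;   /* comma is the only delimiter          */
  length type $40;                      /* same 40-character assumption as above */
  input type $ new_cases yearly_deaths survival :percent4.;
  format survival percent8.;
  datalines;
breast,271270,42260,90%
non-hodgkin's lymphoma,74200,19970,71%
pancreatic,56770,45750,9%
;
run;

proc print data=cancer_csv;
run;
```

I left out the DSD option on purpose so that the apostrophe in "non-hodgkin's" cannot be mistaken for a quoting character.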