Question:
txt file name:ABC
LineA 1022 John Chan
LineB 2/1/2021 23,000 42
LineA 3012 Susan Kwan
LineB 16/1/2021 12,000 38
LineA 1120 Tom Wong
LineB 13/01/2021 9,000 .
LineA 2401 Judy Wang
LineB 12/01/2021 135,000.5 59
LineA 2401 George Cheung
LineB 12/01/2021 . 61
I don't know how to import/infile correctly
Name can't read all
AppDate can't be changed into date form
income will be missing
Age also can't be read
What is the desired result?
Are there ALWAYS two rows for each person; there is never one row for a person and there are never three (or more) rows for a person?
it is two rows for each person.
I only need to read this txt correctly just mean finish task1.
I have read this code:
and output
but it is not correct because it should be Name rather than FName and SName.
Also, Income and Age can't read success.
(Actually, AppDate should be date in form of DD/MM/YYYY.)
I think i can finish other after read txt correctly.
data have;
input string $200.;
infile cards truncover;
cards;
LineA 1022 John Chan
LineB 2/1/2021 23,000 42
LineA 3012 Susan Kwan
LineB 16/1/2021 12,000 38
LineA 1120 Tom Wong
LineB 13/01/2021 9,000 .
LineA 2401 Judy Wang
LineB 12/01/2021 135,000.5 59
LineA 2401 George Cheung
LineB 12/01/2021 . 61
;
no, I don't need to use datafile. I need to read into SAS data set.
Also, LineA and LineB is not necessary.
I want the output is this:
But my result now:
data have;
input @7 cno name $char30. / @7 appdate ddmmyy10. income :comma12. age;
format appdate ddmmyys8.;
cards;
LineA 1022 John Chan
LineB 2/1/2021 23,000 42
LineA 3012 Susan Kwan
LineB 16/1/2021 12,000 38
LineA 1120 Tom Wong
LineB 13/01/2021 9,000 .
LineA 2401 Judy Wang
LineB 12/01/2021 135,000.5 59
LineA 2401 George Cheung
LineB 12/01/2021 . 61
;
Based on your code:
Output:
Problems:
1. Name is missing
2. first row Income is wrong
From the inputs you provided, it works for me properly. So something in the inputs you provided does not match the actual file loan_app.txt.
I just copy and paste the txt file
but i have correct something
OUTPUT:
Only the first Income is wrong which should be 23,000
Add a colon to the INPUT statement before ddmmyy10.
input @7 cno name $char30. / @7 appdate :ddmmyy10. income :comma12. age;
You could try this one :
data have; infile cards dlm=' '; input dummy1 : $20. cno name & $30. / dummy2 : $20. appdate : ddmmyy10. income :comma12. age; format appdate ddmmyys8.; cards; LineA 1022 John Chan LineB 2/1/2021 23,000 42 LineA 3012 Susan Kwan LineB 16/1/2021 12,000 38 LineA 1120 Tom Wong LineB 13/01/2021 9,000 . LineA 2401 Judy Wang LineB 12/01/2021 135,000.5 59 LineA 2401 George Cheung LineB 12/01/2021 . 61 ;
One way to deal with lines that have different fields based on their "type" is to use trailing @ on the INPUT so you can execute different INPUT statements to read the rest of the line.
First let's create a file with your test data.
options parmcards=example;
filename example temp;
parmcrds;
LineA 1022 John Chan
LineB 2/1/2021 23,000 42
LineA 3012 Susan Kwan
LineB 16/1/2021 12,000 38
LineA 1120 Tom Wong
LineB 13/01/2021 9,000 .
LineA 2401 Judy Wang
LineB 12/01/2021 135,000.5 59
LineA 2401 George Cheung
LineB 12/01/2021 . 61
;
Now let's use an INFILE statement with the FILEREF we just created that points to that text.
data want ;
infile example truncover ;
input type $ @ ;
if type='LineA' then input cno $ name $30. ;
else if type='LineB' then do;
input appdate :ddmmyy. income :comma. age ;
output;
call missing(of _all_);
end;
retain cno name;
format appdate yymmdd10.;
drop type;
run;
Results:
Obs cno name appdate income age 1 1022 John Chan 2021-01-02 23000.0 42 2 3012 Susan Kwan 2021-01-16 12000.0 38 3 1120 Tom Wong 2021-01-13 9000.0 . 4 2401 Judy Wang 2021-01-12 135000.5 59 5 2401 George Cheung 2021-01-12 . 61
So what is the code doing?
DATA statement names the dataset the step is going to create, WANT in this case.
INFILE statement points to the file to read. I am using a fileref here, but you could also just use the actual name of the file in quotes instead. The TRUNCOVER option says to not go to the next line to hunt for data if there are not enough values to satisfy the INPUT statement. Note the MISSOVER option is almost never wanted as the only different between it and TRUNCOVER is that MISSOVER will ignore text at the end of the line if it happens to be shorter than the width of the formatted input request. TRUNCOVER will instead use that "truncated" text.
The first INPUT statement is reading that first column of text in your file so it can use that to tell what type of line it is.
Then the IF/THEN/ELSE is deciding which code to run based on the TYPE read.
When the type is LineA it reads the CNO (note this looks like an identifier and not an actual number so read it as character) and NAME. For the CNO I just used the bare $ modifier on the INPUT statement to let SAS know I want it to be character. SAS will default the length to 8 bytes. For the name I use formatted input so I can read the whole name into one field. In real life names do not come in neat first/last single word pairs. By using formatted input it will read the space into the NAME instead of treating it as a delimiter.
For LineB we want to do more than one statement so we use a DO/END block. For two of the numbers on the line we need to use a special informat to read them. But we want the INPUT to still use the space as a delimiter so we prefix the informat with the : modifier. Note that we don't need to specify the width of the informat here as it will be ignored anyway. The AGE does not need any special informat to be read. The OUTPUT statement will write the observation the dataset at this point. The CALL MISSING will clear out the values retain from the the LineA records, just in case there are any ids' that only have LineB without LineA.
The RETAIN statement says to remember the values read on the LineA line so they are still there when we read the LineB line.
The FORMAT statement says to attach a special format to the appdate variable so humans will recognize the values are dates. Note that I used YMD order on the dates. It you use DMY or MDY order then either way half of your audience will be confused. The other variables do not need formats attached to them, SAS already knows how to display numbers and strings. But you might want to attach a format to INCOME so it will always show two decimal places. Use the COMMA or DOLLAR format if you want the thousands separator to print. Make sure to include those punctuation characters when calculating what width to use on the format.
The DROP statement says we don't want to have the TYPE variable on the output dataset.
The RUN statement let's SAS know the step is done being described and it can start trying to interpret the code and run it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.