Hello,
I want to import a text file data into SAS. The file contain some issues that you will see in my example.
Some of the data is missing and the data is not well organized as you can see. Student_id and d_pin are character variable. I would like to learn how to use either infile or proc import or both to import the data CORRECTLY.
Here is sample code. I have over 10,000 records.
Student_ID|D_PIN|SERVICE_DATE
|1624413098|11/08/2023
|1628543409|11/08/2023
001020084|1615413808|10/12/2023
001021665|1616220906|03/31/2023
002781314|1462647405|07/19/2023
|1331118504|11/02/2023
|1611325610|12/26/2023
080504901||11/08/2023
Here is my code:
data STUDENT; *;
infile 'c:\dog\dummydata.txt' truncover delimiter = '|' firstobs=2;
format SERVICE_DATE mmddyy10.;
input Student_ID $ 1-9 D_PIN $ 11-20 SERVICE_DATE :mmddyy10.;
run;
I am getting many of the notes below. When I checked it seems it's because the original text file is not well arranged.
NOTE: Invalid data for SERVICE_DATE in line 1756 21-22.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1756 |1331118504|11/02/2023 22
SERVICE_DATE=. Student_ID=|13311185 D_PIN=4|11/02/20 _ERROR_=1 _N_=1755
NOTE: Invalid data for SERVICE_DATE in line 1757 21-22.
1757 |1611325610|12/26/2023 22
Thanks
I presume the first two data lines in your sample actually have no student_id values, yes?
You have to make a couple of changes.
data STUDENT;
* infile 'c:\dog\dummydata.txt' truncover delimiter = '|' firstobs=2;
infile datalines delimiter = '|' dsd firstobs=2;
informat student_id $9. D_PIN $10. service_date mmddyy10. ;
format SERVICE_DATE mmddyy10.;
* input Student_ID $ 1-9 D_PIN $ 11-20 SERVICE_DATE :mmddyy10.;
input Student_ID D_PIN SERVICE_DATE ;
datalines;
Student_ID|D_PIN|SERVICE_DATE
|1624413098|11/08/2023
|1628543409|11/08/2023
001020084|1615413808|10/12/2023
001021665|1616220906|03/31/2023
002781314|1462647405|07/19/2023
|1331118504|11/02/2023
|1611325610|12/26/2023
080504901||11/08/2023
run;
I've preceded each of my additions with your analogous statement, commented out.
Notice the INPUT statement merely lists variable names, not locations, which means INPUT will utilize expected formats in the INFORMAT statement.
I presume the first two data lines in your sample actually have no student_id values, yes?
You have to make a couple of changes.
data STUDENT;
* infile 'c:\dog\dummydata.txt' truncover delimiter = '|' firstobs=2;
infile datalines delimiter = '|' dsd firstobs=2;
informat student_id $9. D_PIN $10. service_date mmddyy10. ;
format SERVICE_DATE mmddyy10.;
* input Student_ID $ 1-9 D_PIN $ 11-20 SERVICE_DATE :mmddyy10.;
input Student_ID D_PIN SERVICE_DATE ;
datalines;
Student_ID|D_PIN|SERVICE_DATE
|1624413098|11/08/2023
|1628543409|11/08/2023
001020084|1615413808|10/12/2023
001021665|1616220906|03/31/2023
002781314|1462647405|07/19/2023
|1331118504|11/02/2023
|1611325610|12/26/2023
080504901||11/08/2023
run;
I've preceded each of my additions with your analogous statement, commented out.
Notice the INPUT statement merely lists variable names, not locations, which means INPUT will utilize expected formats in the INFORMAT statement.
File looks well arranged to me. You have a DELIMITED file. So tell SAS that in the INFILE statement by adding DSD option.
First let's convert your example lines into an actual file so we have something we can test agains.
options parmcards=text;
filename text temp;
parmcards;
Student_ID|D_PIN|SERVICE_DATE
|1624413098|11/08/2023
|1628543409|11/08/2023
001020084|1615413808|10/12/2023
001021665|1616220906|03/31/2023
002781314|1462647405|07/19/2023
|1331118504|11/02/2023
|1611325610|12/26/2023
080504901||11/08/2023
;
Now we can start writing a data step to read the file. First is the DATA statement to name the dataset that will be created. Then the INFILE statement to tell where to find the data to be READ . Then define the variables. Then attach INFORMAT to any variables that NEED them. And attach FORMAT to any variables that NEED them. Then INPUT the variables, if you have defined the variables in the order they appear in the source file then the INPUT statement can just use a simple positional variable list. Then end the step with a RUN (or if you are using in-line data a CARDS or DATALINES block).
data want;
infile text dsd dlm='|' truncover firstobs=2;
length Student_ID $10 D_PIN $10 SERVICE_DATE 8;
informat SERVICE_DATE mmddyy.;
format SERVICE_DATE yymmdd10.;
input Student_ID -- SERVICE_DATE ;
run;
Result
HInts.
@mkeintz @Tom Thank you. I found something interesting this morning. I noticed there was a particular row in the txt file I imported with only the service date. e.g.,
Student_ID|D_PIN|SERVICE_DATE
|12/04/2023
The student_ID and D_PIN are both blank in the text file. I just noticed that this line did not output correctly with the codes provided by @mkeintz .
This is what the output looks like. The date is displayed on the D_PIN column
The problem with that line is it only has TWO values instead of THREE because there is only one DELIMITER.
Look carefully at the whole file with a text editor that displays text using a FIXED width font. Do the values of SERVICE_DATE always start in the same column? If so you can read the file as FIXED columns instead of as DELIMITED.
If instead there are not extra spaces to pad the short or missing values so that D_PIN and SERVICE_DATE always start in the same place you will have to treat it as delimited.
If just a few have this issue of missing delimiters and they all follow this same pattern then you could possible adapt the code to test how many delimiters there are and read the line differently depending on the test.
Something like this:
data want;
infile 'myfile.txt' dsd dlm='|' truncover;
length Student_ID D_PIN $10 SERVICE_DATE 8;
input @;
if countc(_infile_,'|') = 2 then input Student_ID :$10. D_PIN :$10. SERVICE_DATE :mmddyy.;
else input Student_ID SERVICE_DATE :mmddyy.;
format service_date yymmdd10.;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.