As the name suggests I have fixed width files where a variable exists across non-consective positions.
here is an example:
JOHN______DOE____________0000101000312241920
First Name: John (positions 1-10)
Last Name: Doe (positions 11-25)
Soundex of last name: D000 (position 11, and 26-28)
Date of birth: 01/01/1900 (positions 29-34, but the century comes position 41-42)
Date of death: 03/12/2024 (position 35-40, but century comes from 43-44)
Currently I am addressing this problem by doing something like the code below, but is there a way to get the input statement to read a single variable across nonconsecutive positions?
Something like the second example.
CURRENT SOLUTION
DATA example; INFILE cards; Length FirstName $10. LastName $15. Soundex $4. DOB 8. DOD 8.; INPUT @1 FirstName $10. @11 LastName $15. @26 Soundex $3. @29 DOBMO $2. @31 DOBDAY $2. @41 DOBCC $2. @33 DOBYY $2. @35 DODMO $2. @37 DODDAY $2. @43 DODCC $2. @39 DODYY $2. ; SOUNDEX = SUBSTR(LASTNAME,1,1)||SOUNDEX; DOB = MDY(DOBMO,DOBDAY,DOBCC||DOBYY); DOD = MDY(DODMO,DODDAY,DODCC||DODYY); FORMAT DOB DOD MMDDYY10.; KEEP FIRSTNAME LASTNAME SOUNDEX DOB DOD; DATALINES; JOHN DOE 0000101000312241920 ;
AIMED FOR SOLUTION
DATA example; INFILE cards; INPUT FirstName $ 1-10 LastName $ 11-25 Soundex $ 11, 26-28 DOB mmddyy10. 29-32,41-42,33-34 DOD mmddyy10. 35-38,43-44,39-40 ; DATALINES; JOHN DOE 0000101000312241920 ;
Short answer: NO.
You could use the _INFILE_ variable that holds the whole line of input, substr it multiple times to get the pieces you need and input the resulting string but not really worth the headache.
Read in the date parts as day, month and year as separate numeric variables. To create YEAR usable by the MDY function, which is the right way to go, use Century*100+Year (assuming 2 digit). The || operator the way you are currently has way to much chance of creating something that might not convert to a number in the automatic conversion SAS has to do with that many variables.
However, if your dates are in the range that your SAS system option YEARCUTOFF will handle correctly you may be able to ignore the century part. You might want to run this code:
proc options option=yearcutoff; run;
and then look in the Log. When I ran that today I get:
YEARCUTOFF=1940 Specifies the first year of a 100-year span that is used by date informats and functions to read a two-digit year.
What that means is any 2-digit year between 00 and 39 is treated as 2000 to 2039. Example
(with yearcutoff=1940)
data example; input date :mmddyy6.; format date date9.; datalines; 010101 010139 010140 010199 ;
which yields:
date 01JAN2001 01JAN2039 01JAN1940 01JAN1999
If the earliest date you need is 1925 you might set OPTIONS YEARCUTOFF=1925; and the dates with the MMDDYY6. informat.
After you have confirmed that the dates are building correctly then drop all of the day, month, year and century variables.
When I see Date of Birth (or just about any date) values of 01/01/1900 the first thing I do is ask the source if that is actually a valid date or is that just there stupid approach to handling missing data. If it is supposed to be missing. At which point you really want to add something like:
If DOB='01JAN1900'd then call missing(DOB);
to set a proper missing. Similar if someone sneaks in a Century/Year combination of 9999.
Do you really want a bunch of underscore values in your name variables? Or are you going to COMPRESS them away?
Since SAS has a SOUNDEX function I might pick a different variable name just avoid any confusion that the value you have might have come from that function.
Short answer: NO.
You could use the _INFILE_ variable that holds the whole line of input, substr it multiple times to get the pieces you need and input the resulting string but not really worth the headache.
Read in the date parts as day, month and year as separate numeric variables. To create YEAR usable by the MDY function, which is the right way to go, use Century*100+Year (assuming 2 digit). The || operator the way you are currently has way to much chance of creating something that might not convert to a number in the automatic conversion SAS has to do with that many variables.
However, if your dates are in the range that your SAS system option YEARCUTOFF will handle correctly you may be able to ignore the century part. You might want to run this code:
proc options option=yearcutoff; run;
and then look in the Log. When I ran that today I get:
YEARCUTOFF=1940 Specifies the first year of a 100-year span that is used by date informats and functions to read a two-digit year.
What that means is any 2-digit year between 00 and 39 is treated as 2000 to 2039. Example
(with yearcutoff=1940)
data example; input date :mmddyy6.; format date date9.; datalines; 010101 010139 010140 010199 ;
which yields:
date 01JAN2001 01JAN2039 01JAN1940 01JAN1999
If the earliest date you need is 1925 you might set OPTIONS YEARCUTOFF=1925; and the dates with the MMDDYY6. informat.
After you have confirmed that the dates are building correctly then drop all of the day, month, year and century variables.
When I see Date of Birth (or just about any date) values of 01/01/1900 the first thing I do is ask the source if that is actually a valid date or is that just there stupid approach to handling missing data. If it is supposed to be missing. At which point you really want to add something like:
If DOB='01JAN1900'd then call missing(DOB);
to set a proper missing. Similar if someone sneaks in a Century/Year combination of 9999.
Do you really want a bunch of underscore values in your name variables? Or are you going to COMPRESS them away?
Since SAS has a SOUNDEX function I might pick a different variable name just avoid any confusion that the value you have might have come from that function.
Thank you.
I had hoped with the versatility in the input statement of adjusting the read pointer in a lot of ways I was hoping there was a way of doing so within the read of a single variable. But its good to know there is not a way of doing so.
Thank you to you and the several others who discussed other ways of addressing soundex's and dates. While the specific example I created was more to clarify the question. These comments are quite helpful as we continue to undertake this project of cleaning up our internal library of code.
Your descriptions conflict. In one place you imply that the date strings have slashes in them, but in the example line and the number of columns you mention there is no place to have slashes.
Your first approach is right. Since the information about the dates is in TWO different locations you need to read it as at least two separate strings. It would read them as 3 strings: MMDD, YY and CC then use CATS() and INPUT() to convert that to an actual date value.
DATA example;
INFILE DATALINES;
length FirstName $10 LastName $15 Soundex $4 DOB DOD 8;
INPUT FirstName $ 1-10 Lastname $ 11-25
Soundex $ 26-28
DOBMMDD $ 29-32 DOBYY $ 33-34 DOBCC $ 41-42
DODMMDD $ 35-38 DODYY $ 39-40 DODCC $ 43-44
;
soundex=char(lastname,1)||soundex;
dob=input(cats(dobmmdd,dobcc,dobyy),mmddyy8.);
dod=input(cats(dodmmdd,dodcc,dodyy),mmddyy8.);
format dob dod yymmdd10.;
drop dobmmdd -- dodcc ;
DATALINES;
JOHN DOE 0000101000312241920
;
Results
First Last Obs Name Name Soundex DOB DOD 1 JOHN DOE D000 1900-01-01 2024-03-12
If you list the DOB components in the input statement in the order month/day/century/year (regardless of their positions in the raw data) you can use a compact list of those variables as the argument to an INPUT function:
DATA example (drop=_:);
INFILE datalines;
Length FirstName $10. LastName $15.
Soundex $4. DOB 8. DOD 8.;
INPUT @1 FirstName $10. @11 LastName $15. @26 Soundex $3.
@29 _DOBMO $2. @31 _DOBDAY $2. @41 _DOBCC $2. @33 _DOBYY $2.
@35 _DODMO $2. @37 _DODDAY $2. @43 _DODCC $2. @39 _DODYY $2. ;
SOUNDEX = char(LASTNAME,1)||SOUNDEX;
DOB = input(cats(of _dob:),mmddyy8.);
DOD = input(cats(of _dod:),mmddyy8.);
FORMAT DOB DOD MMDDYY10.;
DATALINES;
JOHN DOE 0000101000312241920
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.