BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
BenTN
Fluorite | Level 6

As the name suggests I have fixed width files where a variable exists across non-consective positions.

here is an example:

JOHN______DOE____________0000101000312241920

First Name: John  (positions 1-10)
Last Name: Doe    (positions 11-25)
Soundex of last name: D000 (position 11, and 26-28)
Date of birth: 01/01/1900 (positions 29-34, but the century comes position 41-42)
Date of death: 03/12/2024 (position 35-40, but century comes from 43-44)

Currently I am addressing this problem by doing something like the code below, but is there a way to get the input statement to read a single variable across nonconsecutive positions?

Something like the second example.

CURRENT SOLUTION

DATA example;
	INFILE cards;
	Length FirstName $10.
		   LastName $15.
		   Soundex $4.
		   DOB 8.
		   DOD 8.;
	INPUT @1 FirstName $10.
		  @11 LastName $15.
		  @26 Soundex  $3.
		  @29 DOBMO $2.
		  @31 DOBDAY $2.
		  @41 DOBCC $2.
		  @33 DOBYY $2.
		  @35 DODMO $2.
		  @37 DODDAY $2.
		  @43 DODCC $2.
		  @39 DODYY $2.
		  ;

	SOUNDEX = SUBSTR(LASTNAME,1,1)||SOUNDEX;
	DOB = MDY(DOBMO,DOBDAY,DOBCC||DOBYY);
	DOD = MDY(DODMO,DODDAY,DODCC||DODYY);

	FORMAT DOB DOD MMDDYY10.;
	KEEP FIRSTNAME LASTNAME SOUNDEX DOB DOD;

DATALINES;
JOHN      DOE            0000101000312241920
;


AIMED FOR SOLUTION

DATA example;
	INFILE cards;
	INPUT FirstName $ 1-10
		  LastName $ 11-25
		  Soundex  $ 11, 26-28
		  DOB mmddyy10. 29-32,41-42,33-34
		  DOD mmddyy10. 35-38,43-44,39-40
		  ;

DATALINES;
JOHN     DOE            0000101000312241920
;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Short answer: NO.

You could use the _INFILE_ variable that holds the whole line of input, substr it multiple times to get the pieces you need and input the resulting string but not really worth the headache.

 

Read in the date parts as day, month and year as separate numeric variables. To create YEAR usable by the MDY function, which is the right way to go, use Century*100+Year (assuming 2 digit). The || operator the way you are currently has way to much chance of creating something that might not convert to a number in the automatic conversion SAS has to do with that many variables.

 

However, if your dates are in the range that your SAS system option YEARCUTOFF will handle correctly you may be able to ignore the century part. You might want to run this code:

proc options option=yearcutoff;
run;

and then look in the Log. When I ran that today I get:

 YEARCUTOFF=1940   Specifies the first year of a 100-year span that is used by date informats and
                   functions to read a two-digit year.

What that means is any 2-digit year between 00 and 39 is treated as 2000 to 2039. Example

(with yearcutoff=1940)

data example;
   input date :mmddyy6.;
   format date date9.;
datalines;
010101
010139
010140
010199
;

which yields:

     date

01JAN2001
01JAN2039
01JAN1940
01JAN1999

If the earliest date you need is 1925 you might set OPTIONS YEARCUTOFF=1925; and the dates with the MMDDYY6. informat.

 

After you have confirmed that the dates are building correctly then drop all of the day, month, year and century variables.

 

When I see Date of Birth (or just about any date) values of 01/01/1900 the first thing I do is ask the source if that is actually a valid date or is that just there stupid approach to handling missing data. If it is supposed to be missing. At which point you really want to add something like:

If DOB='01JAN1900'd then call missing(DOB);

to set a proper missing. Similar if someone sneaks in a Century/Year combination of 9999.

 

Do you really want a bunch of underscore values in your name variables? Or are you going to COMPRESS them away?

 

Since SAS has a SOUNDEX function I might pick a different variable name just avoid any confusion that the value you have might have come from that function.

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

Short answer: NO.

You could use the _INFILE_ variable that holds the whole line of input, substr it multiple times to get the pieces you need and input the resulting string but not really worth the headache.

 

Read in the date parts as day, month and year as separate numeric variables. To create YEAR usable by the MDY function, which is the right way to go, use Century*100+Year (assuming 2 digit). The || operator the way you are currently has way to much chance of creating something that might not convert to a number in the automatic conversion SAS has to do with that many variables.

 

However, if your dates are in the range that your SAS system option YEARCUTOFF will handle correctly you may be able to ignore the century part. You might want to run this code:

proc options option=yearcutoff;
run;

and then look in the Log. When I ran that today I get:

 YEARCUTOFF=1940   Specifies the first year of a 100-year span that is used by date informats and
                   functions to read a two-digit year.

What that means is any 2-digit year between 00 and 39 is treated as 2000 to 2039. Example

(with yearcutoff=1940)

data example;
   input date :mmddyy6.;
   format date date9.;
datalines;
010101
010139
010140
010199
;

which yields:

     date

01JAN2001
01JAN2039
01JAN1940
01JAN1999

If the earliest date you need is 1925 you might set OPTIONS YEARCUTOFF=1925; and the dates with the MMDDYY6. informat.

 

After you have confirmed that the dates are building correctly then drop all of the day, month, year and century variables.

 

When I see Date of Birth (or just about any date) values of 01/01/1900 the first thing I do is ask the source if that is actually a valid date or is that just there stupid approach to handling missing data. If it is supposed to be missing. At which point you really want to add something like:

If DOB='01JAN1900'd then call missing(DOB);

to set a proper missing. Similar if someone sneaks in a Century/Year combination of 9999.

 

Do you really want a bunch of underscore values in your name variables? Or are you going to COMPRESS them away?

 

Since SAS has a SOUNDEX function I might pick a different variable name just avoid any confusion that the value you have might have come from that function.

 

 

BenTN
Fluorite | Level 6

Thank you.

I had hoped with the versatility in the input statement of adjusting the read pointer in a lot of ways I was hoping there was a way of doing so within the read of a single variable. But its good to know there is not a way of doing so.

Thank you to you and the several others who discussed other ways of addressing soundex's and dates. While the specific example I created was more to clarify the question. These comments are quite helpful as we continue to undertake this project of cleaning up our internal library of code.

Tom
Super User Tom
Super User

Your descriptions conflict.  In one place you imply that the date strings have slashes in them, but in the example line and the number of columns you mention there is no place to have slashes.

 

Your first approach is right.  Since the information about the dates is in TWO different locations you need to read it as at least two separate strings.  It would read them as 3 strings: MMDD, YY and CC then use CATS() and INPUT() to convert that to an actual date value.

DATA example;
  INFILE DATALINES;
  length FirstName $10 LastName $15 Soundex $4 DOB DOD 8;
  INPUT FirstName $ 1-10 Lastname $ 11-25
        Soundex $ 26-28 
        DOBMMDD $ 29-32 DOBYY $ 33-34 DOBCC $ 41-42
        DODMMDD $ 35-38 DODYY $ 39-40 DODCC $ 43-44
  ;
  soundex=char(lastname,1)||soundex;
  dob=input(cats(dobmmdd,dobcc,dobyy),mmddyy8.);
  dod=input(cats(dodmmdd,dodcc,dodyy),mmddyy8.);
  format dob dod yymmdd10.;
  drop dobmmdd -- dodcc ;
DATALINES;
JOHN      DOE            0000101000312241920
;

Results

       First    Last
Obs    Name     Name    Soundex           DOB           DOD

 1     JOHN     DOE      D000      1900-01-01    2024-03-12

 

mkeintz
PROC Star

If you list the DOB components in the input statement in the order month/day/century/year (regardless of their positions in the raw data) you can use a compact list of those variables as the argument to an INPUT function:

 

DATA example (drop=_:);
  INFILE datalines;
  Length FirstName $10.  LastName $15.
         Soundex $4.     DOB 8.    DOD 8.;

  INPUT   @1 FirstName $10.   @11 LastName $15.    @26 Soundex $3.
         @29 _DOBMO $2.  @31 _DOBDAY $2.  @41 _DOBCC $2.  @33 _DOBYY $2.
         @35 _DODMO $2.  @37 _DODDAY $2.  @43 _DODCC $2.  @39 _DODYY $2.  ;

  SOUNDEX = char(LASTNAME,1)||SOUNDEX;
  DOB = input(cats(of _dob:),mmddyy8.);
  DOD = input(cats(of _dod:),mmddyy8.);

  FORMAT DOB DOD MMDDYY10.;
DATALINES;
JOHN      DOE            0000101000312241920
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 373 views
  • 0 likes
  • 4 in conversation