BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

This seems like an easy question, but the answer has eluded me for some time.

 

I have a column of DOB values in a SAS output dataset expressed a text ike

DOB

38407

29738

42385

37471

  ....           and so on.  I want to express these in data format like 08/14/2020 for example.

 

I approached it with this code

Data Arias.Merge_data_E_18 (drop = DOB_1);
  Set Arias.Merge_data_E_18; 
   DOB_1 = input (DOB, mmddyy8.);
run;

But the log came back

304  Data Arias.Merge_data_E_18 (drop = DOB_1);
305    Set Arias.Merge_data_E_18;
306     DOB_1 = input (DOB, mmddyy8.);
307  run;

NOTE: Invalid argument to function INPUT at line 306 column 12.
County=Linn Jurisdiction=Linn CHD Contact_number=000118e5-a3d5-ea11-a813-001dd8309200
_Do_Not_Modify__Row_Checksum=aF2JuLzWGLlUSpF32cWQi4+R2awa7Yzn3d6BiH+2JnyF6rREvdofoaNKFfl+EOEce30rHjSO2

<<repeated>>

City=Warm Springs Preferred_Method_of_Contact=Phone Email=  Mobile_Phone=541-460-2541
End_of_Quarantine_Period=08/28/2020 Language=  DOB_1=. _ERROR_=1 _N_=35
NOTE: Mathematical operations could not be performed at the following places. The results of the
      operations have been set to missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      23839 at 306:12
NOTE: There were 41280 observations read from the data set ARIAS.MERGE_DATA_E_18.
NOTE: The data set ARIAS.MERGE_DATA_E_18 has 41280 observations and 29 variables.
NOTE: Compressing data set ARIAS.MERGE_DATA_E_18 decreased size by 47.20 percent.
      Compressed is 245 pages; un-compressed would require 464 pages.
NOTE: DATA statement used (Total process time):
      real time           0.17 seconds
      cpu time            0.18 seconds

 I would appreciate your help again (not the first time today).

 

wlierman

1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

Here's a quick way to do it:

Data Arias.Merge_data_E_18 ;
  Set Arias.Merge_data_E_18; 
   FORMAT DOB mmddyys10.;
run;

After you run that code, you should see dates showing in your output data set in the format mm/dd/ccyy, for example:  08/14/2020.

 

Results using the date values you supplied:

Date_Formatted_2020-09-09_16-19-44.jpg

SAS shows you the dates in the format you want, but the data has not changed.  This is just a format.  What you see is NOT the actual value of the data but rather just a formatted display of the underlying value.

 

If you actually need a new variable, you could code the following:

Data Arias.Merge_data_E_18 ;
  Set Arias.Merge_data_E_18; 
   New_DOB = PUT(DOB, mmddyys10.);
run;

 

Results:

Date_New_Column_2020-09-09_16-19-44.jpg

Notice the difference in the icons for each column. 

  • In the first example, there was a little calendar icon for DOB.  The calendar icon indicates that the actual data is numeric but that it is being displayed using a date format.
  • In the second example, there is a "#" icon for DOB.  The "#" sign indicates a numeric variable without a date format.  The icon for New_DOB is an "A" icon.  The "A" icon indicates alphanumeric (character) data.  New_DOB represents an actual change to the data.  New_DOB is not a date variable but rather contains character contents as shown.  What you see is the actual value of the data.

 

General rules for PUT and INPUT:

When you want to change or show a numeric variable in a character format, use PUT.

When you want to change or show a character variable in a numeric format, use INPUT.

 

Your original example was using INPUT on a numeric variable.  The general rule is to use PUT on numeric variables and INPUT on character variables.

 

Jim

View solution in original post

19 REPLIES 19
jimbarbour
Meteorite | Level 14

Here's a quick way to do it:

Data Arias.Merge_data_E_18 ;
  Set Arias.Merge_data_E_18; 
   FORMAT DOB mmddyys10.;
run;

After you run that code, you should see dates showing in your output data set in the format mm/dd/ccyy, for example:  08/14/2020.

 

Results using the date values you supplied:

Date_Formatted_2020-09-09_16-19-44.jpg

SAS shows you the dates in the format you want, but the data has not changed.  This is just a format.  What you see is NOT the actual value of the data but rather just a formatted display of the underlying value.

 

If you actually need a new variable, you could code the following:

Data Arias.Merge_data_E_18 ;
  Set Arias.Merge_data_E_18; 
   New_DOB = PUT(DOB, mmddyys10.);
run;

 

Results:

Date_New_Column_2020-09-09_16-19-44.jpg

Notice the difference in the icons for each column. 

  • In the first example, there was a little calendar icon for DOB.  The calendar icon indicates that the actual data is numeric but that it is being displayed using a date format.
  • In the second example, there is a "#" icon for DOB.  The "#" sign indicates a numeric variable without a date format.  The icon for New_DOB is an "A" icon.  The "A" icon indicates alphanumeric (character) data.  New_DOB represents an actual change to the data.  New_DOB is not a date variable but rather contains character contents as shown.  What you see is the actual value of the data.

 

General rules for PUT and INPUT:

When you want to change or show a numeric variable in a character format, use PUT.

When you want to change or show a character variable in a numeric format, use INPUT.

 

Your original example was using INPUT on a numeric variable.  The general rule is to use PUT on numeric variables and INPUT on character variables.

 

Jim

wlierman
Lapis Lazuli | Level 10

Hi Jim,

 

Thank you for the spot-on solution to this problem.

 

I appreciate your quick response and you taking the time to answer.

 

Take care.

 

wlierman

wlierman
Lapis Lazuli | Level 10

Hi Jim,

I am confounded now!  I coded the data step like this

Data Arias.Merge_data_E_18;
  Set Arias.Merge_data_E_18; 
  New_DOB = PUT(DOB, MMDDYYS10.);
run;

I thought it was working but then I got the log back

324  Data Arias.Merge_data_E_18;
325    Set Arias.Merge_data_E_18;
326     New_DOB = PUT(DOB, mmddyys10.);
                           ----------
                           48
ERROR 48-59: The format $MMDDYYS was not found or could not be loaded.

327  run;

NOTE: The SAS System stopped processing this step because of errors.

Okay so I looked in the Step-by_Step Base SAS manual

read how to assign the MMDDYY10. format as

 

format DepartureDate mmddyy10.;        (using the var name form the book)

 

I tried it as

Data Arias.Merge_data_E_18;
  Set Arias.Merge_data_E_18; 
  Format DOB mmddyys10.;
  *New_DOB = PUT(DOB, MMDDYYS10.);
run;

The log returned

Data Arias.Merge_data_E_18;
356    Set Arias.Merge_data_E_18;
357    Format DOB mmddyys10.;
                  ----------
                  48
ERROR 48-59: The format $MMDDYYS was not found or could not be loaded.

358    *New_DOB = PUT(DOB, MMDDYYS10.);
359  run;

So I am stumped again.

Appreciate your assistance.

 

wlierman

jimbarbour
Meteorite | Level 14

SAS is trying to apply a character format.  I think that DOB may not be defined as numeric even if it contains numbers.

 

Can you run the below and post the results?

PROC CONTENTS DATA=Arias.Merge_data_E_18;
RUN;

If DOB is a number but the variable is defined as character, then you'd have to change your code as follows:

Data Arias.Merge_data_E_18;
  Set Arias.Merge_data_E_18; 
  New_DOB = PUT(INPUT(DOB, 5.), MMDDYYS10.);
run;

I think 5. should be an appropriate Informat, but try it and see if it works.  

 

Jim

wlierman
Lapis Lazuli | Level 10
Yes. The Proc contents

20 DOB Char 19 $19. $19. DOB
jimbarbour
Meteorite | Level 14

It's a little hard to read that, but I think that my surmise was correct that for whatever reason, DOB is being stored as character.

 

Please try this code:

Data Arias.Merge_data_E_18;
  Set Arias.Merge_data_E_18; 
  New_DOB = PUT(INPUT(DOB, 5.), MMDDYYS10.);
run;

The INPUT function will read in the value stored as character and change it to numeric storage.  The PUT function will then format the data stored numeric as numeric into your desired format.

 

Jim

wlierman
Lapis Lazuli | Level 10

I ran the second code block

Data Arias.Merge_data_E_18;
  Set Arias.Merge_data_E_18; 
  New_DOB = PUT(INPUT(DOB, 5.), MMDDYYS10.);
run;

and received the following log

 Data Arias.Merge_data_E_18;
363    Set Arias.Merge_data_E_18;
364    New_DOB = PUT(INPUT(DOB, 5.), MMDDYYS10.);
365  run;

NOTE: Invalid argument to function INPUT at line 364 column 17.
County=Jackson Jurisdiction=Jackson CHD Contact_number=dbac2a5f-0cd4-ea11-a813-001dd83096fe
_Do_Not_Modify__Row_Checksum=MvbQ4ucFSa06yXIUQuV9uanIv0tnSeKcliEG0lVvJeQwDITgs0d6E2l4rdktOUpsBptzwsZXA
R6WWLZisKos1w== _Do_Not_Modify__Modified_On=04SEP20:01:06:04 Person_ID=956838 Contact_ID=5028693
Case_ID=774942 _Full_Name=Josh Charleboix First_Name=Josh Middle_Name=  Last_Name=Charleboix
Contact_Attempt_Successful=Yes Created_On=01AUG20:08:33:54 Date_of_Last_Exposure=07/23/2020
_1st_Contact_Attempt=01AUG20:15:30:00 _2nd_Contact_Attempt=02AUG20:13:30:00 _3rd_Contact_Attempt=8
Owner=Natasha Ching DOB=04/03/1861 00:00:00 Monitoring_Status=Lost to Follow-up Status=Inactive
Address_1=411 berrydale Ave #9
MEDFORD, or 97501 City=MEDFORD Preferred_Method_of_Contact=Text
Email=joshcharleboix@gmail.com Mobile_Phone=541-973-5460 End_of_Quarantine_Period=08/06/2020
Language=English New_DOB=. _ERROR_=1 _N_=35443
NOTE: Mathematical operations could not be performed at the following places. The results of the
      operations have been set to missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 364:17
NOTE: There were 41280 observations read from the data set ARIAS.MERGE_DATA_E_18.
NOTE: The data set ARIAS.MERGE_DATA_E_18 has 41280 observations and 30 variables.
NOTE: Compressing data set ARIAS.MERGE_DATA_E_18 decreased size by 46.38 percent.
      Compressed is 252 pages; un-compressed would require 470 pages.
NOTE: DATA statement used (Total process time):
      real time           0.17 seconds
      cpu time            0.12 seconds

Which was only 1 time for that error.  Why did that occur? I can probably live with one line.

wlierman
Lapis Lazuli | Level 10
WHen I checked the SAS output file the DOB var was still like

38476
24564 etc

jimbarbour
Meteorite | Level 14

Well, looking at the data dump that SAS gave us, I see:

DOB=04/03/1861 00:00:00

 

That's not a numeric date value; that's a character date-time value.  One cannot read a character date-time value with a numeric Informat. like the "5." I suggested. 

 

So, question:  Are your DOB values numeric dates or are they character date-time values?  Or do most DOB values consist of numeric dates and this one record has a character date-time value rather than a numeric date?

 

Jim

wlierman
Lapis Lazuli | Level 10
The DOB will be in the date-time format like the one you mentioned (below). However, not sure why the year = 1861. Other date-time related fields in the file that I have seen are of this format but some aren't. For example, the following field Created_On=01AUG20:08:33:54 has a different datepart. Very little consistency in the file. Not sure why it is set up that way. Data comes from contact tracer records from the field.

DOB=04/03/1861 00:00:00

wlierman
jimbarbour
Meteorite | Level 14

I'd say that the DOB in 1861 is bad data.  Someone born in 1861 would be about 160 years old today, which I'm thinking is probably not the case.  It's probably a mis-key of 1961, but that's just a guess.  If the data is bad, there isn't all that much you can do about it.   SAS will capture it as a missing value (".") which is just as it should be.   

 

Are the majority (except for bad data like our 1861 example) of the records now showing the date the way you want them to?  If so, I'd say we've found our solution.

 

Jim

wlierman
Lapis Lazuli | Level 10

Hi Jim,

 

Unfortunately the DOB is still in the char format.  Rather than copy a a **bleep** of the output dataset these are the first 10 values for DOB:

 

DOB

38407

29738

42385

37471

34608

26080

39334

37308

38671

27413

wlierman
Lapis Lazuli | Level 10
I could just drop DOB, but I am really curious now as to how to solve this. There is a likelihood that similar situations may come up as I start drilling into some of the other fields when answering requests.
I appreciate your help on this.

wlierman
jimbarbour
Meteorite | Level 14

@wlierman wrote:
I could just drop DOB, but I am really curious now as to how to solve this.

Do you mean the 1861 issue?  I think the 1861 is bad data in which case even if we read it in, it will still be wrong.  However, not all date issues will be bad data, so let's have a look:

 

We have this so far:

New_DOB = PUT(INPUT(DOB, 5.), MMDDYYS10.);

Which I believe is working fine 99% of the time -- except of course New_DOB sometimes comes up "missing."  After the above code, we can do the following:

	IF STRIP(New_DOB)	=	'.' THEN
    	New_DOB 		=	PUT(INPUT(SUBSTR(DOB, 1, (INDEX(DOB, ' '))), MMDDYY10.), MMDDYYS10.);

We'd then have the following results:

Date_Formatted_B_2020-09-09_16-19-44.jpg

 

The above code may seem a little complicated, but it does protect us from truly invalid dates, for example 04/32/1861.  A truly invalid date will result in a '.' which we may want to translate to a space since a space represents missing for character fields.  

 

Jim

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 2940 views
  • 3 likes
  • 3 in conversation