BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Walternate
Obsidian | Level 7

Hi all,

 

I have some data which includes 4 date variables. They came as string variables in Excel, so in setting up the new date variables, first I get rid of the text stuff. So if the date variable (in an array) is missing, N/A, or other text value, the new date var gets set to a nonsense value:

do i = 1 to dim(_dtvar_sa);
if _dtvar_sa[i] = ' ' then _cleandate_all[i] = '01JAN1900'd;
else if _dtvar_sa[i] in ('N/A', 'NA') then _cleandate_all[i]='01JAN1901'd;
else if anyalpha(compress(_dtvar_sa[i], ' /.'))>0 then _cleandate_all[i]='01JAN1902'd;

 

This part seems to be working okay from what I can tell. However, the other part is taking the strings that are probably dates and putting them together to form a complete date value for SAS. What I'm trying to do is split the strings into month, day, and year values, which I then use mdy to combine as a final date value. 

 

However, part of my sequence is giving me trouble. After I pull out the day and month values from the strings, I have these commands which theoretically build numeric day and month vars from those values:

 

then do;
_dayvar_sa[i] = scan(left(strip(_dtvar_sa[i])), 1, ' / .');
end;
_dayvar_sa2[i] = input(_dayvar_sa[i], 3.);

 

then do;
_mthvar_sa[i] = scan(left(strip(_dtvar_sa[i])), 2, ' / .');
end;
_mthvar_sa2[i] = input(_mthvar_sa[i], 3.);

 

It seems like the first command is working fine for both day and month, but SAS is completely ignoring the input commands for both day and month. I have a similar one for year which is working just fine. 

 

I am getting notes in the log to the effect of Invalid value for input function, but when I look at the values in question, they don't have letters or anything so I'm not sure what the problem is. 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Please do not post disconnected snippets of code. In part8'icular, it is UTTERLY useless to post a THEN branch without the IF and the condition., as that is the most important part of a conditional statement.

 

So pleasepost:

  • some examples of your data in usable form (data step with datalines, DO NOT SKIP THIS!)
  • the complete code of your data step
  • the expected results for your examples

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

Please do not post disconnected snippets of code. In part8'icular, it is UTTERLY useless to post a THEN branch without the IF and the condition., as that is the most important part of a conditional statement.

 

So pleasepost:

  • some examples of your data in usable form (data step with datalines, DO NOT SKIP THIS!)
  • the complete code of your data step
  • the expected results for your examples
Walternate
Obsidian | Level 7
Sure, here is the full code snippet:
data want;
set have;
/*CLEAN UP DATE VARIABLES - ALL DIFFERENT FORMATS, PUNTUATION, ETC - THERE IS BASICALLY NO STANDARDIZATION*/
array _dtvar_sa {*} request_dt_d1 start_dt_d1 final_consult_dt_d1 dt_last_collect_d1;
array _yrvar_sa {*} request_yr start_yr final_consult_year year_last_collect;
array _yrvar_sa2 {*} request_yr2 start_yr final_consult_year_2 year_last_collect_2;
array _yrvar_sa3 {*} request_yr3 start_yr final_consult_year_3 year_last_collect_3;
array _mthvar_sa {*} request_mth start_mth final_consult_mth mth_last_collect;
array _mthvar_sa2 {*} request_mth2 start_mth final_consult_mth_2 mth_last_collect_2;
array _dayvar_sa {*} request_day start_day final_consult_day day_last_collect;
array _dayvar_sa2 {*} request_day2 start_day2 final_consult_day_2 day_last_collect_2;
array _cleandate_sa {*} request_dt_sa start_dt_sa final_consult_dt_sa dt_last_collect_sa;
array _cleandate_all {*} request_dt start_dt final_consult_dt dt_last_collect;

format request_dt start_dt final_consult_dt_ dt_last_collect ddmmyy10.;

/*FIRST, LET US SET VALUES OF N/A, MISSING, OR OTHER TEXT VALUES TO A NONSENSE DATE SO IT FITS INTO THE COLUMN BUT WILL*/
/*NOT BE CONFUSED AS HAVING A REAL VALUE*/
do i = 1 to dim(_dtvar_sa);
if _dtvar_sa[i] = ' ' then _cleandate_all[i] = '01JAN1900'd;
else if _dtvar_sa[i] in ('N/A', 'NA') then _cleandate_all[i]='01JAN1901'd;
else if anyalpha(compress(_dtvar_sa[i], ' /.'))>0 then _cleandate_all[i]='01JAN1902'd;


/*FOR TEXT DATE VALUES:*/

/*SPLIT DATE VALUES INTO MONTH, DAY, YEAR. ASSUME FIRST SET OF NUMBERS IS DAY UNLESS OBVIOUS OTHEWISE*/
else do;

/*IF VALUE IS SPLIT INTO SEGMENTS*/
if index(_dtvar_sa[i], '.') or index(_dtvar_sa[i], '/') or index(_dtvar_sa[i], '-') then do;
/*IF THERE ARE 3 SEGMENTS, SPLIT INTO DAY/MONTH/YEAR*/
if scan(_dtvar_sa[i], 3, './- ') ne ' ' then do;

/*ASSUME THE FIRST SET OF NUMBERS IS THE DAY*/
if scan(_dtvar_sa[i], 1, ' /.-') in ('01','1', '02','2', '03','3', '04','4', '05','5', '06','6', '07','7', '08','8', '09','9', '10','11',
'12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27',
'28', '29', '30', '31')
then do;
_dayvar_sa[i] = scan(left(strip(_dtvar_sa[i])), 1, ' / .-');
end;
_dayvar_sa2[i] = input(_dayvar_sa[i], 3.);


/*ASSUME THE SECOND SET OF NUMBERS IS THE MONTH UNLESS IT IS >12*/
if scan(_dtvar_sa[i], 2, ' /.-') in ('01','1', '02','2', '03','3', '04','4', '05','5', '06','6', '07','7', '08','8', '09','9', '10','11', '12')
then do;
_mthvar_sa[i] = scan(left(strip(_dtvar_sa[i])), 2, ' / .-');
end;
_mthvar_sa2[i] = input(_mthvar_sa[i], 3.);


/*ASSUME THE THIRD SET OF NUMBERS IS THE YEAR*/
_yrvar_sa[i]=scan(_dtvar_sa[i], 3, '/- .');

if _yrvar_sa[i] = '11' then _yrvar_sa2[i]='2011';
else if _yrvar_sa[i] = '12' then _yrvar_sa2[i]='2012';
else if _yrvar_sa[i] = '13' then _yrvar_sa2[i]='2013';
else if _yrvar_sa[i] = '14' then _yrvar_sa2[i]='2014';
else if _yrvar_sa[i] = '15' then _yrvar_sa2[i]='2015';
else if _yrvar_sa[i] = '16' then _yrvar_sa2[i]='2016';
else if _yrvar_sa[i] = '17' then _yrvar_sa2[i]='2017';
else if _yrvar_sa[i] = '18' then _yrvar_sa2[i]='2018';
else if _yrvar_sa[i] = '19' then _yrvar_sa2[i]='2019';
else if _yrvar_sa[i] = '20' then _yrvar_sa2[i]='2020';
else if _yrvar_sa[i] = '21' then _yrvar_sa2[i]='2021';
else if _yrvar_sa[i] = '22' then _yrvar_sa2[i] = '2022';
else if _yrvar_sa[i] ne ' ' then _yrvar_sa2[i] = _yrvar_sa[i];
end;end;

/*SOME VALUES JUST HAVE A YEAR AND NOTHING ELSE*/
if _dtvar_sa[i] = '2021' then _yrvar_sa2[i] = '2021';
else if _dtvar_sa[i] = '2022' then _yrvar_sa2[i]='2022';

_yrvar_sa3[i] = input(_yrvar_sa2[i], 4.);

end;end;
run;

Raw date var: Desired date var
1/1/1900
N/A 1/1/1901
RANDOM TEXT 1/1/1902
5/3/22 05/03/2022
24-02-2021 24/02/2021
PaigeMiller
Diamond | Level 26

Ok, that's the code. You have ignored this request from @Kurt_Bremser 

 

some examples of your data in usable form (data step with datalines, DO NOT SKIP THIS!)
--
Paige Miller
ballardw
Super User

@Walternate wrote:

Raw date var: Desired date var
1/1/1900
N/A 1/1/1901
RANDOM TEXT 1/1/1902
5/3/22 05/03/2022
24-02-2021 24/02/2021

Please look very closely at that example. Note that it is not possible to see the "raw date" for the first line. Paste text data into a TEXT box opened on the forum with the </> icon to prevent the message window from "cleaning up" text by removing leading spaces and such.

Now we need proc contents since all of your arrays are defaulting to the type of the original variables from your source data set. You create a bunch of CHARACTER values for Day, Month and Year. Why? Keeping dates as character just adds complexity for any use such as determining report periods, intervals, comparing dates and such.

 

The last two values should be read with a DDMMYY10 informat and no pulling apart month and day needed. Assign what ever format you want for appearance.

data example;
   input date :ddmmyy10.;
   put date= date9. date= ddmmyy10.;
datalines;
5/3/22 
24-02-2021
;

The put is just another example using a desired format at the time of display.

If your "date" is really inconsistent then perhaps you want to investigate the ANYDTDTE informat as that handles a variety of cases that you have not shown.

 

And I still think that setting date values of 01Jan1900 is very poor.

 

 

Walternate
Obsidian | Level 7
Oops, forgot the final segment which builds the date var:
/*BUILD CLEAN DATE VAR*/
if missing(_yrvar_sa2[i])=0 and missing(_mthvar_sa[i])=0 and missing(_dayvar_sa[i])=0 then do;
_cleandate_sa[i]= mdy( _dayvar_sa[i], _mthvar_sa[i], _yrvar_sa2[i]);
end;
else if missing(_yrvar_sa2[i])=0 or missing(_mthvar_sa[i])=0 or missing(_dayvar_sa[i])=0 then do;
_cleandate_sa[i] = '01JAN1903';
end;

else _cleandate_sa[i] = _dtvar_sa[i];

That is the last part of the same data step
ballardw
Super User

Strong hint: Show actual values that you are working with that do not show the expected result.

It is very likely that there is a proper Informat to read the dates and your tearing unshown strings apart to get at pieces does not show what is needed.

 

Stronger hint: Show the entire data step. I can't tell if you expect numeric or character values for some of those arrays and need the array definition to possibly debug your issues.

 

Suggestion: Do not provide valid date values as "nonsense" values. Somewhere some one is going to use those values and be surprised at a result and then spend time trying to figure out why there is garbage in the data. SAS has a group of "values" for numeric variables that are special missing values. These are a dot followed by a letter or the underscore character, i.e  ._   .A to .Z . You can use those to indicate the status of the original improper value such as .B for the 'blank', .N for the 'NA' and so one. These values are missing for use in any function and you can create custom formats to display text indicating the status as needed.

 

 

 

Walternate
Obsidian | Level 7

Example dates from my data that are being flagged as not working by the log:

 

26/2/22

29/12/22

28/1/22

14/4/22

Tom
Super User Tom
Super User

@Walternate wrote:

Example dates from my data that are being flagged as not working by the log:

 

26/2/22

29/12/22

28/1/22

14/4/22


None of those look like dates to me.  There is no month 14, 26, 28, or 29.  They do not indicate what century they should be in.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 596 views
  • 0 likes
  • 5 in conversation