BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Moksha
Pyrite | Level 9

Hi,

   I have the following dataset. 

data date_practice;
input date $;
date_monthpart=substr(date,3,3);
put date_monthpart=;
cards;
10jan2022
15feb2023
31dec1998
01cde2010
15aug1983
12jan2012
16xyz2017
;

run;

 

I want to check if the month portion is not in one of the months 'jan', 'feb', 'mar', 'apr', ..... 'dec', then I want to create a new variable new_month with value 'Dec'.

 

I have created a character array of months month_array and then used compare function to check if the date_monthpart value is in this array values. Since compare function returns 0, if there is a match, I am comparing if this return value is not 0, then writing new_month='Dec';

But, this is not working as compare function is returning different values like -1, 1, 2, -2.

Code:

data date_practice_check(keep=date date_monthpart new_month);
set date_practice;

array month_array{*} $ mon1-mon12 ('Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun' 'Jul' 'Aug' 'Sep' 'Oct' 'Nov' 'Dec');

do i=1 to dim(month_array);
compare_month=compare(date_monthpart,month_array{i},'i');
/*if compare_month>0 or compare_month<0 then new_month='Dec';*/
if compare_month ^= 0 then new_month='Dec';

put compare_month= new_month=;
end;
run;

proc print data=date_practice_check;
run;

Expected output: 

 

Obs date        date_monthpart  new_month

1    10jan202       jan                     
2    15feb202      feb                    
3     31dec199    dec                     
4     01cde201    cde                     Dec
5     15aug198    aug                     
6      12jan201    jan                       
7      16xyz201   xyz                      Dec

 

But, getting the following output:
                 
Obs date        date_monthpart  new_month

1    10jan202       jan                     Dec
2    15feb202      feb                     Dec
3     31dec199    dec                     Dec
4     01cde201    cde                     Dec
5     15aug198    aug                     Dec
6      12jan201    jan                       Dec
7      16xyz201   xyz                      Dec

 

Please, advise what's wrong with the code and also if there is any efficient way of coding this.

Instead of using compare function, is there any other way to compare if character variables are equal or not.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Moksha wrote:

Hi,

   I have the following dataset. 

data date_practice;
input date $;
date_monthpart=substr(date,3,3);
put date_monthpart=;
cards;
10jan2022
15feb2023
31dec1998
01cde2010
15aug1983
12jan2012
16xyz2017
;

run;

 

I want to check if the month portion is not in one of the months 'jan', 'feb', 'mar', 'apr', ..... 'dec', then I want to create a new variable new_month with value 'Dec'.

 

I have created a character array of months month_array and then used compare function to check if the date_monthpart value is in this array values. Since compare function returns 0, if there is a match, I am comparing if this return value is not 0, then writing new_month='Dec';

But, this if condition comparison is not working.

Code:

data date_practice_check(keep=date date_monthpart new_month);
set date_practice;

array month_array{*} $ mon1-mon12 ('Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun' 'Jul' 'Aug' 'Sep' 'Oct' 'Nov' 'Dec');

do i=1 to dim(month_array);
compare_month=compare(date_monthpart,month_array{i},'i');
/*if compare_month>0 or compare_month<0 then new_month='Dec';*/
if compare_month ^= 0 then new_month='Dec';

put compare_month= new_month=;
end;
run;

proc print data=date_practice_check;
run;

Expected output: 

 

Obs date        date_monthpart  new_month

1    10jan202       jan                     
2    15feb202      feb                    
3     31dec199    dec                     
4     01cde201    cde                     Dec
5     15aug198    aug                     
6      12jan201    jan                       
7      16xyz201   xyz                      Dec

 

But, getting the following output:
                 
Obs date        date_monthpart  new_month

1    10jan202       jan                     Dec
2    15feb202      feb                     Dec
3     31dec199    dec                     Dec
4     01cde201    cde                     Dec
5     15aug198    aug                     Dec
6      12jan201    jan                       Dec
7      16xyz201   xyz                      Dec

 

Please, advise what's wrong with the code and also if there is any efficient way of coding this.


If you look closely at the the values of Date_monthpart that you provide all of the months are in lower case letters.

The array definition of month_array values all have the first letter as capitol letters. So none of the values in of Date_month part are the same as any of the array values.

Second, you always test all twelve values, so the result of compare_month as written to the data set is the result of the comparison for the Dec (12th) value of the array. And again such none match you get the not equal result of DEC.

 

SAS has two functions that will check to see if a value is in a list of values and return the position number if found. WHICHC is for character values and WHICHN is for numeric values. So you can use that function to see if the value is in a list of months if you provide the correct comparison.

An example:

   
data new_practice_check(keep=date date_monthpart new_month);
   set date_practice;
   array month_array{*} $ mon1-mon12 ('JAN' 'FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC');
   if whichc(upcase(date_monthpart),of month_array(*))= 0 then new_month='Dec';

run;

Note that I have changed the values of the array to all upper case letters. Then in the WHICHC function call UPCASE the Date_monthpart values so that have some chance of matching the listed values. If the value of Date_month part is not found in the list of the array values, note the "of month_array(*)" syntax to include all values of the array in array definition order then the function returns 0.

 

Also, especially for people learning SAS coding you really do not want to use the syntax where the input set and the output data set name are the same like this:

data date_practice_check(keep=date date_monthpart new_month);
set date_practice;

Legal syntax but it completely replaces the source data set. So if you have a logic error in assignment you may change values that you did not intend to and then have to rebuild the source data set. I inherited a data set that had a variable that was supposed to be coded with values of 1,2,3 that had all values as 1. I found a code file where someone had wanted to collapse the value categories of 1 and 2 to a 1 and the 3 to a 2, creating a 1,2 coded variable. The code looked like

if var in (2,3) then var=var-1;

Which would have been okay if the code had been run once. But since it was reusing the same variable the second time the code was run, likely changed to code something else, the values that had been changed to 2 in the first pass become 1 in the second pass. The coder had also used the same input/output data set name and replaced the starting values.

 

 

What is the purpose of this exercise? I ask because the number of times that dates should be treated as character values is so low that really don't do it.

View solution in original post

2 REPLIES 2
ballardw
Super User

@Moksha wrote:

Hi,

   I have the following dataset. 

data date_practice;
input date $;
date_monthpart=substr(date,3,3);
put date_monthpart=;
cards;
10jan2022
15feb2023
31dec1998
01cde2010
15aug1983
12jan2012
16xyz2017
;

run;

 

I want to check if the month portion is not in one of the months 'jan', 'feb', 'mar', 'apr', ..... 'dec', then I want to create a new variable new_month with value 'Dec'.

 

I have created a character array of months month_array and then used compare function to check if the date_monthpart value is in this array values. Since compare function returns 0, if there is a match, I am comparing if this return value is not 0, then writing new_month='Dec';

But, this if condition comparison is not working.

Code:

data date_practice_check(keep=date date_monthpart new_month);
set date_practice;

array month_array{*} $ mon1-mon12 ('Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun' 'Jul' 'Aug' 'Sep' 'Oct' 'Nov' 'Dec');

do i=1 to dim(month_array);
compare_month=compare(date_monthpart,month_array{i},'i');
/*if compare_month>0 or compare_month<0 then new_month='Dec';*/
if compare_month ^= 0 then new_month='Dec';

put compare_month= new_month=;
end;
run;

proc print data=date_practice_check;
run;

Expected output: 

 

Obs date        date_monthpart  new_month

1    10jan202       jan                     
2    15feb202      feb                    
3     31dec199    dec                     
4     01cde201    cde                     Dec
5     15aug198    aug                     
6      12jan201    jan                       
7      16xyz201   xyz                      Dec

 

But, getting the following output:
                 
Obs date        date_monthpart  new_month

1    10jan202       jan                     Dec
2    15feb202      feb                     Dec
3     31dec199    dec                     Dec
4     01cde201    cde                     Dec
5     15aug198    aug                     Dec
6      12jan201    jan                       Dec
7      16xyz201   xyz                      Dec

 

Please, advise what's wrong with the code and also if there is any efficient way of coding this.


If you look closely at the the values of Date_monthpart that you provide all of the months are in lower case letters.

The array definition of month_array values all have the first letter as capitol letters. So none of the values in of Date_month part are the same as any of the array values.

Second, you always test all twelve values, so the result of compare_month as written to the data set is the result of the comparison for the Dec (12th) value of the array. And again such none match you get the not equal result of DEC.

 

SAS has two functions that will check to see if a value is in a list of values and return the position number if found. WHICHC is for character values and WHICHN is for numeric values. So you can use that function to see if the value is in a list of months if you provide the correct comparison.

An example:

   
data new_practice_check(keep=date date_monthpart new_month);
   set date_practice;
   array month_array{*} $ mon1-mon12 ('JAN' 'FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC');
   if whichc(upcase(date_monthpart),of month_array(*))= 0 then new_month='Dec';

run;

Note that I have changed the values of the array to all upper case letters. Then in the WHICHC function call UPCASE the Date_monthpart values so that have some chance of matching the listed values. If the value of Date_month part is not found in the list of the array values, note the "of month_array(*)" syntax to include all values of the array in array definition order then the function returns 0.

 

Also, especially for people learning SAS coding you really do not want to use the syntax where the input set and the output data set name are the same like this:

data date_practice_check(keep=date date_monthpart new_month);
set date_practice;

Legal syntax but it completely replaces the source data set. So if you have a logic error in assignment you may change values that you did not intend to and then have to rebuild the source data set. I inherited a data set that had a variable that was supposed to be coded with values of 1,2,3 that had all values as 1. I found a code file where someone had wanted to collapse the value categories of 1 and 2 to a 1 and the 3 to a 2, creating a 1,2 coded variable. The code looked like

if var in (2,3) then var=var-1;

Which would have been okay if the code had been run once. But since it was reusing the same variable the second time the code was run, likely changed to code something else, the values that had been changed to 2 in the first pass become 1 in the second pass. The coder had also used the same input/output data set name and replaced the starting values.

 

 

What is the purpose of this exercise? I ask because the number of times that dates should be treated as character values is so low that really don't do it.

Moksha
Pyrite | Level 9

Thank you very much ballardw for a very detailed explanation and providing easy solution. It worked. As I am new to SAS, I am trying to learn SAS and as part of this, I have tried this. I came to know about WHICHC and WHICHN functions now.

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 2208 views
  • 0 likes
  • 2 in conversation