Hi,
I have the following dataset.
data date_practice;
input date $;
date_monthpart=substr(date,3,3);
put date_monthpart=;
cards;
10jan2022
15feb2023
31dec1998
01cde2010
15aug1983
12jan2012
16xyz2017
;
run;
I want to check if the month portion is not in one of the months 'jan', 'feb', 'mar', 'apr', ..... 'dec', then I want to create a new variable new_month with value 'Dec'.
I have created a character array of months month_array and then used compare function to check if the date_monthpart value is in this array values. Since compare function returns 0, if there is a match, I am comparing if this return value is not 0, then writing new_month='Dec';
But, this is not working as compare function is returning different values like -1, 1, 2, -2.
Code:
data date_practice_check(keep=date date_monthpart new_month);
set date_practice;
array month_array{*} $ mon1-mon12 ('Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun' 'Jul' 'Aug' 'Sep' 'Oct' 'Nov' 'Dec');
do i=1 to dim(month_array);
compare_month=compare(date_monthpart,month_array{i},'i');
/*if compare_month>0 or compare_month<0 then new_month='Dec';*/
if compare_month ^= 0 then new_month='Dec';
put compare_month= new_month=;
end;
run;
proc print data=date_practice_check;
run;
Expected output:
Obs date date_monthpart new_month
1 10jan202 jan
2 15feb202 feb
3 31dec199 dec
4 01cde201 cde Dec
5 15aug198 aug
6 12jan201 jan
7 16xyz201 xyz Dec
But, getting the following output:
Obs date date_monthpart new_month
1 10jan202 jan Dec
2 15feb202 feb Dec
3 31dec199 dec Dec
4 01cde201 cde Dec
5 15aug198 aug Dec
6 12jan201 jan Dec
7 16xyz201 xyz Dec
Please, advise what's wrong with the code and also if there is any efficient way of coding this.
Instead of using compare function, is there any other way to compare if character variables are equal or not.
@Moksha wrote:
Hi,
I have the following dataset.
data date_practice;
input date $;
date_monthpart=substr(date,3,3);
put date_monthpart=;
cards;
10jan2022
15feb2023
31dec1998
01cde2010
15aug1983
12jan2012
16xyz2017
;run;
I want to check if the month portion is not in one of the months 'jan', 'feb', 'mar', 'apr', ..... 'dec', then I want to create a new variable new_month with value 'Dec'.
I have created a character array of months month_array and then used compare function to check if the date_monthpart value is in this array values. Since compare function returns 0, if there is a match, I am comparing if this return value is not 0, then writing new_month='Dec';
But, this if condition comparison is not working.
Code:
data date_practice_check(keep=date date_monthpart new_month);
set date_practice;array month_array{*} $ mon1-mon12 ('Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun' 'Jul' 'Aug' 'Sep' 'Oct' 'Nov' 'Dec');
do i=1 to dim(month_array);
compare_month=compare(date_monthpart,month_array{i},'i');
/*if compare_month>0 or compare_month<0 then new_month='Dec';*/
if compare_month ^= 0 then new_month='Dec';put compare_month= new_month=;
end;
run;proc print data=date_practice_check;
run;Expected output:
Obs date date_monthpart new_month
1 10jan202 jan
2 15feb202 feb
3 31dec199 dec
4 01cde201 cde Dec
5 15aug198 aug
6 12jan201 jan
7 16xyz201 xyz Dec
But, getting the following output:
Obs date date_monthpart new_month1 10jan202 jan Dec
2 15feb202 feb Dec
3 31dec199 dec Dec
4 01cde201 cde Dec
5 15aug198 aug Dec
6 12jan201 jan Dec
7 16xyz201 xyz Dec
Please, advise what's wrong with the code and also if there is any efficient way of coding this.
If you look closely at the the values of Date_monthpart that you provide all of the months are in lower case letters.
The array definition of month_array values all have the first letter as capitol letters. So none of the values in of Date_month part are the same as any of the array values.
Second, you always test all twelve values, so the result of compare_month as written to the data set is the result of the comparison for the Dec (12th) value of the array. And again such none match you get the not equal result of DEC.
SAS has two functions that will check to see if a value is in a list of values and return the position number if found. WHICHC is for character values and WHICHN is for numeric values. So you can use that function to see if the value is in a list of months if you provide the correct comparison.
An example:
data new_practice_check(keep=date date_monthpart new_month); set date_practice; array month_array{*} $ mon1-mon12 ('JAN' 'FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC'); if whichc(upcase(date_monthpart),of month_array(*))= 0 then new_month='Dec'; run;
Note that I have changed the values of the array to all upper case letters. Then in the WHICHC function call UPCASE the Date_monthpart values so that have some chance of matching the listed values. If the value of Date_month part is not found in the list of the array values, note the "of month_array(*)" syntax to include all values of the array in array definition order then the function returns 0.
Also, especially for people learning SAS coding you really do not want to use the syntax where the input set and the output data set name are the same like this:
data date_practice_check(keep=date date_monthpart new_month); set date_practice;
Legal syntax but it completely replaces the source data set. So if you have a logic error in assignment you may change values that you did not intend to and then have to rebuild the source data set. I inherited a data set that had a variable that was supposed to be coded with values of 1,2,3 that had all values as 1. I found a code file where someone had wanted to collapse the value categories of 1 and 2 to a 1 and the 3 to a 2, creating a 1,2 coded variable. The code looked like
if var in (2,3) then var=var-1;
Which would have been okay if the code had been run once. But since it was reusing the same variable the second time the code was run, likely changed to code something else, the values that had been changed to 2 in the first pass become 1 in the second pass. The coder had also used the same input/output data set name and replaced the starting values.
What is the purpose of this exercise? I ask because the number of times that dates should be treated as character values is so low that really don't do it.
@Moksha wrote:
Hi,
I have the following dataset.
data date_practice;
input date $;
date_monthpart=substr(date,3,3);
put date_monthpart=;
cards;
10jan2022
15feb2023
31dec1998
01cde2010
15aug1983
12jan2012
16xyz2017
;run;
I want to check if the month portion is not in one of the months 'jan', 'feb', 'mar', 'apr', ..... 'dec', then I want to create a new variable new_month with value 'Dec'.
I have created a character array of months month_array and then used compare function to check if the date_monthpart value is in this array values. Since compare function returns 0, if there is a match, I am comparing if this return value is not 0, then writing new_month='Dec';
But, this if condition comparison is not working.
Code:
data date_practice_check(keep=date date_monthpart new_month);
set date_practice;array month_array{*} $ mon1-mon12 ('Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun' 'Jul' 'Aug' 'Sep' 'Oct' 'Nov' 'Dec');
do i=1 to dim(month_array);
compare_month=compare(date_monthpart,month_array{i},'i');
/*if compare_month>0 or compare_month<0 then new_month='Dec';*/
if compare_month ^= 0 then new_month='Dec';put compare_month= new_month=;
end;
run;proc print data=date_practice_check;
run;Expected output:
Obs date date_monthpart new_month
1 10jan202 jan
2 15feb202 feb
3 31dec199 dec
4 01cde201 cde Dec
5 15aug198 aug
6 12jan201 jan
7 16xyz201 xyz Dec
But, getting the following output:
Obs date date_monthpart new_month1 10jan202 jan Dec
2 15feb202 feb Dec
3 31dec199 dec Dec
4 01cde201 cde Dec
5 15aug198 aug Dec
6 12jan201 jan Dec
7 16xyz201 xyz Dec
Please, advise what's wrong with the code and also if there is any efficient way of coding this.
If you look closely at the the values of Date_monthpart that you provide all of the months are in lower case letters.
The array definition of month_array values all have the first letter as capitol letters. So none of the values in of Date_month part are the same as any of the array values.
Second, you always test all twelve values, so the result of compare_month as written to the data set is the result of the comparison for the Dec (12th) value of the array. And again such none match you get the not equal result of DEC.
SAS has two functions that will check to see if a value is in a list of values and return the position number if found. WHICHC is for character values and WHICHN is for numeric values. So you can use that function to see if the value is in a list of months if you provide the correct comparison.
An example:
data new_practice_check(keep=date date_monthpart new_month); set date_practice; array month_array{*} $ mon1-mon12 ('JAN' 'FEB' 'MAR' 'APR' 'MAY' 'JUN' 'JUL' 'AUG' 'SEP' 'OCT' 'NOV' 'DEC'); if whichc(upcase(date_monthpart),of month_array(*))= 0 then new_month='Dec'; run;
Note that I have changed the values of the array to all upper case letters. Then in the WHICHC function call UPCASE the Date_monthpart values so that have some chance of matching the listed values. If the value of Date_month part is not found in the list of the array values, note the "of month_array(*)" syntax to include all values of the array in array definition order then the function returns 0.
Also, especially for people learning SAS coding you really do not want to use the syntax where the input set and the output data set name are the same like this:
data date_practice_check(keep=date date_monthpart new_month); set date_practice;
Legal syntax but it completely replaces the source data set. So if you have a logic error in assignment you may change values that you did not intend to and then have to rebuild the source data set. I inherited a data set that had a variable that was supposed to be coded with values of 1,2,3 that had all values as 1. I found a code file where someone had wanted to collapse the value categories of 1 and 2 to a 1 and the 3 to a 2, creating a 1,2 coded variable. The code looked like
if var in (2,3) then var=var-1;
Which would have been okay if the code had been run once. But since it was reusing the same variable the second time the code was run, likely changed to code something else, the values that had been changed to 2 in the first pass become 1 in the second pass. The coder had also used the same input/output data set name and replaced the starting values.
What is the purpose of this exercise? I ask because the number of times that dates should be treated as character values is so low that really don't do it.
Thank you very much ballardw for a very detailed explanation and providing easy solution. It worked. As I am new to SAS, I am trying to learn SAS and as part of this, I have tried this. I came to know about WHICHC and WHICHN functions now.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.