Hello!
I have 2 separate files, and I need to pull a value from the second file based on conditions between both files.
File 1
ID | Date |
1 | 1/2/2020 |
2 | 1/8/2020 |
3 | 2/24/2020 |
4 | 3/16/2020 |
File 2
DateRef_min | DateRef_max | Week |
12/29/2019 | 1/4/2020 | 1 |
1/5/2020 | 1/11/2020 | 2 |
1/12/2020 | 1/18/2020 | 3 |
1/19/2020 | 1/25/2020 | 4 |
1/26/2020 | 2/1/2020 | 5 |
2/2/2020 | 2/8/2020 | 6 |
2/9/2020 | 2/15/2020 | 7 |
2/16/2020 | 2/22/2020 | 8 |
2/23/2020 | 2/29/2020 | 9 |
3/1/2020 | 3/7/2020 | 10 |
3/8/2020 | 3/14/2020 | 11 |
3/15/2020 | 3/21/2020 | 12 |
FILE WANT
ID | Date | Week |
1 | 1/2/2020 | 1 |
2 | 1/8/2020 | 2 |
3 | 2/24/2020 | 9 |
4 | 3/16/2020 | 12 |
What I'm trying to do is:
IF date >= DateRef_min & date <= DataRef_max THEN week = week, but I'm not sure how to code this across 2 different files.
Can someone help me understand the best approach?
Thank you!
UNTESTED CODED
Assumes your dates are actual numeric SAS dates, and not character strings
proc sql;
create table want as select a.id,a.date,b.week
from file1 as a,file2 as b
where b.dateref_min<=a.date<=b.dateref_max
order by a.id;
quit;
If your real data has millions of records, this will probably either fail or take forever.
If you want tested code, please provide your data as working SAS data step code which you can type in yourself (or follow these instructions) and not as screen captures and not in any other format.
IF your DATE variable is an actual SAS date value you may not need to do any look up at all. SAS has a WEEK function that depending on an optional parameter returns the week of the year based on rules around the start of the year.
And example of code:
data example; input ID Date :mmddyy10.; format data mmddyy10.; weeku= week(date,'U'); weekv= week(date,'V'); weekw= week(date,'W'); datalines; 1 1/2/2020 2 1/8/2020 3 2/24/2020 4 3/16/2020 ;
It seems possible that you may want the V parameter.
Meaning of the U, V and W parameters (from the documentation of the Week function)
U
specifies the number-of-the-week within the year. Sunday is considered the first day of the week. The number-of-the-week value is represented as a decimal number in the range 0–53. Week 53 has no special meaning. The value of
week('31dec2006'd, 'u')
is 53. U is the default value.
Tip The U and W descriptors are similar, except that the U descriptor considers Sunday as the first day of the week, and the W descriptor considers Monday as the first day of the week. V
specifies the number-of-the-week whose value is represented as a decimal number in the range 1–53. Monday is considered the first day of the week and week 1 of the year is the week that includes both January 4 and the first Thursday of the year. If the first Monday of January is the 2nd, 3rd, or 4th, the preceding days are part of the last week of the preceding year.
W
specifies the number-of-the-week within the year. Monday is considered the first day of the week. The number-of-the-week value is represented as a decimal number in the range 0–53. Week 53 has no special meaning. The value of
week('31dec2006'd, 'w')
is 52.
Tip The U and W descriptors are similar except that the U descriptor considers Sunday as the first day of the week, and the W descriptor considers Monday as the first day of the week.
Which may be a great deal more flexible than attempting to maintain 1000's of look up values.
Thank you, I have tried to use the week function, but it doesn't work for what I need. The data I provided is just a quick dummy dataset.
For my purposes, January 1 could fall within the last week of the previous year, or December 29th, 30th, or 31st could fall within the first week of the current year. The weeks are set up on a different set of rules. So I can't use this function for every date as I would have to find every exception and correct it (across 20 years).
I have a file of records with dates, and a separate file that has a date range and a third parameter. I need to assign the 3rd parameter when the original date falls within the range on the reference file. I have another instance where I need to do this same process but with the 3rd parameter contains an alphanumeric code, so even if the week function worked, I would still need an alternative approach for the alphanumeric variable.
I appreciate your help!
@ErinLM wrote:
Thank you, I have tried to use the week function, but it doesn't work for what I need. The data I provided is just a quick dummy dataset.
For my purposes, January 1 could fall within the last week of the previous year, or December 29th, 30th, or 31st could fall within the first week of the current year. The weeks are set up on a different set of rules. So I can't use this function for every date as I would have to find every exception and correct it (across 20 years).
I have a file of records with dates, and a separate file that has a date range and a third parameter. I need to assign the 3rd parameter when the original date falls within the range on the reference file. I have another instance where I need to do this same process but with the 3rd parameter contains an alphanumeric code, so even if the week function worked, I would still need an alternative approach for the alphanumeric variable.
I appreciate your help!
Highlighted text makes no sense. "Between" values when they are alphanumeric, i.e. not actual a number, means that comparisons will fail left, right and middle quite often because non-equality is returned from the first position that does not match.
Example from comparing your values if they are character:
data example; minref="12/29/2019"; maxref="1/4/2020"; input compare :$10.; result1= (minref le compare); result2= (compare le maxref); datalines; 1/2/2020 1/8/2020 2/24/2020 3/16/2020 ;
The Minref value is not less than (or equal) to 1/2/2020 or 1/8/2020 because of the sort order involving / and numerals.
Any "date" used for any comparison or computation really really really needs to be an actual SAS date value. 'Alphanumeric' generally need not apply though order of YYYYMMDD would work, but that is not the appearance you showed.
Another option that would avoid merging two tables and performance issues would be to use that min/max value pair to make a custom FORMAT (again based on actual date values)
data cntlinset; input start :MMddyy10. end :mmddyy10. label ; fmtname = "Date2wknum"; datalines; 12/29/2019 1/4/2020 1 1/5/2020 1/11/2020 2 1/12/2020 1/18/2020 3 1/19/2020 1/25/2020 4 1/26/2020 2/1/2020 5 2/2/2020 2/8/2020 6 2/9/2020 2/15/2020 7 2/16/2020 2/22/2020 8 2/23/2020 2/29/2020 9 3/1/2020 3/7/2020 10 3/8/2020 3/14/2020 11 3/15/2020 3/21/2020 12 ; proc format cntlin=cntlinset; run; data example; input ID Date :mmddyy10.; format date Date2wknum.; /* create text variable*/ wktext=put(date,date2wknum.); datalines; 1 1/2/2020 2 1/8/2020 3 2/24/2020 4 3/16/2020 ;
Create the format needed ONE time, store in a permanent library, add the path to that format to the FMTSEARCH system setting and use any time you want.
There is very little overhead in Formats that might occur with SQL.
I show using that format directly with the date variable as an option as for many reporting or analysis tasks the group created by the format can be used in the procedures and you do not even need to add a variable to the data set. Custom formats may work with graphs depending on all the data and graphs involved.
Wow! I appreciate the detail, but that is a very odd interpretation of what I said. The comparison is still the original date to the 2 reference dates. The difference is that the variable pulled is an alphanumeric code. In short, instead of returning a number for the week, I'm just trying to pull the assigned code for any record with a date between the reference dates.
DateRef_min | DateRef_max | Week | Assign |
12/29/2019 | 1/4/2020 | 1 | 1ac345 |
1/5/2020 | 1/11/2020 | 2 | 3jw743 |
1/12/2020 | 1/18/2020 | 3 | 8sz938 |
Thank you for your input.
UNTESTED CODED
Assumes your dates are actual numeric SAS dates, and not character strings
proc sql;
create table want as select a.id,a.date,b.week
from file1 as a,file2 as b
where b.dateref_min<=a.date<=b.dateref_max
order by a.id;
quit;
If your real data has millions of records, this will probably either fail or take forever.
If you want tested code, please provide your data as working SAS data step code which you can type in yourself (or follow these instructions) and not as screen captures and not in any other format.
@ErinLM wrote:
I apologize, my data is sensitive so I cannot provide it, hence the dummy data.
Even dummy data should be provided in usable form, as SAS data step code.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.