BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ErinLM
Obsidian | Level 7

Hello!

 

I have 2 separate files, and I need to pull a value from the second file based on conditions between both files. 

 

File 1

IDDate
11/2/2020
21/8/2020
32/24/2020
43/16/2020

 

File 2

DateRef_minDateRef_maxWeek
12/29/20191/4/20201
1/5/20201/11/20202
1/12/20201/18/20203
1/19/20201/25/20204
1/26/20202/1/20205
2/2/20202/8/20206
2/9/20202/15/20207
2/16/20202/22/20208
2/23/20202/29/20209
3/1/20203/7/202010
3/8/20203/14/202011
3/15/20203/21/202012

 

FILE WANT

 

IDDateWeek
11/2/20201
21/8/20202
32/24/20209
43/16/202012

 

What I'm trying to do is:

 

IF date >= DateRef_min & date <= DataRef_max THEN week = week, but I'm not sure how to code this across 2 different files. 

 

Can someone help me understand the best approach?

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

UNTESTED CODED

Assumes your dates are actual numeric SAS dates, and not character strings

 

proc sql;
    create table want as select a.id,a.date,b.week
    from file1 as a,file2 as b
    where b.dateref_min<=a.date<=b.dateref_max
    order by a.id;
quit;

 

If your real data has millions of records, this will probably either fail or take forever.

 

If you want tested code, please provide your data as working SAS data step code which you can type in yourself (or follow these instructions) and not as screen captures and not in any other format.

--
Paige Miller

View solution in original post

7 REPLIES 7
ballardw
Super User

IF your DATE variable is an actual SAS date value you may not need to do any look up at all. SAS has a WEEK function that depending on an optional parameter returns the week of the year based on rules around the start of the year.

 

And example of code:

data example;
  input ID	Date :mmddyy10.;
  format data mmddyy10.;
  weeku= week(date,'U');
  weekv= week(date,'V');
  weekw= week(date,'W');
datalines;
1	1/2/2020
2	1/8/2020
3	2/24/2020
4	3/16/2020
;

It seems possible that you may want the V parameter.

 

Meaning of the U, V and W parameters (from the documentation of the Week function)

U

specifies the number-of-the-week within the year. Sunday is considered the first day of the week. The number-of-the-week value is represented as a decimal number in the range 0–53. Week 53 has no special meaning. The value of week('31dec2006'd, 'u') is 53. U is the default value.

Tip The U and W descriptors are similar, except that the U descriptor considers Sunday as the first day of the week, and the W descriptor considers Monday as the first day of the week.
   

V

specifies the number-of-the-week whose value is represented as a decimal number in the range 1–53. Monday is considered the first day of the week and week 1 of the year is the week that includes both January 4 and the first Thursday of the year. If the first Monday of January is the 2nd, 3rd, or 4th, the preceding days are part of the last week of the preceding year.

W

specifies the number-of-the-week within the year. Monday is considered the first day of the week. The number-of-the-week value is represented as a decimal number in the range 0–53. Week 53 has no special meaning. The value of week('31dec2006'd, 'w') is 52.

Tip The U and W descriptors are similar except that the U descriptor considers Sunday as the first day of the week, and the W descriptor considers Monday as the first day of the week.
   
 

Which may be a great deal more flexible than attempting to maintain 1000's of look up values.

 

 

 

 

ErinLM
Obsidian | Level 7

Thank you, I have tried to use the week function, but it doesn't work for what I need. The data I provided is just a quick dummy dataset. 

 

For my purposes, January 1 could fall within the last week of the previous year, or December 29th, 30th, or 31st could fall within the first week of the current year. The weeks are set up on a different set of rules. So I can't use this function for every date as I would have to find every exception and correct it (across 20 years).

 

I have a file of records with dates, and a separate file that has a date range and a third parameter. I need to assign the 3rd parameter when the original date falls within the range on the reference file. I have another instance where I need to do this same process but with the 3rd parameter contains an alphanumeric code, so even if the week function worked, I would still need an alternative approach for the alphanumeric variable. 

 

I appreciate your help!

ballardw
Super User

@ErinLM wrote:

Thank you, I have tried to use the week function, but it doesn't work for what I need. The data I provided is just a quick dummy dataset. 

 

For my purposes, January 1 could fall within the last week of the previous year, or December 29th, 30th, or 31st could fall within the first week of the current year. The weeks are set up on a different set of rules. So I can't use this function for every date as I would have to find every exception and correct it (across 20 years).

 

I have a file of records with dates, and a separate file that has a date range and a third parameter. I need to assign the 3rd parameter when the original date falls within the range on the reference file. I have another instance where I need to do this same process but with the 3rd parameter contains an alphanumeric code, so even if the week function worked, I would still need an alternative approach for the alphanumeric variable. 

 

I appreciate your help!


Highlighted text makes no sense. "Between" values when they are alphanumeric, i.e. not actual a number, means that comparisons will fail left, right and middle quite often because non-equality is returned from the first position that does not match.

 

Example from comparing your values if they are character:

 

data example;
   minref="12/29/2019";
   maxref="1/4/2020";
   input compare :$10.;
   result1= (minref le compare);
   result2= (compare le maxref);
datalines;
1/2/2020
1/8/2020
2/24/2020
3/16/2020
; 

The Minref value is not less than (or equal) to 1/2/2020 or 1/8/2020 because of the sort order involving / and numerals.

Any "date" used for any comparison or computation really really really needs to be an actual SAS date value. 'Alphanumeric' generally need not apply though order of YYYYMMDD would work, but that is not the appearance you showed.

 

Another option that would avoid merging two tables and performance issues would be to use that min/max value pair to make a custom FORMAT (again based on actual date values)

 

data cntlinset;
   input start :MMddyy10. end :mmddyy10. label ;
   fmtname = "Date2wknum";
datalines;
12/29/2019	1/4/2020	1
1/5/2020	1/11/2020	2
1/12/2020	1/18/2020	3
1/19/2020	1/25/2020	4
1/26/2020	2/1/2020	5
2/2/2020	2/8/2020	6
2/9/2020	2/15/2020	7
2/16/2020	2/22/2020	8
2/23/2020	2/29/2020	9
3/1/2020	3/7/2020	10
3/8/2020	3/14/2020	11
3/15/2020	3/21/2020	12
;

proc format cntlin=cntlinset;
run;

data example;
  input ID	Date :mmddyy10.;
  format date Date2wknum.;
  /* create text variable*/
  wktext=put(date,date2wknum.);
  
datalines;
1	1/2/2020
2	1/8/2020
3	2/24/2020
4	3/16/2020
;

Create the format needed ONE time, store in a permanent library, add the path to that format to the FMTSEARCH system setting and use any time you want.

There is very little overhead in Formats that might occur with SQL.

I show using that format directly with the date variable as an option as for many reporting or analysis tasks the group created by the format can be used in the procedures and you do not even need to add a variable to the data set. Custom formats may work with graphs depending on all the data and graphs involved.

 

ErinLM
Obsidian | Level 7

Wow! I appreciate the detail, but that is a very odd interpretation of what I said. The comparison is still the original date to the 2 reference dates. The difference is that the variable pulled is an alphanumeric code. In short, instead of returning a number for the week, I'm just trying to pull the assigned code for any record with a date between the reference dates. 

 

DateRef_minDateRef_maxWeekAssign
12/29/20191/4/202011ac345
1/5/20201/11/202023jw743
1/12/20201/18/202038sz938

 

Thank you for your input.

PaigeMiller
Diamond | Level 26

UNTESTED CODED

Assumes your dates are actual numeric SAS dates, and not character strings

 

proc sql;
    create table want as select a.id,a.date,b.week
    from file1 as a,file2 as b
    where b.dateref_min<=a.date<=b.dateref_max
    order by a.id;
quit;

 

If your real data has millions of records, this will probably either fail or take forever.

 

If you want tested code, please provide your data as working SAS data step code which you can type in yourself (or follow these instructions) and not as screen captures and not in any other format.

--
Paige Miller
ErinLM
Obsidian | Level 7
This works perfectly!
I apologize, my data is sensitive so I cannot provide it, hence the dummy data.

I knew I had seen some proc sql code that did this, but my googling wasn't bringing up any examples referencing 2 different files. I forgot about the option to do a. and b.

Thank you so much!
PaigeMiller
Diamond | Level 26

@ErinLM wrote:

I apologize, my data is sensitive so I cannot provide it, hence the dummy data.

Even dummy data should be provided in usable form, as SAS data step code.

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 724 views
  • 2 likes
  • 3 in conversation