BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ralizadeh
Obsidian | Level 7

I have three columns named ID, Month, and Year. My goal is to create a new column called "new_month."

 

The values for the "new_month" column will be determined based on a prompt that asks for a specific date. I will provide a date in response to the prompt. The values in the Month column that fall between the month of the prompted date and two months following that month should be assigned to the last month.

 

For example, if the prompted date is 08/01/2021, any values in the Month column that fall between 8 and 10 should be assigned the value 10. Similarly, values between 9 and 11 should be assigned 11, and values between 10 and 12 should be assigned 12. These assignments will be made in the calculated new_month column.

 

It's worth noting that there will be duplicated values for IDs because a single month, such as 10, may belong to multiple categories (10, 11, and 12). This duplication is intentional as I need to count the number of IDs from two months prior to my measuring period. The measuring periods, in this case, are 10, 11, and 12.

 

Initially, I considered using a CASE statement with multiple WHEN-THEN conditions. However, I discovered that if the first condition evaluates to TRUE, SAS will ignore the subsequent conditions.

 

I am using SAS EG. So, please keep this in mind when commenting. I would appreciate any assistance you can provide in implementing this logic.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ralizadeh
Obsidian | Level 7

I successfully addressed my specific requirement by utilizing the query builder in three distinct steps. Initially, I divided the relevant date periods and created a separate table for each. Then, I applied the Case statement to the newly generated tables. Finally, I appended all three tables, allowing me to count occurrences for each specified month.

View solution in original post

7 REPLIES 7
ballardw
Super User

Example data

Example prompt value

Desired result for the given data and prompt.

 

It sounds like you are attempting to add observations with one set of examined values which tends to lead to a DATA step solution with a Do loop around an output statement for the matches and an output for those without a match. But that's just a guess.

 

Best to  provide example data in the form of working data step code so we know the names and types of variables.

 

Make sure that the prompt is considered as a SAS date value or you're going to be fighting an uphill battle.

ralizadeh
Obsidian | Level 7

Hi @ballardw . Here is an example of what I want to accomplish. I then want to group by month to count the IDs. I am doing it all in SAS EG and would like to know if any task in EG can do this.

 

ID is Character, Month and Year are Numeric.

 

 

Thanks

ralizadeh_1-1683756969808.png

 

Here is the prompt. The prompt will get two dates, start and end date.

 

ralizadeh_0-1683757098813.png

 

 

ballardw
Super User

Task, likely not.

A code node to make the data set.

You need to provide some "month" values that are not in ANY range to show what happens with those.

You also may need to provide an example that crosses the YEAR boundary as I suspect that your unstated  rule is going to involve the YEAR as well (if you haven't thought about it). {Consider getting a "new_month" of 1 from a date to 12 2021. Your output currently would show 1 2021 which is likely to be confusing if not completely wrong}

 

And since you are now saying that you have two prompts then you need to provide actual examples of the two prompt values. If the interval of interest is always two months the second prompt should not be needed as you can build that with the INTNX function and relying  in people to always enter the two correct values is problematic if the second is supposed to depend on the first. Like what limit have you place to prevent Prompt1= 01Aug2021 and Prompt2 from being 01MAR2021? (second less than first) or 01OCT2023 (2 years later than expected)?

 

I am having a hard time interpreting this "

The values in the Month column that fall between the month of the prompted date and two months following that month should be assigned to the last month.

 

For example, if the prompted date is 08/01/2021, any values in the Month column that fall between 8 and 10 should be assigned the value 10. Similarly, values between 9 and 11 should be assigned 11, and values between 10 and 12 should be assigned 12."

with why your example of id 9901879D with month=8 only has 10 in the output (how 10? rules because you are going to do this with multiple prompts or it isn't worth bothering with a prompt) and 92985473A with 9 only has 11 but 97412415A has 10, 11 and 12. The pattern from the first and second should mean that the later only has 12 in the output ( values between 10 and 12 should be assigned 12) not the shown 10, 11 and 12. Because the value of 9 is between both 8 and 10 and 9 and 11, so expect 2 output rows.

 

ralizadeh
Obsidian | Level 7

Let me rephrase my question. 

 

I have three columns named ID, Month, and Year. My goal is to create a new column called "new_month."

The values for the "new_month" column will be determined based on a prompt that will get two dates (as shown in the previous picture), start and end date. I will provide two dates in response to the prompt. the month of the start date will be the start point. The values in the Month column that fall between the month of the start date and month (start date)+2 should be assigned to month (start date) +2. If the values in the Month column falls between the month (start date) +1 and month (start date) +3 should be assigned to month (start date) +3. Last round would be, if the values in the Month column falls between the month (start date) +2 and month (start date) +4 should be assigned to month (start date) +4. None of the IF conditions should overwrite the pervious month assignment for a given row but rather create a new row with the new month assignment, if needed. There will be duplicated values for IDs because a single month, such as 10, may belong to multiple categories (10, 11, and 12). 10 is between 8 and 10, between 9 and 11 and also between 10 and 12. This duplication is intentional as I need to count the number of IDs from two months prior to my measuring period. The measuring periods, in this case, are 10, 11, and 12.

 

Example, if the prompted date start date is 08/01/2022, and end date is 12/31/2022 any values in the Month column that fall between 8 and 10 should be assigned the value 10. Similarly, values between 9 and 11 should be assigned 11, and values between 10 and 12 should be assigned 12. And it stops here.

 

ballardw
Super User

Try running this in code node and see if the resulting set makes sense given your rules.

 

data have;
   input ID $ Month Year;
datalines;
11111 10 2021
22222  9 2021
33333  8 2021
44444  7 2021
55555  1 2022
;

%let prompt='01AUG2021'd;
data want;
   /* your node would use your data set in place of have
      name the output what you perfer instead of WANT
   */
   set have;
   /* need a DATE to compare to a date prompt*/
   tempdate= mdy(month,1,year);
   if &prompt. le tempdate le intnx('month',&prompt.,2,'b') then do;
      new_month = month(intnx('month',&prompt.,2,'b'));
      output;
   end;
   if &prompt. le tempdate le intnx('month',&prompt.,3,'b') then do;
      new_month = month(intnx('month',&prompt.,3,'b'));
      output;
   end;
   if &prompt. le tempdate le intnx('month',&prompt.,4,'b') then do;
      new_month = month(intnx('month',&prompt.,4,'b'));
      output;
   end;
   else do;
     new_month=month;
     output;
   end;
   drop tempdate;
run;

The first data set is just a small example similar to yours but you can execute it. I can't execute pictures. And I am not going to type a bunch of long random (to me) meaningless. The prompt I assigned in that 'ddMONYYYY'd is how a prompt is used as I understand it. For your purpose you really need to restrict your prompts to the first date of the month.

MDY function creates a Date value given numeric month, day of month and Year values.

INTNX is used to increment date, time or datetime values by a given interval (day, week, month, year for example) by the number after the date/time/datetime value and align the result with the Begining, End or Same relative part of the interval.

 

If this doesn't match your need you need to walk through exactly what is not correct, and why not observation by observation.

 

Hint: for future questions provide example data as data step code or at least as text. We really can't work with pictures very well and having to re-type stuff is going to add to errors and confusion.

 

I still think that you need to address when your prompt interval crosses the year boundary.

AlanC
Barite | Level 11

You are getting good help on the logic right now. I just wanted to point you to the power of SELECT WHEN if you need it. Here is part of a complex example that did bucketing:

 

   select (code);
      when ('88')
	     do ;
			select (bucket) ;
			   when (2)  
                  do ;
                     priority = "10b" ; 
                     %OutputSplitData(89);
				  end; 
               otherwise   
                  output ;
            end; 
		 end;
      when ('56')
	     do ;
			select (bucket);
			   when (2)  
                  do ;
                     priority = "16" ; 
                     %OutputSplitData(65);
				  end; 
               otherwise 
                  output ; 
			end;
		 end;
https://github.com/savian-net
ralizadeh
Obsidian | Level 7

I successfully addressed my specific requirement by utilizing the query builder in three distinct steps. Initially, I divided the relevant date periods and created a separate table for each. Then, I applied the Case statement to the newly generated tables. Finally, I appended all three tables, allowing me to count occurrences for each specified month.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 921 views
  • 2 likes
  • 3 in conversation