DATA Step, Macro, Functions and more

Excluding First Month of Data from Analysis

Accepted Solution Solved
Reply
Occasional Contributor ZDR
Occasional Contributor
Posts: 13
Accepted Solution

Excluding First Month of Data from Analysis

Hi all,

 

I'm struggling to figure out something which I suspect is very simple. I have a piece of code that compares the value of one variable to the value of that variable for a transaction in the previous month. If the criteria is met, a variable called flag1 is given a value of 1. Otherwise, it is 0. The problem I face is that obviously, for the first month of data, there is no previous month to compare it to.

 

The code works. However, the way I've coded it, the first month of data is always December 2015. So the value for Dec 2015 is ignored. But in practice, the first month of data may well be earlier, or later, than December 2015. So I need the code to dynamically search for the first month of data and exclude it. I have variables for Month (1,2,...12) and Year (2015, 2016).

 

My code is below:

 

 

data dataset;
set dataset;
by CH_ID temp_trans_date ;
if first.temp_trans_date then do;
if temp_trans_date > 20425 AND tr_sum > 1.15*(lag(tr_sum)) then flag1 = 1;
else flag1=0 ;
output;
end;
run;

 

Like I said it works, but instead of temp_trans_date > 20425 I need the logic to be along the lines of IF (not first month and year of data) AND...

 

 

 

Any help would be much appreciated. Thanks!


Accepted Solutions
Solution
‎10-07-2016 11:16 AM
Super User
Posts: 11,343

Re: Excluding First Month of Data from Analysis

One way to get the end of the month for the earliest data into a macro variable:

proc sql noprint;
   select intnx('month',min(date),0,'E') into: MinDate
   from dataset;
quit;

Then use in a comparison such as

 

 

if temp_trans_date > &MinDate. AND tr_sum > 1.15*(lag(tr_sum)) then flag1 = 1;

 

after the Proc Sql.

View solution in original post


All Replies
Solution
‎10-07-2016 11:16 AM
Super User
Posts: 11,343

Re: Excluding First Month of Data from Analysis

One way to get the end of the month for the earliest data into a macro variable:

proc sql noprint;
   select intnx('month',min(date),0,'E') into: MinDate
   from dataset;
quit;

Then use in a comparison such as

 

 

if temp_trans_date > &MinDate. AND tr_sum > 1.15*(lag(tr_sum)) then flag1 = 1;

 

after the Proc Sql.

Occasional Contributor ZDR
Occasional Contributor
Posts: 13

Re: Excluding First Month of Data from Analysis

Thanks ballardw. I will try this out and let you know if it works.

 

Just so I understand the SQL here - 

 

select intnx('month',min(date),0,'E') into: MinDate

 

- is 'month' the numeric month variable?  

- is min(date) a command? Or is date a variable name?

 

It would be good to understand what the code is doing so I can use it more comfortably in future.

Super User
Posts: 11,343

Re: Excluding First Month of Data from Analysis

The function INTNX is used to increment dates. The first parameter is what value is incremented. This may be a variable or a literal as in this case. Othe options are like 'year', 'week', and more complicated examples. Time variables get to add in hours, minutes and seconds.

 

The 0 says to increment 0 months but the last parameter is alignment. 'E' means end of period, 'B' would be beginning and 'S' means same.

So the function says "look at the loweds value of the date variable and find the end of the month that value is in".

 

The min(date), just in case that isn't clear, is telling SQL to find the lowest value (MIN aggregate function) of the specified variable. I used date to indicate any generic SAS date valued variable. It could have been temp_trans_date, datevisit, dateofbirth etc.

I may have been assuming that is what you meant by "first". If your definition of "first" is different then you'll need to be much more specific about a definition.

Respected Advisor
Posts: 4,173

Re: Excluding First Month of Data from Analysis

[ Edited ]

I'm a little bit concerned about your usage of a lag() function in an IF condition but as you write your code works as such I won't go further into details about this one.

 

The option @ballardw proposes will give you the earliest date in the whole of your data. In case you have for all CH_ID's the same earliest date OR you need the earliest date per CH_ID then one of the following two code options should do the job as well.

data dataset;
  set dataset;
  by CH_ID temp_trans_date;
  retain first_temp_trans_date;
  if _n_=1 then first_temp_trans_date=temp_trans_date;
  if first.temp_trans_date then
    do;
      if temp_trans_date > first_temp_trans_date AND tr_sum > 1.15*(lag(tr_sum)) then
        flag1 = 1;
      else flag1=0;
      output;
    end;
run;

data dataset;
  set dataset;
  by CH_ID temp_trans_date;
  retain first_temp_trans_date;
  if first.ch_id then first_temp_trans_date=temp_trans_date;
  if first.temp_trans_date then
    do;
      if temp_trans_date > first_temp_trans_date AND tr_sum > 1.15*(lag(tr_sum)) then
        flag1 = 1;
      else flag1=0;
      output;
    end;
run;
Occasional Contributor ZDR
Occasional Contributor
Posts: 13

Re: Excluding First Month of Data from Analysis

Hi Patrick,

 

 

Thanks - this is useful too. I will hold on to this code in the event that I need it. For now, ballardw's code works well.

 

 

Thank you both!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 281 views
  • 0 likes
  • 3 in conversation