DATA Step, Macro, Functions and more

selecting data in the first 30 days since first date

Accepted Solution Solved
Reply
Super Contributor
Posts: 312
Accepted Solution

selecting data in the first 30 days since first date

[ Edited ]

I have 3 variables, ID, order_date, and order_amount for a year, and I want to get the first 30days data starting from the first order date for each ID.

 

proc sort data=year_data;
by ID order_date;
run;

data first_month;
set year_data;
by ID order_date; 
where day(order_date) - day(first.order_date) <=30;
run;

this does not work. There is a syntax error at day(first.order_date). What is wrong there?


Accepted Solutions
Solution
‎03-15-2016 12:27 PM
Super User
Posts: 5,255

Re: selecting data in the first 30 days since first date

You cannot use first. or last. variables within a where statement since where does the filtering in the input buffers. And that's prior to when the data hits the PDV (and that's where the first. variables are created).

If you wish to accomplish this in a single step look into SQL:
having order_date < min (order_date) +30
Data never sleeps

View solution in original post


All Replies
Solution
‎03-15-2016 12:27 PM
Super User
Posts: 5,255

Re: selecting data in the first 30 days since first date

You cannot use first. or last. variables within a where statement since where does the filtering in the input buffers. And that's prior to when the data hits the PDV (and that's where the first. variables are created).

If you wish to accomplish this in a single step look into SQL:
having order_date < min (order_date) +30
Data never sleeps
Super Contributor
Posts: 312

Re: selecting data in the first 30 days since first date

that works! thank you for teaching me!

Super User
Super User
Posts: 7,392

Re: selecting data in the first 30 days since first date

Well, you could do something like:

data first_month;
  set year_data;
  by id;
  retain first_date;
  if first.if then first_date=order_date;
  if first_date <= order_date <= intnx('month',first_date,1,'same') then output;
run;

Note I am assuming that date is a date variable not a datetime.  This will output only rows between first_date and first_date+1 month.t

Super Contributor
Posts: 312

Re: selecting data in the first 30 days since first date

Thank you, RW9. This requires both ID and order_date need to be sorted, right? But there is only id in the by statement.
Super User
Super User
Posts: 7,392

Re: selecting data in the first 30 days since first date

Yes, it reuiqres the data to be sorted, as per the proc sort in your first post.  The date does not need to be in the by line however, what I am doing is retaining the first observation for each id across the id, then using that to compare to the current observations date.

Super Contributor
Posts: 312

Re: selecting data in the first 30 days since first date

Thank you, RW9.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 271 views
  • 0 likes
  • 3 in conversation