BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ruslan
Calcite | Level 5

Guys,

I would be hugely grateful, if you could please help me with the following issue:

For a particular year, I have daily series of companies,days and other variables. The dataset is sorted by companies. In other words, I have

Row     Days         Company  Variable1 Variable2 Variable3

1      27/07/2000       Firm 1

2      28/07/2000       Firm 1

3      29/07/2000       Firm 1

4      30/07/2000       Firm 1

5      31/07/2000       Firm 1

6      27/08/2000       Firm 1

7      28/08/2000       Firm 1

8      29/08/2000       Firm 1

9      30/08/2000       Firm 1

10    31/08/2000       Firm 1

Then for exactly the same days, I have firm 2. My aim is to take only observations for 31/07/2000, 31/08/2000 for firm 1, then take observations that belong to end month dates (i.e. to 31/07/2000, 31/08/2000) for firm 2, then for firm 3 and so on.

THE QUESTION MENTIONED ABOVE IS ANSWERED, COULD YOU PLEASE ANSWER THE QUESTION BELOW:

For some months (except Feb), the end is not on 31st or 30th, but on 29th or 28th. For instance, for Jan 1998 in my sample, the end is on 30th of January and using your codes, I cannot extract 30th of January. SAS just skips January and moves to February or March (to months where the end is 30th or 31st). Any ideas how I can solve this problem?

I would hugely appreciate any help. Thank a lot for all replies regarding the first question. I found all your codes extremely useful.

Kind regards,

Ruslan

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This is a two pass solution, but in one step. A 4GB file will process fine unless you need it back in 2 secs. 

data have;

informat days ddmmyy10. company $8. ;format days date9.;

input  Days         Company ;

cards;

27/07/2000       Firm1

28/07/2000       Firm1

29/07/2000       Firm1

30/07/2000       Firm1

31/07/2000       Firm1

27/08/2000       Firm1

28/08/2000       Firm1

29/08/2000       Firm1

30/08/2000       Firm1

30/08/2000       Firm1

30/08/2000       Firm1

27/07/2000       Firm2

28/07/2000       Firm2

29/07/2000       Firm2

30/07/2000       Firm2

31/07/2000       Firm2

27/08/2000       Firm2

28/08/2000       Firm2

29/08/2000       Firm2

30/08/2000       Firm2

30/08/2000       Firm2

31/08/2000       Firm2

31/08/2000       Firm2

;

run;

proc sql;

  create table want as

  select company, days, month(days) as month

  from have

  group by company, calculated month

  having days=max(days);

quit;

View solution in original post

21 REPLIES 21
Ruslan
Calcite | Level 5

I have tried this, but it still skips January and goes to March. Could you please correct the code or suggest an alternative? Thanks in advance!!!


data want; /* take end of month for each company */

set have;

if date~=(intnx('month', date, 1)-1) then delete;

run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Try the other example given:

data want;

     set have;

     if date ne intnx(‘month’,date,0,’e’) then delete;

run;

This will give you the last day in the month

RW9
Diamond | Level 26 RW9
Diamond | Level 26

This works for me:

data have;

  attrib date format=date9.;

  date='05JAN2014'd;

  output;

  date='31JAN2014'd;

  output;

run;

data want;

     set have;

     if date ne intnx('month',date,0,'e') then delete;

run;

Ruslan
Calcite | Level 5

Try to do:

data have;

  attrib date format=date9.;

  date='05JAN2014'd;

  output;

  date='30JAN2014'd;

  output;

run;

and it will not work!!! But I need to get 30 Jan

Peter_C
Rhodochrosite | Level 12

Is this a problem of interpretation as much as of sas code!

Are we to create a table for each company with the lastest date in each month?

Ruslan
Calcite | Level 5

Yes, you are exactly right. I need to create a table for each company with the latest date in each month.

The problem is that these days are trading days and thus date series is with gaps. So the last day in a month could be 27th or 26th as well as the start date in a month could be 2nd,4th. I do not know how to adjust intnx function to be able to cope with gaps.

Could you please help me with this issue?

Peter_C
Rhodochrosite | Level 12

Assuming data sorted by company, days

data v/view=v;

Set have;

Month=intnx('month', days,0, 'e');

Run;

Data want;

Set v;

By company month days;

If last.month ;

Run;

Ruslan
Calcite | Level 5

Thanks a lot, Peter.C, for your code. That worked perfectly for selecting end month dates

However, it seems like this code has taken only one observation for end month dates, but I have more than one observation for Jan 30. How I can take all of them? (not just one observation per end month date)

Look forward to hearing from you soon.

Peter_C
Rhodochrosite | Level 12

Sorry this couldn't be sooner.

since your data are 4GB you might like this solution which minimizes the impact of  visiting the data twice. As you will expect the first pass identifies the last day of the month for the company and the second collects the rows which match. The performance benefit of this method is that all data would be in cache at the point where the second pass follows, because it is data step.

the SET statement refers to the data (V) twice. However the BY statement keeps the data flowing in order. All the rows within company and month on the A pass finish with the last date in the month and pass through before the B rows.

Data v/view=v;

Set have ;

Month= intnx( 'month', 0, 'e') ;

Run ;

Data want ;

Set v( in= A)  v( in= B ) ;

By company month ;

If a then lastday = days;

if b ;

If days= lastday ;

Retain lastday ;

Drop   lastday;

Run;

Hopefully that will perform faster than the SQL method.

peterC

Ruslan
Calcite | Level 5

That works great and, most importantly, is faster than SQL. Thus, thanks a lot, Peter.C, for suggesting other way of extracting month end dates.

Ruslan
Calcite | Level 5

I have used that code originally and the results are the same. Reeza has suggested this code.

Then, RichardinOz has suggested another way of extracting the last day of month:

data want;

     set have;

     where Day(days + 1) = 1 ;

run ;

Both ways give exactly the same results. SAS skips January where the end in my sample is 30th and goes to March 31st. It even skips February because the end day of Feb in my sample is 26th. Could you please propose a solution?

OS2Rules
Obsidian | Level 7

I think the problem is that you don't want the month-end date, you want the last date in each month that has data in it.

Firstly  find the month and year of each record then select the highest value of date within month / year.  This gives you the

last data date of each month.

Then merge this back to your data and keep those records that have matching dates.

Ruslan
Calcite | Level 5

Yes, I have been thinking about merging as well.

But I have big datasets (approx. 4GB) and I think merging will take too much time (either using proc sql or merge). Thus, I've been planning to find the last date in each month that data in it (you are absolutely right!!!) just on-one-go, at once. Is there a direct way to do this avoiding merging?

Look forward to your answers.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 5296 views
  • 7 likes
  • 5 in conversation