07-04-2014 09:31 AM
I would be hugely grateful, if you could please help me with the following issue:
For a particular year, I have daily series of companies,days and other variables. The dataset is sorted by companies. In other words, I have
Row Days Company Variable1 Variable2 Variable3
1 27/07/2000 Firm 1
2 28/07/2000 Firm 1
3 29/07/2000 Firm 1
4 30/07/2000 Firm 1
5 31/07/2000 Firm 1
6 27/08/2000 Firm 1
7 28/08/2000 Firm 1
8 29/08/2000 Firm 1
9 30/08/2000 Firm 1
10 31/08/2000 Firm 1
Then for exactly the same days, I have firm 2. My aim is to take only observations for 31/07/2000, 31/08/2000 for firm 1, then take observations that belong to end month dates (i.e. to 31/07/2000, 31/08/2000) for firm 2, then for firm 3 and so on.
THE QUESTION MENTIONED ABOVE IS ANSWERED, COULD YOU PLEASE ANSWER THE QUESTION BELOW:
For some months (except Feb), the end is not on 31st or 30th, but on 29th or 28th. For instance, for Jan 1998 in my sample, the end is on 30th of January and using your codes, I cannot extract 30th of January. SAS just skips January and moves to February or March (to months where the end is 30th or 31st). Any ideas how I can solve this problem?
I would hugely appreciate any help. Thank a lot for all replies regarding the first question. I found all your codes extremely useful.
07-04-2014 02:06 PM
This is a two pass solution, but in one step. A 4GB file will process fine unless you need it back in 2 secs.
informat days ddmmyy10. company $8. ;format days date9.;
input Days Company ;
create table want as
select company, days, month(days) as month
group by company, calculated month
07-04-2014 09:51 AM
07-04-2014 10:11 AM
I have tried this, but it still skips January and goes to March. Could you please correct the code or suggest an alternative? Thanks in advance!!!
data want; /* take end of month for each company */
if date~=(intnx('month', date, 1)-1) then delete;
07-04-2014 10:22 AM
Try the other example given:
if date ne intnx(‘month’,date,0,’e’) then delete;
This will give you the last day in the month
07-04-2014 10:25 AM
This works for me:
attrib date format=date9.;
if date ne intnx('month',date,0,'e') then delete;
07-04-2014 10:32 AM
Try to do:
attrib date format=date9.;
and it will not work!!! But I need to get 30 Jan
07-04-2014 11:28 AM
Yes, you are exactly right. I need to create a table for each company with the latest date in each month.
The problem is that these days are trading days and thus date series is with gaps. So the last day in a month could be 27th or 26th as well as the start date in a month could be 2nd,4th. I do not know how to adjust intnx function to be able to cope with gaps.
Could you please help me with this issue?
07-04-2014 11:33 AM
Assuming data sorted by company, days
Month=intnx('month', days,0, 'e');
By company month days;
If last.month ;
07-04-2014 12:14 PM
Thanks a lot, Peter.C, for your code. That worked perfectly for selecting end month dates
However, it seems like this code has taken only one observation for end month dates, but I have more than one observation for Jan 30. How I can take all of them? (not just one observation per end month date)
Look forward to hearing from you soon.
07-04-2014 05:49 PM
Sorry this couldn't be sooner.
since your data are 4GB you might like this solution which minimizes the impact of visiting the data twice. As you will expect the first pass identifies the last day of the month for the company and the second collects the rows which match. The performance benefit of this method is that all data would be in cache at the point where the second pass follows, because it is data step.
the SET statement refers to the data (V) twice. However the BY statement keeps the data flowing in order. All the rows within company and month on the A pass finish with the last date in the month and pass through before the B rows.
Set have ;
Month= intnx( 'month', 0, 'e') ;
Data want ;
Set v( in= A) v( in= B ) ;
By company month ;
If a then lastday = days;
if b ;
If days= lastday ;
Retain lastday ;
Hopefully that will perform faster than the SQL method.
07-04-2014 06:56 PM
That works great and, most importantly, is faster than SQL. Thus, thanks a lot, Peter.C, for suggesting other way of extracting month end dates.
07-04-2014 10:27 AM
I have used that code originally and the results are the same. Reeza has suggested this code.
Then, RichardinOz has suggested another way of extracting the last day of month:
where Day(days + 1) = 1 ;
Both ways give exactly the same results. SAS skips January where the end in my sample is 30th and goes to March 31st. It even skips February because the end day of Feb in my sample is 26th. Could you please propose a solution?
07-04-2014 12:42 PM
I think the problem is that you don't want the month-end date, you want the last date in each month that has data in it.
Firstly find the month and year of each record then select the highest value of date within month / year. This gives you the
last data date of each month.
Then merge this back to your data and keep those records that have matching dates.
07-04-2014 12:52 PM
Yes, I have been thinking about merging as well.
But I have big datasets (approx. 4GB) and I think merging will take too much time (either using proc sql or merge). Thus, I've been planning to find the last date in each month that data in it (you are absolutely right!!!) just on-one-go, at once. Is there a direct way to do this avoiding merging?
Look forward to your answers.
Need further help from the community? Please ask a new question.