- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Guys,
I would be hugely grateful, if you could please help me with the following issue:
For a particular year, I have daily series of companies,days and other variables. The dataset is sorted by companies. In other words, I have
Row Days Company Variable1 Variable2 Variable3
1 27/07/2000 Firm 1
2 28/07/2000 Firm 1
3 29/07/2000 Firm 1
4 30/07/2000 Firm 1
5 31/07/2000 Firm 1
6 27/08/2000 Firm 1
7 28/08/2000 Firm 1
8 29/08/2000 Firm 1
9 30/08/2000 Firm 1
10 31/08/2000 Firm 1
Then for exactly the same days, I have firm 2. My aim is to take only observations for 31/07/2000, 31/08/2000 for firm 1, then take observations that belong to end month dates (i.e. to 31/07/2000, 31/08/2000) for firm 2, then for firm 3 and so on.
THE QUESTION MENTIONED ABOVE IS ANSWERED, COULD YOU PLEASE ANSWER THE QUESTION BELOW:
For some months (except Feb), the end is not on 31st or 30th, but on 29th or 28th. For instance, for Jan 1998 in my sample, the end is on 30th of January and using your codes, I cannot extract 30th of January. SAS just skips January and moves to February or March (to months where the end is 30th or 31st). Any ideas how I can solve this problem?
I would hugely appreciate any help. Thank a lot for all replies regarding the first question. I found all your codes extremely useful.
Kind regards,
Ruslan
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is a two pass solution, but in one step. A 4GB file will process fine unless you need it back in 2 secs.
data have;
informat days ddmmyy10. company $8. ;format days date9.;
input Days Company ;
cards;
27/07/2000 Firm1
28/07/2000 Firm1
29/07/2000 Firm1
30/07/2000 Firm1
31/07/2000 Firm1
27/08/2000 Firm1
28/08/2000 Firm1
29/08/2000 Firm1
30/08/2000 Firm1
30/08/2000 Firm1
30/08/2000 Firm1
27/07/2000 Firm2
28/07/2000 Firm2
29/07/2000 Firm2
30/07/2000 Firm2
31/07/2000 Firm2
27/08/2000 Firm2
28/08/2000 Firm2
29/08/2000 Firm2
30/08/2000 Firm2
30/08/2000 Firm2
31/08/2000 Firm2
31/08/2000 Firm2
;
run;
proc sql;
create table want as
select company, days, month(days) as month
from have
group by company, calculated month
having days=max(days);
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have tried this, but it still skips January and goes to March. Could you please correct the code or suggest an alternative? Thanks in advance!!!
data want; /* take end of month for each company */
set have;
if date~=(intnx('month', date, 1)-1) then delete;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try the other example given:
data want;
set have;
if date ne intnx(‘month’,date,0,’e’) then delete;
run;
This will give you the last day in the month
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This works for me:
data have;
attrib date format=date9.;
date='05JAN2014'd;
output;
date='31JAN2014'd;
output;
run;
data want;
set have;
if date ne intnx('month',date,0,'e') then delete;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try to do:
data have;
attrib date format=date9.;
date='05JAN2014'd;
output;
date='30JAN2014'd;
output;
run;
and it will not work!!! But I need to get 30 Jan
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is this a problem of interpretation as much as of sas code!
Are we to create a table for each company with the lastest date in each month?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, you are exactly right. I need to create a table for each company with the latest date in each month.
The problem is that these days are trading days and thus date series is with gaps. So the last day in a month could be 27th or 26th as well as the start date in a month could be 2nd,4th. I do not know how to adjust intnx function to be able to cope with gaps.
Could you please help me with this issue?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming data sorted by company, days
data v/view=v;
Set have;
Month=intnx('month', days,0, 'e');
Run;
Data want;
Set v;
By company month days;
If last.month ;
Run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a lot, Peter.C, for your code. That worked perfectly for selecting end month dates
However, it seems like this code has taken only one observation for end month dates, but I have more than one observation for Jan 30. How I can take all of them? (not just one observation per end month date)
Look forward to hearing from you soon.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry this couldn't be sooner.
since your data are 4GB you might like this solution which minimizes the impact of visiting the data twice. As you will expect the first pass identifies the last day of the month for the company and the second collects the rows which match. The performance benefit of this method is that all data would be in cache at the point where the second pass follows, because it is data step.
the SET statement refers to the data (V) twice. However the BY statement keeps the data flowing in order. All the rows within company and month on the A pass finish with the last date in the month and pass through before the B rows.
Data v/view=v;
Set have ;
Month= intnx( 'month', 0, 'e') ;
Run ;
Data want ;
Set v( in= A) v( in= B ) ;
By company month ;
If a then lastday = days;
if b ;
If days= lastday ;
Retain lastday ;
Drop lastday;
Run;
Hopefully that will perform faster than the SQL method.
peterC
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That works great and, most importantly, is faster than SQL. Thus, thanks a lot, Peter.C, for suggesting other way of extracting month end dates.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have used that code originally and the results are the same. Reeza has suggested this code.
Then, RichardinOz has suggested another way of extracting the last day of month:
data want;
set have;
where Day(days + 1) = 1 ;
run ;
Both ways give exactly the same results. SAS skips January where the end in my sample is 30th and goes to March 31st. It even skips February because the end day of Feb in my sample is 26th. Could you please propose a solution?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think the problem is that you don't want the month-end date, you want the last date in each month that has data in it.
Firstly find the month and year of each record then select the highest value of date within month / year. This gives you the
last data date of each month.
Then merge this back to your data and keep those records that have matching dates.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, I have been thinking about merging as well.
But I have big datasets (approx. 4GB) and I think merging will take too much time (either using proc sql or merge). Thus, I've been planning to find the last date in each month that data in it (you are absolutely right!!!) just on-one-go, at once. Is there a direct way to do this avoiding merging?
Look forward to your answers.