Hi:
I left REPORT_DATE in my code only for teaching purposes. I wanted to show you that LAST.REPORT_DATE was inappropriate compared to LAST.R_YEAR -- in order to illustrate that point, REPORT_DATE had to be in my BY statement.
My code was not proposed as the "solution" to your issue. My code was meant to show you that LAST.REPORT_DATE was inappropriate as a condition to get you the desired rows. If I had been solving your problem, I would have written a program that used only LAST.R_YEAR without LAST.REPORT_DATE. I would NOT have captured every single FIRST. and LAST. variable.
It seems to me that your fundamental problem lies with this statement/belief where you say:
- if SAS can distinguish between march vs may in 30/3/2000 and 30/5/2000, why the code using rep_date in last.rep_date not work? For the same fund id 2000004 and the same firm id 113563, we have different dates (different in months), so logically SAS can sort from oldest dates on top and latest dates on bottom for each of the securities and for each of the firms.
OK, let's go back to fundamentals. SAS dates. What are they. SAS handles dates as an offset from January 1, 1960. What does that mean? January 1, 1960 is internally stored as 0. 15Nov1950 was 3334 days BEFORE January 1, 1960 -- so 15Nov1950 is internally stored as -3334 in a SAS dataset. As you can see, internally, the number is just a number - -it only becomes meaningful as a date when you
1) use it with a SAS date function that knows what the number represents or
2) display it with a SAS date format.
But, internally, your date is stored as a single number, no matter how you read it in, no matter what format you use. That single number is what is used in comparisons and conditions. If you need a date-related piece out of that single number, you must use a SAS function to get that piece.
Consider THIS program and data:
[pre]
data what_are_dates;
infile datalines;
input name $ bday_orig : mmddyy10.;
bday_unfmt = bday_orig;
bday_diff = bday_orig;
** Extract some date information from the numeric variable;
mm = month(bday_orig);
dd = day(bday_orig);
year = year(bday_orig);
week = week(bday_orig);
quarter = qtr(bday_orig);
putlog ' *** *** *** *** *** *** *** ';
putlog _all_;
putlog name= bday_unfmt= bday_orig= mmddyy10. bday_diff= date9. mm= dd= year= week= quarter=;
return;
datalines;
alan 11/15/1950
barb 03/30/2000
carl 05/30/2000
dana 07/31/2001
edna 11/15/2010
;
run;
[/pre]
Note that the dates, in my input data are understandable as regular date values. But when I read the dates in my program with the mmddyy10. informat, I am instructing SAS to convert the dates from their conventional date representation into the internal number that represents that date's "distance" from January 1, 1960.
Here's the LOG from the above program:
[pre]
2041 data what_are_dates;
2042 infile datalines;
2043 input name $ bday_orig : mmddyy10.;
2044 bday_unfmt = bday_orig;
2045 bday_diff = bday_orig;
2046 mm = month(bday_orig);
2047 dd = day(bday_orig);
2048 year = year(bday_orig);
2049 week = week(bday_orig);
2050 quarter = qtr(bday_orig);
2051 putlog ' *** *** *** *** *** *** *** ';
2052 putlog _all_;
2053 putlog name= bday_unfmt= bday_orig= mmddyy10. bday_diff= date9. mm= dd= year= week= quarter=;
2054 return;
2055 datalines;
*** *** *** *** *** *** ***
name=alan bday_orig=-3334 bday_unfmt=-3334 bday_diff=-3334 mm=11 dd=15 year=1950 week=46 quarter=4 _ERROR_=0 _N_=1
name=alan bday_unfmt=-3334 bday_orig=11/15/1950 bday_diff=15NOV1950 mm=11 dd=15 year=1950 week=46 quarter=4
*** *** *** *** *** *** ***
name=barb bday_orig=14699 bday_unfmt=14699 bday_diff=14699 mm=3 dd=30 year=2000 week=13 quarter=1 _ERROR_=0 _N_=2
name=barb bday_unfmt=14699 bday_orig=03/30/2000 bday_diff=30MAR2000 mm=3 dd=30 year=2000 week=13 quarter=1
*** *** *** *** *** *** ***
name=carl bday_orig=14760 bday_unfmt=14760 bday_diff=14760 mm=5 dd=30 year=2000 week=22 quarter=2 _ERROR_=0 _N_=3
name=carl bday_unfmt=14760 bday_orig=05/30/2000 bday_diff=30MAY2000 mm=5 dd=30 year=2000 week=22 quarter=2
*** *** *** *** *** *** ***
name=dana bday_orig=15187 bday_unfmt=15187 bday_diff=15187 mm=7 dd=31 year=2001 week=30 quarter=3 _ERROR_=0 _N_=4
name=dana bday_unfmt=15187 bday_orig=07/31/2001 bday_diff=31JUL2001 mm=7 dd=31 year=2001 week=30 quarter=3
*** *** *** *** *** *** ***
name=edna bday_orig=18581 bday_unfmt=18581 bday_diff=18581 mm=11 dd=15 year=2010 week=46 quarter=4 _ERROR_=0 _N_=5
name=edna bday_unfmt=18581 bday_orig=11/15/2010 bday_diff=15NOV2010 mm=11 dd=15 year=2010 week=46 quarter=4
NOTE: The data set WORK.WHAT_ARE_DATES has 5 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
[/pre]
From the PUTLOG statements, you can see that 30Mar2000 is internally stored as the single number 14699 and that 30May2000 is internally stored as the single number 14760. That's how SAS tells the 2 dates apart. But, when the number is stored as 14699 or 14760, SAS knows that it is a date, but a LAST.BDAY condition would be set to 1 for every date (if I used LAST.BDAY in the program, which I did NOT).
Note how the use of the date-related functions MONTH, DAY, YEAR, WEEK and QTR resulted in the extraction of "date" pieces from the program.
A PROC PRINT on the dataset can use different formats or no formats to display the internally stored number, but no matter what the number looks like in the PROC PRINT output -- all the date fields are INTERNALLY stored as a single number:
Code:
[pre]
proc print data=what_are_dates;
title 'Show formatted values for some date fields';
format bday_orig weekdate. bday_diff date9.;
run;
[/pre]
Output:
[pre]
Show formatted values for some date fields
bday_
Obs name bday_orig unfmt bday_diff mm dd year week quarter
1 alan Wednesday, November 15, 1950 -3334 15NOV1950 11 15 1950 46 4
2 barb Thursday, March 30, 2000 14699 30MAR2000 3 30 2000 13 1
3 carl Tuesday, May 30, 2000 14760 30MAY2000 5 30 2000 22 2
4 dana Tuesday, July 31, 2001 15187 31JUL2001 7 31 2001 30 3
5 edna Monday, November 15, 2010 18581 15NOV2010 11 15 2010 46 4
[/pre]
Now, let's look at BY group processing on the above data First we have to sort:
[pre]
proc sort data=what_are_dates;
by descending year descending bday_orig ;
run;
proc print data=what_are_dates;
title 'After sorting -- use different formats for dates';
** null FORMAT stmt removes all formats for display purposes;
format bday_orig ddmmyy10. bday_diff date9.;
run;
[/pre]
The newly sorted data (in descending order from highest to lowest) is:
[pre]
After sorting -- use different formats for dates
bday_
Obs name bday_orig unfmt bday_diff mm dd year week quarter
1 edna 15/11/2010 18581 15NOV2010 11 15 2010 46 4
2 dana 31/07/2001 15187 31JUL2001 7 31 2001 30 3
3 carl 30/05/2000 14760 30MAY2000 5 30 2000 22 2
4 barb 30/03/2000 14699 30MAR2000 3 30 2000 13 1
5 alan 15/11/1950 -3334 15NOV1950 11 15 1950 46 4
[/pre]
OK..now we'll use FIRST. and LAST. with the above sorted data in this program:
[pre]
2202
2203 data ck_first_last;
2204 set what_are_dates;
2205 by descending year descending bday_orig;
2206 first_bd = first.bday_orig;
2207 last_bd = last.bday_orig;
2208 first_yr = first.year;
2209 last_yr = last.year;
2210
2211 putlog _n_= name= bday_orig= first_bd= last_bd= year= first_yr= last_yr=;
2212 run;
_N_=1 name=edna bday_orig=18581 first_bd=1 last_bd=1 year=2010 first_yr=1 last_yr=1
_N_=2 name=dana bday_orig=15187 first_bd=1 last_bd=1 year=2001 first_yr=1 last_yr=1
_N_=3 name=carl bday_orig=14760 first_bd=1 last_bd=1 year=2000 first_yr=1 last_yr=0
_N_=4 name=barb bday_orig=14699 first_bd=1 last_bd=1 year=2000 first_yr=0 last_yr=1
_N_=5 name=alan bday_orig=-3334 first_bd=1 last_bd=1 year=1950 first_yr=1 last_yr=1
NOTE: There were 5 observations read from the data set WORK.WHAT_ARE_DATES.
NOTE: The data set WORK.CK_FIRST_LAST has 5 observations and 13 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
[/pre]
In the above program, BDAY_ORIG -- is showing the internally stored number for birthdate and the automatic variable LAST.BDAY_ORIG (captured into the LAST_BD variable) is set to 1 on EVERY row.
So that is a LONG answer to your question. The short answer to your question
- if SAS can distinguish between march vs may in 30/3/2000 and 30/5/2000, why the code using rep_date in last.rep_date not work?
is that SAS ONLY distinguishes between 14699 and 14760 internally -- it is your format that gives "date-meaning" to the numbers. When you used the YEAR function, you told SAS to figure out the date for the internally stored number and to extract the 4 digit year from the date into a separate variable.
cynthia