<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: A Proc Sort Mystery. Please help! in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43877#M11518</link>
    <description>Hi:&lt;BR /&gt;
  I left REPORT_DATE in my code only for teaching purposes. I wanted to show you that LAST.REPORT_DATE was inappropriate compared to LAST.R_YEAR -- in order to illustrate that point, REPORT_DATE had to be in my BY statement.&lt;BR /&gt;
 &lt;BR /&gt;
  My code was not proposed as the "solution" to your issue. My code was meant to show you that LAST.REPORT_DATE was inappropriate as a condition to get you the desired rows. If I had been solving your problem, I would have written a program that used only LAST.R_YEAR without LAST.REPORT_DATE. I would NOT have captured every single FIRST. and LAST. variable. &lt;BR /&gt;
 &lt;BR /&gt;
  It seems to me that your fundamental problem lies with this statement/belief where you say:&lt;BR /&gt;
&lt;B&gt;&lt;BR /&gt;
- if SAS can distinguish between march vs may in 30/3/2000 and 30/5/2000, why the code using rep_date in last.rep_date not work? For the same fund id 2000004 and the same firm id 113563, we have different dates (different in months), so logically SAS can sort from oldest dates on top and latest dates on bottom for each of the securities and for each of the firms.&lt;BR /&gt;
&lt;/B&gt;&lt;BR /&gt;
&lt;BR /&gt;
 &lt;BR /&gt;
OK, let's go back to fundamentals. SAS dates. What are they. SAS handles dates as an offset from January 1, 1960. What does that mean? January 1, 1960 is internally stored as 0. 15Nov1950 was 3334 days BEFORE January 1, 1960 -- so 15Nov1950 is internally stored as -3334 in a SAS dataset. As you can see, internally, the number is just a number - -it only becomes meaningful as a date when you &lt;BR /&gt;
1) use it with a SAS date function that knows what the number represents or&lt;BR /&gt;
2) display it with a SAS date format.&lt;BR /&gt;
 &lt;BR /&gt;
But, internally, your date is stored as a single number, no matter how you read it in, no matter what format you use. That single number is what is used in comparisons and conditions. If you need a date-related piece out of that single number, you must use a SAS function to get that piece.&lt;BR /&gt;
&lt;BR /&gt;
Consider THIS program and data:&lt;BR /&gt;
[pre]&lt;BR /&gt;
data what_are_dates;&lt;BR /&gt;
   infile datalines;&lt;BR /&gt;
   input name $ bday_orig : mmddyy10.;&lt;BR /&gt;
                        &lt;BR /&gt;
   bday_unfmt = bday_orig;&lt;BR /&gt;
   bday_diff = bday_orig;&lt;BR /&gt;
        &lt;BR /&gt;
   ** Extract some date information from the numeric variable;&lt;BR /&gt;
   mm = month(bday_orig);&lt;BR /&gt;
   dd = day(bday_orig);&lt;BR /&gt;
   year = year(bday_orig);&lt;BR /&gt;
   week = week(bday_orig);&lt;BR /&gt;
   quarter = qtr(bday_orig);&lt;BR /&gt;
                    &lt;BR /&gt;
   putlog '   ***   ***   ***   ***   ***   ***   ***   ';&lt;BR /&gt;
   putlog _all_;&lt;BR /&gt;
   putlog name= bday_unfmt= bday_orig= mmddyy10. bday_diff= date9. mm= dd= year= week= quarter=;&lt;BR /&gt;
return;&lt;BR /&gt;
datalines;&lt;BR /&gt;
alan 11/15/1950&lt;BR /&gt;
barb 03/30/2000&lt;BR /&gt;
carl 05/30/2000&lt;BR /&gt;
dana 07/31/2001&lt;BR /&gt;
edna 11/15/2010&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
                             &lt;BR /&gt;
Note that the dates, in my input data are understandable as regular date values. But when I read the dates in my program with the mmddyy10. informat, I am instructing SAS to convert the dates from their conventional date representation into the internal number that represents that date's "distance" from January 1, 1960.&lt;BR /&gt;
 &lt;BR /&gt;
Here's the LOG from the above program:&lt;BR /&gt;
[pre]&lt;BR /&gt;
2041  data what_are_dates;&lt;BR /&gt;
2042     infile datalines;&lt;BR /&gt;
2043     input name $ bday_orig : mmddyy10.;&lt;BR /&gt;
2044     bday_unfmt = bday_orig;&lt;BR /&gt;
2045     bday_diff = bday_orig;&lt;BR /&gt;
2046     mm = month(bday_orig);&lt;BR /&gt;
2047     dd = day(bday_orig);&lt;BR /&gt;
2048     year = year(bday_orig);&lt;BR /&gt;
2049     week = week(bday_orig);&lt;BR /&gt;
2050     quarter = qtr(bday_orig);&lt;BR /&gt;
2051     putlog '   ***   ***   ***   ***   ***   ***   ***   ';&lt;BR /&gt;
2052     putlog _all_;&lt;BR /&gt;
2053     putlog name= bday_unfmt= bday_orig= mmddyy10. bday_diff= date9. mm= dd= year= week= quarter=;&lt;BR /&gt;
2054  return;&lt;BR /&gt;
2055  datalines;&lt;BR /&gt;
                       &lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=alan bday_orig=-3334 bday_unfmt=-3334 bday_diff=-3334 mm=11 dd=15 year=1950 week=46 quarter=4 _ERROR_=0 _N_=1&lt;BR /&gt;
name=alan bday_unfmt=-3334 bday_orig=11/15/1950 bday_diff=15NOV1950 mm=11 dd=15 year=1950 week=46 quarter=4&lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=barb bday_orig=14699 bday_unfmt=14699 bday_diff=14699 mm=3 dd=30 year=2000 week=13 quarter=1 _ERROR_=0 _N_=2&lt;BR /&gt;
name=barb bday_unfmt=14699 bday_orig=03/30/2000 bday_diff=30MAR2000 mm=3 dd=30 year=2000 week=13 quarter=1&lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=carl bday_orig=14760 bday_unfmt=14760 bday_diff=14760 mm=5 dd=30 year=2000 week=22 quarter=2 _ERROR_=0 _N_=3&lt;BR /&gt;
name=carl bday_unfmt=14760 bday_orig=05/30/2000 bday_diff=30MAY2000 mm=5 dd=30 year=2000 week=22 quarter=2&lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=dana bday_orig=15187 bday_unfmt=15187 bday_diff=15187 mm=7 dd=31 year=2001 week=30 quarter=3 _ERROR_=0 _N_=4&lt;BR /&gt;
name=dana bday_unfmt=15187 bday_orig=07/31/2001 bday_diff=31JUL2001 mm=7 dd=31 year=2001 week=30 quarter=3&lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=edna bday_orig=18581 bday_unfmt=18581 bday_diff=18581 mm=11 dd=15 year=2010 week=46 quarter=4 _ERROR_=0 _N_=5&lt;BR /&gt;
name=edna bday_unfmt=18581 bday_orig=11/15/2010 bday_diff=15NOV2010 mm=11 dd=15 year=2010 week=46 quarter=4&lt;BR /&gt;
NOTE: The data set WORK.WHAT_ARE_DATES has 5 observations and 9 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.01 seconds&lt;BR /&gt;
      cpu time            0.01 seconds&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
From the PUTLOG statements, you can see that 30Mar2000 is internally stored as the single number 14699 and that 30May2000 is internally stored as the single number 14760. That's how SAS tells the 2 dates apart. But, when the number is stored as 14699 or 14760, SAS knows that it is a date, but a LAST.BDAY condition would be set to 1 for every date (if I used LAST.BDAY in the program, which I did NOT).&lt;BR /&gt;
 &lt;BR /&gt;
Note how the use of the date-related functions MONTH, DAY, YEAR, WEEK and QTR resulted in the extraction of "date" pieces from the program.&lt;BR /&gt;
 &lt;BR /&gt;
A PROC PRINT on the dataset can use different formats or no formats to display the internally stored number, but no matter what the number looks like in the PROC PRINT output -- all the date fields are INTERNALLY stored as a single number:&lt;BR /&gt;
Code:&lt;BR /&gt;
[pre]&lt;BR /&gt;
proc print data=what_are_dates;&lt;BR /&gt;
  title 'Show formatted values for some date fields';&lt;BR /&gt;
  format bday_orig weekdate. bday_diff date9.;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
               &lt;BR /&gt;
Output:&lt;BR /&gt;
[pre]&lt;BR /&gt;
Show formatted values for some date fields&lt;BR /&gt;
                                       &lt;BR /&gt;
                                                bday_&lt;BR /&gt;
Obs    name             bday_orig               unfmt    bday_diff    mm    dd    year    week    quarter&lt;BR /&gt;
                &lt;BR /&gt;
 1     alan    Wednesday, November 15, 1950     -3334    15NOV1950    11    15    1950     46        4&lt;BR /&gt;
 2     barb        Thursday, March 30, 2000     14699    30MAR2000     3    30    2000     13        1&lt;BR /&gt;
 3     carl           Tuesday, May 30, 2000     14760    30MAY2000     5    30    2000     22        2&lt;BR /&gt;
 4     dana          Tuesday, July 31, 2001     15187    31JUL2001     7    31    2001     30        3&lt;BR /&gt;
 5     edna       Monday, November 15, 2010     18581    15NOV2010    11    15    2010     46        4&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
Now, let's look at BY group processing on the above data First we have to sort:&lt;BR /&gt;
[pre]&lt;BR /&gt;
proc sort data=what_are_dates;&lt;BR /&gt;
  by descending year descending bday_orig ;&lt;BR /&gt;
run;&lt;BR /&gt;
                   &lt;BR /&gt;
proc print data=what_are_dates;&lt;BR /&gt;
  title 'After sorting -- use different formats for dates';&lt;BR /&gt;
  ** null FORMAT stmt removes all formats for display purposes;&lt;BR /&gt;
  format bday_orig ddmmyy10. bday_diff date9.;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
The newly sorted data (in descending order from highest to lowest) is:&lt;BR /&gt;
[pre]&lt;BR /&gt;
After sorting -- use different formats for dates&lt;BR /&gt;
            &lt;BR /&gt;
                              bday_&lt;BR /&gt;
Obs    name     bday_orig     unfmt    bday_diff    mm    dd    year    week    quarter&lt;BR /&gt;
                           &lt;BR /&gt;
 1     edna    15/11/2010     18581    15NOV2010    11    15    2010     46        4&lt;BR /&gt;
 2     dana    31/07/2001     15187    31JUL2001     7    31    2001     30        3&lt;BR /&gt;
 3     carl    30/05/2000     14760    30MAY2000     5    30    2000     22        2&lt;BR /&gt;
 4     barb    30/03/2000     14699    30MAR2000     3    30    2000     13        1&lt;BR /&gt;
 5     alan    15/11/1950     -3334    15NOV1950    11    15    1950     46        4&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
OK..now we'll use FIRST. and LAST. with the above sorted data in this program:&lt;BR /&gt;
[pre]&lt;BR /&gt;
2202&lt;BR /&gt;
2203  data ck_first_last;&lt;BR /&gt;
2204    set what_are_dates;&lt;BR /&gt;
2205    by descending year descending bday_orig;&lt;BR /&gt;
2206    first_bd = first.bday_orig;&lt;BR /&gt;
2207    last_bd = last.bday_orig;&lt;BR /&gt;
2208    first_yr = first.year;&lt;BR /&gt;
2209    last_yr = last.year;&lt;BR /&gt;
2210&lt;BR /&gt;
2211    putlog _n_= name= bday_orig= first_bd= last_bd= year= first_yr= last_yr=;&lt;BR /&gt;
2212  run;&lt;BR /&gt;
                     &lt;BR /&gt;
_N_=1 name=edna bday_orig=18581 first_bd=1 last_bd=1 year=2010 first_yr=1 last_yr=1&lt;BR /&gt;
_N_=2 name=dana bday_orig=15187 first_bd=1 last_bd=1 year=2001 first_yr=1 last_yr=1&lt;BR /&gt;
_N_=3 name=carl bday_orig=14760 first_bd=1 last_bd=1 year=2000 first_yr=1 last_yr=0&lt;BR /&gt;
_N_=4 name=barb bday_orig=14699 first_bd=1 last_bd=1 year=2000 first_yr=0 last_yr=1&lt;BR /&gt;
_N_=5 name=alan bday_orig=-3334 first_bd=1 last_bd=1 year=1950 first_yr=1 last_yr=1&lt;BR /&gt;
NOTE: There were 5 observations read from the data set WORK.WHAT_ARE_DATES.&lt;BR /&gt;
NOTE: The data set WORK.CK_FIRST_LAST has 5 observations and 13 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.00 seconds&lt;BR /&gt;
      cpu time            0.00 seconds&lt;BR /&gt;
[/pre]&lt;BR /&gt;
               &lt;BR /&gt;
In the above program, BDAY_ORIG -- is showing the internally stored number for birthdate and the automatic variable LAST.BDAY_ORIG (captured into the LAST_BD variable) is set to 1 on EVERY row.&lt;BR /&gt;
 &lt;BR /&gt;
So that is a LONG answer to your question. The short answer to your question&lt;BR /&gt;
&lt;B&gt;- if SAS can distinguish between march vs may in 30/3/2000 and 30/5/2000, why the code using rep_date in last.rep_date not work? &lt;/B&gt; &lt;BR /&gt;
 &lt;BR /&gt;
is that SAS ONLY distinguishes between &lt;U&gt;14699&lt;/U&gt; and &lt;U&gt;14760&lt;/U&gt; internally -- it is your format that gives "date-meaning" to the numbers. When you used the YEAR function, you told SAS to figure out the date for the internally stored number and to extract the 4 digit year from the date into a separate variable. &lt;BR /&gt;
 &lt;BR /&gt;
cynthia</description>
    <pubDate>Sat, 04 Dec 2010 16:30:42 GMT</pubDate>
    <dc:creator>Cynthia_sas</dc:creator>
    <dc:date>2010-12-04T16:30:42Z</dc:date>
    <item>
      <title>A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43863#M11504</link>
      <description>Hi all,&lt;BR /&gt;
&lt;BR /&gt;
I have a bunch of panel data/longitudinal observations as follows&lt;BR /&gt;
&lt;BR /&gt;
fund_id     firm_id     report_date &lt;BR /&gt;
A             4545        30/6/1998&lt;BR /&gt;
A             6688        30/12/1998&lt;BR /&gt;
&lt;B&gt;B         4545        30/3/2000&lt;BR /&gt;
B             4545        30/9/2000&lt;/B&gt;&lt;BR /&gt;
B             9535        30/3/2001&lt;BR /&gt;
C             4656        ....&lt;BR /&gt;
D            7586 &lt;BR /&gt;
D            8965&lt;BR /&gt;
&lt;BR /&gt;
As you can see, the data are reported sometimes twice per year (as I highlighted bold above). I use Proc Sort as below&lt;BR /&gt;
&lt;BR /&gt;
proc sort data=data;&lt;BR /&gt;
by fund_id firm_id &lt;B&gt;report_date;&lt;/B&gt;data keep_latest_report_date_only;&lt;BR /&gt;
set data;&lt;BR /&gt;
by fund_id firm_id report_date;&lt;BR /&gt;
&lt;B&gt;if last.report_date;&lt;/B&gt;run;&lt;BR /&gt;
&lt;BR /&gt;
BTW, in the dataset report_date is already a date format (not numeric or anything)&lt;BR /&gt;
&lt;BR /&gt;
However, the code above doesn't work. It doesn't remove the earlier report dates. BUT when I use this code just below, it works perfectly.&lt;BR /&gt;
&lt;BR /&gt;
data data;&lt;BR /&gt;
set data;&lt;BR /&gt;
&lt;B&gt;year=year(rep_date);&lt;/B&gt;run;&lt;BR /&gt;
proc sort data=data;&lt;BR /&gt;
by fund_id firm_id year;&lt;BR /&gt;
data keep_latest_report_date_only;&lt;BR /&gt;
by fund_id firm_id &lt;B&gt;year&lt;/B&gt;;&lt;BR /&gt;
&lt;B&gt;if last.year;&lt;/B&gt;run;&lt;BR /&gt;
&lt;BR /&gt;
My question: Why is this so? Why YEAR function works while not using it doesn't help remove the older report dates? &lt;BR /&gt;
&lt;BR /&gt;
Thank you very much&lt;BR /&gt;
&lt;BR /&gt;
Message was edited by: smilingmelbourne

Message was edited by: smilingmelbourne</description>
      <pubDate>Fri, 03 Dec 2010 00:00:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43863#M11504</guid>
      <dc:creator>smilingmelbourne</dc:creator>
      <dc:date>2010-12-03T00:00:20Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43864#M11505</link>
      <description>Your BY processing by 3 variables, Fund ID, Firm ID and Year. &lt;BR /&gt;
&lt;BR /&gt;
For the records indicated the Firm ID's are different so they get put into different groups and each will be the last.  &lt;BR /&gt;
&lt;BR /&gt;
Check the documentation on BY processing.  &lt;BR /&gt;
&lt;BR /&gt;
Check the proc sort with nodupkey option and sort descending options. &lt;BR /&gt;
&lt;BR /&gt;
HTH!</description>
      <pubDate>Fri, 03 Dec 2010 00:28:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43864#M11505</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2010-12-03T00:28:54Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43865#M11506</link>
      <description>Sorry that the example data I put in didn't convey my idea. What I really want to figoure out is that why for the same firm that has more than 2 report dates, proc sort with rep_date doesn't help in removing older report dates. In the original post, I didn't make clear my question when I typed in different firms and this of course would be treated as a different group.&lt;BR /&gt;
&lt;BR /&gt;
In my data sets, it looks sth like:&lt;BR /&gt;
&lt;BR /&gt;
Fund_ID   Firm_ID     Report_Date&lt;BR /&gt;
A             123           oldest date&lt;BR /&gt;
A             123           older date&lt;BR /&gt;
A             123           latest date&lt;BR /&gt;
B&lt;BR /&gt;
B&lt;BR /&gt;
B&lt;BR /&gt;
&lt;BR /&gt;
So 123 treated as a group, and if using last.report_date after proc sort by fund_id firm_id report_date, IT SHOULD keep only "latest date", but it doesn't until I use Year function for the report_date, i.e. year=year(report_date), and then replace report_date by year in the proc sort...&lt;BR /&gt;
&lt;BR /&gt;
Thanks</description>
      <pubDate>Fri, 03 Dec 2010 00:40:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43865#M11506</guid>
      <dc:creator>smilingmelbourne</dc:creator>
      <dc:date>2010-12-03T00:40:09Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43866#M11507</link>
      <description>The reason it works for year is that if you have two dates with the same year you then have two objects for the year so it takes the last.... if you have unique dates each date will be first and last so selects all of them... &lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
fund_id firm_id report_date YEAR&lt;BR /&gt;
A 4545 30/6/1998 1998&lt;BR /&gt;
A 6688 30/12/1998 1998&lt;BR /&gt;
&lt;BR /&gt;
B 4545 30/3/2000 2000 (First and Last Date)(First Year)   &lt;BR /&gt;
B 4545 30/9/2000 2000 (First and Last Date)(Last Year)&lt;BR /&gt;
&lt;BR /&gt;
B 9535 30/3/2001 2001&lt;BR /&gt;
&lt;BR /&gt;
C 4656 ....&lt;BR /&gt;
D 7586 &lt;BR /&gt;
D 8965&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
what you want to do is have &lt;BR /&gt;
&lt;BR /&gt;
Last.Firm_ID rather Last.Report_Date&lt;BR /&gt;
&lt;BR /&gt;
 which will give you the latest date for each &lt;BR /&gt;
&lt;BR /&gt;
Clear as Mud&lt;BR /&gt;
&lt;BR /&gt;
Barry&lt;BR /&gt;
&lt;BR /&gt;
Message was edited by: twocanbazza&lt;BR /&gt;
&lt;BR /&gt;
Message was edited by: twocanbazza

Message was edited by: twocanbazza</description>
      <pubDate>Fri, 03 Dec 2010 02:56:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43866#M11507</guid>
      <dc:creator>twocanbazza</dc:creator>
      <dc:date>2010-12-03T02:56:54Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43867#M11508</link>
      <description>Hi.&lt;BR /&gt;
Because your highlighted blod two observations has two different date value,&lt;BR /&gt;
So SAS treat them as two different group.You use Year() to make them have the same value,so can work.&lt;BR /&gt;
You need data such as:&lt;BR /&gt;
&lt;B&gt;B 4545 30/3/2000&lt;BR /&gt;
B 4545 30/3/2000&lt;/B&gt;</description>
      <pubDate>Fri, 03 Dec 2010 03:55:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43867#M11508</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2010-12-03T03:55:24Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43868#M11509</link>
      <description>If I use LAST.FIRM_ID, then my panel data/longitudinal data structure will be gone! :-). I must use LAST.STH, where STH is for REPORT_DATE. I've used YEAR() for REPORT_DATE and it work just perfect. My query is why it cannot sort by REPORT_DATE whereby latest date is the last in a group and oldest the first.&lt;BR /&gt;
&lt;BR /&gt;
From this:: &lt;BR /&gt;
&lt;BR /&gt;
fund_id        firm_id         report_date&lt;BR /&gt;
A                B                30/5/2000&lt;BR /&gt;
A                B                &lt;B&gt;31/12/2000&lt;/B&gt;&lt;BR /&gt;
A                B                30/3/2001&lt;BR /&gt;
A                B                30/6/2001&lt;BR /&gt;
A                B                &lt;B&gt;31/12/2001&lt;/B&gt;&lt;BR /&gt;
&lt;BR /&gt;
I want this::&lt;BR /&gt;
fund_id        firm_id         report_date&lt;BR /&gt;
A                B                &lt;B&gt;31/12/2000&lt;/B&gt;&lt;BR /&gt;
A                B                &lt;B&gt;31/12/2001&lt;/B&gt;&lt;BR /&gt;
&lt;BR /&gt;
Of course, I used YEAR() for report_date and it was all OK, but I am wondering why I can't use this code below for the same results&lt;BR /&gt;
&lt;BR /&gt;
&lt;B&gt;proc sort data=sth;&lt;BR /&gt;
by fund_id firm_id report_date;&lt;BR /&gt;
data=sth_latest_date; /*for each fund, keep only the latest obs for each of the firms that the fund invests in*/&lt;BR /&gt;
if LAST.REPORT_DATE;&lt;BR /&gt;
run;&lt;/B&gt;</description>
      <pubDate>Fri, 03 Dec 2010 04:21:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43868#M11509</guid>
      <dc:creator>smilingmelbourne</dc:creator>
      <dc:date>2010-12-03T04:21:59Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43869#M11510</link>
      <description>In your final post you have mixed steps.  SAS processes one step at a time and the steps must have clear boundaries (steps are never nested or imbedded, but are always distinct from each other).  The SET statement is used within a DATA step not a PROC step.  &lt;BR /&gt;
&lt;BR /&gt;
DATA steps start with a DATA statement&lt;BR /&gt;
[pre] data new;[/pre]&lt;BR /&gt;
&lt;BR /&gt;
PROC steps start with a PROC statement&lt;BR /&gt;
[pre] proc sort data=new out=newer;[/pre]&lt;BR /&gt;
&lt;BR /&gt;
Although not strictly necessary, good programming practice uses the RUN: or QUIT; statement to terminate a step.</description>
      <pubDate>Fri, 03 Dec 2010 07:27:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43869#M11510</guid>
      <dc:creator>ArtC</dc:creator>
      <dc:date>2010-12-03T07:27:43Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43870#M11511</link>
      <description>&lt;B&gt;BTW, in the dataset report_date is already a date format (not numeric or anything)&lt;/B&gt;&lt;BR /&gt;
&lt;BR /&gt;
I am a bit worried about this comment in the OP.  I suspect that there is a missunderstanding about what is a SAS date, which &lt;B&gt;IS&lt;/B&gt; a numeric value.  Hopefully you are using SAS dates and not a character string - which can have sorting issues for forms other than yy/mm/dd.</description>
      <pubDate>Fri, 03 Dec 2010 07:37:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43870#M11511</guid>
      <dc:creator>ArtC</dc:creator>
      <dc:date>2010-12-03T07:37:47Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43871#M11512</link>
      <description>Sorry for this confusion. Report_date is a numeric but formatted ddmmyy10.. That's why I could use YEAR() with it to extract the year as in the first program. &lt;BR /&gt;
&lt;BR /&gt;
Thanks</description>
      <pubDate>Fri, 03 Dec 2010 09:07:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43871#M11512</guid>
      <dc:creator>smilingmelbourne</dc:creator>
      <dc:date>2010-12-03T09:07:50Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43872#M11513</link>
      <description>It would be most helpful to post the actual SAS execution log, where I expect there would be diagnostic messages if a function could not execute properly.  Also, the OP would benefit from adding the following statement to help with self-diagnosis along the SAS DATA step processing (where "nn" is made unique, so multiple occurrences can be coded for debugging):&lt;BR /&gt;
&lt;BR /&gt;
PUTLOG '&amp;gt;DIAG-nn&amp;gt;' / _ALL_;&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.</description>
      <pubDate>Fri, 03 Dec 2010 09:10:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43872#M11513</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2010-12-03T09:10:34Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43873#M11514</link>
      <description>Hello SmilingMelbourne,&lt;BR /&gt;
&lt;BR /&gt;
If I rearrange your code &lt;BR /&gt;
[pre]&lt;BR /&gt;
proc sort data=sth;&lt;BR /&gt;
by fund_id firm_id report_date;&lt;BR /&gt;
data=sth_latest_date; /*for each fund, keep only the latest obs for each of the firms that the fund invests in*/&lt;BR /&gt;
if LAST.REPORT_DATE;&lt;BR /&gt;
run; &lt;BR /&gt;
[/pre]&lt;BR /&gt;
in more readable one: &lt;BR /&gt;
[pre]&lt;BR /&gt;
proc sort data=sth;&lt;BR /&gt;
  by fund_id firm_id report_date;&lt;BR /&gt;
run;&lt;BR /&gt;
data=sth_latest_date;&lt;BR /&gt;
  set sth; &lt;BR /&gt;
  /*for each fund, keep only the latest obs for each of the firms that the fund invests in*/&lt;BR /&gt;
  if LAST.REPORT_DATE;&lt;BR /&gt;
run; &lt;BR /&gt;
[/pre]&lt;BR /&gt;
then it obvious that the datastep does not contain necessary BY FUND_ID FIRM_ID and instead of your IF it should be IF LAST.FIRM_ID. However, this is not what it seems to me you want. You need last date within each YEAR but this code simply gives the most recent date and eliminates the necessary observation A B 31/12/2000. &lt;BR /&gt;
&lt;BR /&gt;
Sincerely,&lt;BR /&gt;
SPR</description>
      <pubDate>Fri, 03 Dec 2010 14:58:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43873#M11514</guid>
      <dc:creator>SPR</dc:creator>
      <dc:date>2010-12-03T14:58:29Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43874#M11515</link>
      <description>Hi:&lt;BR /&gt;
  SAS dates are internally stored as number which represents an offset from 1/1/1960 -- so the FORMAT of the internally stored number has no impact on why the YEAR function worked. As long as REPORT_DATE represented a number that was a valid date value, the YEAR function was OK.&lt;BR /&gt;
&lt;BR /&gt;
  It seems to me that SPR and many others have pointed out that LAST.REPORT_DATE would never work correctly as the logic to get your desired rows output. But in order to understand WHY that is the case, it might be useful for you to run the program below and examine the values for all the FIRST. and LAST. variables that were set when you used BY group processing.&lt;BR /&gt;
 &lt;BR /&gt;
  If you examine the output, you will see that LAST.REPORT_DATE is set to 1 for EVERY row in your input data. So LAST.REPORT_DATE as an output condition would never get you the desired rows. And even though you consider the dates to be different -inside- the group, as far as BY group processing is concerned, EVERY SINGLE DATE is the LAST.REPORT_DATE (or the automatic variable LAST.REPORT_DATE = 1).&lt;BR /&gt;
 &lt;BR /&gt;
  Let's take some slightly different data to consider:&lt;BR /&gt;
[pre]&lt;BR /&gt;
FUND FIRM  REPORT_DATE&lt;BR /&gt;
A     B    30/5/2000&lt;BR /&gt;
A     B    31/12/2000   &amp;lt;-- you want this&lt;BR /&gt;
A     B    30/3/2001&lt;BR /&gt;
A     B    30/3/2001   &amp;lt;---note duplicate report date&lt;BR /&gt;
A     B    30/6/2001&lt;BR /&gt;
A     B    31/12/2001   &amp;lt;-- you want this&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
Now, let's look at the value of LAST.REPORT_DATE for ONLY those rows of data (from the SAS log):&lt;BR /&gt;
[pre]&lt;BR /&gt;
_N_=1 Fund_ID=A Firm_ID=B Report_Date=30MAY2000 FIRST.Report_Date=1 LAST.Report_Date=1&lt;BR /&gt;
_N_=2 Fund_ID=A Firm_ID=B Report_Date=31DEC2000 FIRST.Report_Date=1 LAST.Report_Date=1&lt;BR /&gt;
_N_=3 Fund_ID=A Firm_ID=B Report_Date=30MAR2001 FIRST.Report_Date=1 LAST.Report_Date=0&lt;BR /&gt;
_N_=4 Fund_ID=A Firm_ID=B Report_Date=30MAR2001 FIRST.Report_Date=0 LAST.Report_Date=1&lt;BR /&gt;
_N_=5 Fund_ID=A Firm_ID=B Report_Date=30JUN2001 FIRST.Report_Date=1 LAST.Report_Date=1&lt;BR /&gt;
_N_=6 Fund_ID=A Firm_ID=B Report_Date=31DEC2001 FIRST.Report_Date=1 LAST.Report_Date=1&lt;BR /&gt;
[/pre]&lt;BR /&gt;
            &lt;BR /&gt;
Note how LAST.REPORT_DATE = 1 for all the observations EXCEPT for #3, the first obs for 03/30/2001. When you coded IF LAST.REPORT_DATE, you were asking SAS to output for every obs where LAST.REPORT_DATE=1 -- this is why LAST.REPORT_DATE logic did not give you the desired rows.&lt;BR /&gt;
 &lt;BR /&gt;
Now, let's look at the YEAR for every obs (shown here as the R_YEAR variable same obs as above shown in SAS log with R_YEAR automatic variables):&lt;BR /&gt;
[pre]&lt;BR /&gt;
_N_=1 Fund_ID=A Firm_ID=B r_year=2000 FIRST.r_year=1 LAST.r_year=0&lt;BR /&gt;
_N_=2 Fund_ID=A Firm_ID=B r_year=2000 FIRST.r_year=0 LAST.r_year=1&lt;BR /&gt;
_N_=3 Fund_ID=A Firm_ID=B r_year=2001 FIRST.r_year=1 LAST.r_year=0&lt;BR /&gt;
_N_=4 Fund_ID=A Firm_ID=B r_year=2001 FIRST.r_year=0 LAST.r_year=0&lt;BR /&gt;
_N_=5 Fund_ID=A Firm_ID=B r_year=2001 FIRST.r_year=0 LAST.r_year=0&lt;BR /&gt;
_N_=6 Fund_ID=A Firm_ID=B r_year=2001 FIRST.r_year=0 LAST.r_year=1&lt;BR /&gt;
[/pre]&lt;BR /&gt;
        &lt;BR /&gt;
Note in the above log output how LAST.R_YEAR is set to 1 for obs #2 and obs #6 -- your desired rows for output.&lt;BR /&gt;
 &lt;BR /&gt;
When you use BY group processing and the automatic variables FIRST.byvar and LAST.byvar, you must always keep in mind that the entire GROUP of rows is treated as a unit -- so the FIRST. and LAST. conditions apply to the WHOLE unit -- that's why you needed a finer level of granularity (year-level) within the group in order to select the rows you wanted.&lt;BR /&gt;
 &lt;BR /&gt;
cynthia&lt;BR /&gt;
[pre]&lt;BR /&gt;
** Make some data;&lt;BR /&gt;
data funds;&lt;BR /&gt;
  infile datalines dlm = ' ';&lt;BR /&gt;
  input Fund_ID $ Firm_ID $ Report_Date : anydtdte.;&lt;BR /&gt;
   &lt;BR /&gt;
  ** create a report year (r_year) variable;&lt;BR /&gt;
  r_year = year(report_date);&lt;BR /&gt;
  format report_date date9.;&lt;BR /&gt;
return;&lt;BR /&gt;
datalines;&lt;BR /&gt;
A B 30/5/2000&lt;BR /&gt;
A B 31/12/2000&lt;BR /&gt;
A B 30/3/2001&lt;BR /&gt;
A B 30/3/2001&lt;BR /&gt;
A B 30/6/2001&lt;BR /&gt;
A B 31/12/2001&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
                    &lt;BR /&gt;
proc sort data=funds;&lt;BR /&gt;
by fund_id firm_id r_year report_date;&lt;BR /&gt;
run;&lt;BR /&gt;
            &lt;BR /&gt;
** Create multiple datasets from this initial data;&lt;BR /&gt;
** using LAST. logic to selectively output rows;&lt;BR /&gt;
** but first, need to "grab" the values for all the automatic variables.;&lt;BR /&gt;
data what_are_auto_vars&lt;BR /&gt;
     last_yr_only&lt;BR /&gt;
     last_rd_only&lt;BR /&gt;
     last_firm_only;&lt;BR /&gt;
set funds;&lt;BR /&gt;
by fund_id firm_id r_year report_date;&lt;BR /&gt;
            &lt;BR /&gt;
** Capture the values of ALL the automatic variables;&lt;BR /&gt;
** to see how SAS is setting the FIRST. and LAST. values.;&lt;BR /&gt;
first_fnd = first.fund_id;&lt;BR /&gt;
last_fnd = last.fund_id;&lt;BR /&gt;
first_frm = first.firm_id;&lt;BR /&gt;
last_frm = last.firm_id;&lt;BR /&gt;
first_yr = first.r_year;&lt;BR /&gt;
last_yr = last.r_year;&lt;BR /&gt;
first_rd = first.report_date;&lt;BR /&gt;
last_rd = last.report_date;&lt;BR /&gt;
      &lt;BR /&gt;
&lt;BR /&gt;
putlog _n_= fund_id= firm_id= report_date= first.report_date= last.report_date=;&lt;BR /&gt;
*putlog _n_= fund_id= firm_id= r_year= first.r_year= last.r_year=;&lt;BR /&gt;
 &lt;BR /&gt;
** now, output everything and then;&lt;BR /&gt;
** output for most of the other LAST. variables; &lt;BR /&gt;
output what_are_auto_vars;&lt;BR /&gt;
if last.firm_id then output last_firm_only;&lt;BR /&gt;
if last.report_date then output last_rd_only;&lt;BR /&gt;
if last.r_year then output last_yr_only;&lt;BR /&gt;
run;&lt;BR /&gt;
                     &lt;BR /&gt;
ods listing close;&lt;BR /&gt;
ods html file='c:\temp\understand_first_last.html' style=sasweb;&lt;BR /&gt;
proc report data=what_are_auto_vars nowd;&lt;BR /&gt;
  title 'What are Automatic Variables with BY Processing?';&lt;BR /&gt;
  column ('FUND FIRST. and LAST.' fund_id first_fnd last_fnd) &lt;BR /&gt;
         ('FIRM FIRST. and LAST.' firm_id first_frm last_frm)&lt;BR /&gt;
         ('YEAR FIRST. and LAST.' r_year first_yr last_yr) &lt;BR /&gt;
         ('DATE FIRST. and LAST.' report_date first_rd last_rd)&lt;BR /&gt;
         ('Internal Number' report_date=unfmt_date);&lt;BR /&gt;
  define report_date / f=date9.;&lt;BR /&gt;
  define unfmt_date / f=best6.;&lt;BR /&gt;
run;&lt;BR /&gt;
                        &lt;BR /&gt;
proc report data=last_yr_only nowd;&lt;BR /&gt;
&lt;BR /&gt;
  title 'What is output for last.r_year -- matches desired output';&lt;BR /&gt;
  column ('FUND FIRST. and LAST.' fund_id first_fnd last_fnd) &lt;BR /&gt;
         ('FIRM FIRST. and LAST.' firm_id first_frm last_frm)&lt;BR /&gt;
         ('YEAR FIRST. and LAST.' r_year first_yr last_yr) &lt;BR /&gt;
         ('DATE FIRST. and LAST.' report_date first_rd last_rd);&lt;BR /&gt;
run;&lt;BR /&gt;
                    &lt;BR /&gt;
proc report data=last_rd_only nowd;&lt;BR /&gt;
  title 'What is output for last.report_date (wrong)';&lt;BR /&gt;
  column ('FUND FIRST. and LAST.' fund_id first_fnd last_fnd) &lt;BR /&gt;
         ('FIRM FIRST. and LAST.' firm_id first_frm last_frm)&lt;BR /&gt;
         ('YEAR FIRST. and LAST.' r_year first_yr last_yr) &lt;BR /&gt;
         ('DATE FIRST. and LAST.' report_date first_rd last_rd);&lt;BR /&gt;
run;&lt;BR /&gt;
                                    &lt;BR /&gt;
proc report data=last_firm_only nowd;&lt;BR /&gt;
  title 'What is output for last.firm_id (wrong)';&lt;BR /&gt;
  column ('FUND FIRST. and LAST.' fund_id first_fnd last_fnd) &lt;BR /&gt;
         ('FIRM FIRST. and LAST.' firm_id first_frm last_frm)&lt;BR /&gt;
         ('YEAR FIRST. and LAST.' r_year first_yr last_yr) &lt;BR /&gt;
         ('DATE FIRST. and LAST.' report_date first_rd last_rd);&lt;BR /&gt;
run;&lt;BR /&gt;
title;&lt;BR /&gt;
ods _all_ close;&lt;BR /&gt;
[/pre]</description>
      <pubDate>Fri, 03 Dec 2010 16:38:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43874#M11515</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2010-12-03T16:38:25Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43875#M11516</link>
      <description>Thank you all for your help. I think I should post in here again a small subset of the actual data set, the logs of the codes that I wrote that didn't work and that work. &lt;BR /&gt;
&lt;BR /&gt;
&lt;B&gt;Original Data Set&lt;/B&gt;&lt;BR /&gt;
&lt;B&gt;owner_id&lt;/B&gt;	&lt;B&gt;security_id&lt;/B&gt;	shr_held	value_held	&lt;B&gt;rep_date&lt;/B&gt;&lt;BR /&gt;
&lt;BR /&gt;
2000004	29191	0	0	31/03/2004&lt;BR /&gt;
&lt;B&gt;2000004	113563&lt;/B&gt;	666642	2266583	31/12/2004&lt;BR /&gt;
&lt;B&gt;2000004	113563&lt;/B&gt;	666642	2119922	31/03/2005&lt;BR /&gt;
&lt;B&gt;2000004	113563	0	0	&lt;U&gt;31/05/2005&lt;/U&gt;&lt;/B&gt;&lt;BR /&gt;
&lt;B&gt;2000004	113571&lt;/B&gt;	80801	1260382	30/09/2005&lt;BR /&gt;
&lt;B&gt;2000004	113571	209629	3136985	&lt;U&gt;31/12/2005&lt;/U&gt;&lt;/B&gt;&lt;BR /&gt;
&lt;B&gt;2000004	113571&lt;/B&gt;	87283	1443286	31/03/2006&lt;BR /&gt;
&lt;B&gt;2000004	113571&lt;/B&gt;	91624	1555180	30/06/2006&lt;BR /&gt;
&lt;B&gt;2000004	113571&lt;/B&gt;	89039	1493031	30/09/2006&lt;BR /&gt;
&lt;B&gt;2000004	113571&lt;/B&gt;	0	0	&lt;B&gt;&lt;U&gt;31/12/2006&lt;/U&gt;&lt;/B&gt;&lt;BR /&gt;
&lt;BR /&gt;
&lt;B&gt;Codes that did NOT work and their logs&lt;/B&gt;&lt;BR /&gt;
738  proc sort data=mf;&lt;BR /&gt;
739      by owner_id security_id rep_date;&lt;BR /&gt;
&lt;BR /&gt;
NOTE: There were 25711268 observations read from the data set WORK.MF.&lt;BR /&gt;
NOTE: The data set WORK.MF has 25711268 observations and 6 variables.&lt;BR /&gt;
NOTE: PROCEDURE SORT used (Total process time):&lt;BR /&gt;
      real time           3:14.04&lt;BR /&gt;
      cpu time            27.95 seconds&lt;BR /&gt;
&lt;BR /&gt;
740  data nodup;&lt;BR /&gt;
741      set mf;&lt;BR /&gt;
742      by owner_id security_id rep_date;&lt;BR /&gt;
743      if last.rep_date;&lt;BR /&gt;
744  run;&lt;BR /&gt;
&lt;BR /&gt;
NOTE: There were 25711268 observations read from the data set WORK.MF.&lt;BR /&gt;
NOTE: The data set WORK.NODUP has 25711268 observations and 6 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           24.79 seconds&lt;BR /&gt;
      cpu time            19.18 seconds&lt;BR /&gt;
&lt;BR /&gt;
&lt;B&gt;Codes that WORKED and their logs&lt;/B&gt;&lt;BR /&gt;
745  data mf;&lt;BR /&gt;
746       set mf;&lt;BR /&gt;
747       year=year(rep_date);&lt;BR /&gt;
748  run;&lt;BR /&gt;
   &lt;BR /&gt;
749  proc sort data=mf;&lt;BR /&gt;
750      by owner_id security_id year;&lt;BR /&gt;
&lt;BR /&gt;
NOTE: There were 25711268 observations read from the data set WORK.MF.&lt;BR /&gt;
NOTE: The data set WORK.MF has 25711268 observations and 6 variables.&lt;BR /&gt;
NOTE: PROCEDURE SORT used (Total process time):&lt;BR /&gt;
      real time           2:59.32&lt;BR /&gt;
      cpu time            28.57 seconds&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
751  data nodup;&lt;BR /&gt;
752      set mf;&lt;BR /&gt;
753      by owner_id security_id year;&lt;BR /&gt;
754      if last.year;&lt;BR /&gt;
755  run;&lt;BR /&gt;
&lt;BR /&gt;
NOTE: There were 25711268 observations read from the data set WORK.MF.&lt;BR /&gt;
NOTE: The data set WORK.NODUP has 11870591 observations and 6 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           24.23 seconds&lt;BR /&gt;
      cpu time            12.23 seconds&lt;BR /&gt;
&lt;BR /&gt;
As you can see, &lt;B&gt;the ONLY thing&lt;/B&gt; that I changed that &lt;B&gt;make the codes work&lt;/B&gt; is &lt;B&gt;throw away rep_date&lt;/B&gt; and &lt;B&gt;replace it with year&lt;/B&gt;, where year is extracted using YEAR(). &lt;BR /&gt;
&lt;BR /&gt;
&lt;B&gt;I still don't understand why:&lt;/B&gt;&lt;BR /&gt;
- if year=year(rep_date) works, i.e. it can tell the difference between e.g. 30/3/2000 and 30/5/2000, then it means that SAS knows that these 2 dates have are different in months (march vs may) and it can sort in descending order or so&lt;BR /&gt;
&lt;BR /&gt;
- if SAS can distinguish between march vs may in 30/3/2000 and 30/5/2000, why the code using &lt;B&gt;rep_date&lt;/B&gt; in &lt;B&gt;last.rep_date&lt;/B&gt; not work? For the same fund id 2000004 and the same firm id 113563, we have different dates (different in months), so logically SAS can sort from oldest dates on top and latest dates on bottom for each of the securities and for each of the firms.&lt;BR /&gt;
&lt;BR /&gt;
- Why do I NOT need to use the same code as Cynthia so as to make it work, that is, I didn't use year as a group before rep_date and it still worked.&lt;BR /&gt;
&lt;BR /&gt;
&lt;B&gt;I use this:&lt;/B&gt; and it still worked&lt;BR /&gt;
data  nodup;&lt;BR /&gt;
set mf;&lt;BR /&gt;
by owner_id security_id year;&lt;BR /&gt;
if last.year;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
&lt;B&gt;Instead of this as Cynthia recommends:&lt;/B&gt;&lt;BR /&gt;
data nodup;&lt;BR /&gt;
set mf;&lt;BR /&gt;
by owner_id security_id &lt;B&gt;year rep_date;&lt;/B&gt;if last.rep_date;&lt;BR /&gt;
run;&lt;B&gt;&lt;/B&gt;</description>
      <pubDate>Sat, 04 Dec 2010 06:34:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43875#M11516</guid>
      <dc:creator>smilingmelbourne</dc:creator>
      <dc:date>2010-12-04T06:34:16Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43876#M11517</link>
      <description>The OP needs to re-read the entire thread, especially Cynthia's explanation that makes use of the PUTLOG _ALL_;  processing to demonstrate how BY GROUP processing is behaving.  With IF LAST.rep_date specified, SAS is acknowledging each unique date occurrence and is going to output the "last" one for each unique date value --- not a date within a particular 'year-portion of any given date'.  That is where the code is flawed -- that is if the OP wants to only capture the last occurrence within a given calendar year.&lt;BR /&gt;
&lt;BR /&gt;
You will benefit from adding diagnostic PUTLOG commands to your SAS program and running it yourself with your data -- and suggest running it multiple times with different BY variable definitions -- the most important BY variable being the one you use in your IF LAST.&lt;VARNAME&gt;   specification for output.&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.&lt;BR /&gt;
&lt;BR /&gt;
Suggested Google advanced search argument, this topic / post:&lt;BR /&gt;
by group processing site:sas.com&lt;/VARNAME&gt;</description>
      <pubDate>Sat, 04 Dec 2010 15:05:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43876#M11517</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2010-12-04T15:05:12Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43877#M11518</link>
      <description>Hi:&lt;BR /&gt;
  I left REPORT_DATE in my code only for teaching purposes. I wanted to show you that LAST.REPORT_DATE was inappropriate compared to LAST.R_YEAR -- in order to illustrate that point, REPORT_DATE had to be in my BY statement.&lt;BR /&gt;
 &lt;BR /&gt;
  My code was not proposed as the "solution" to your issue. My code was meant to show you that LAST.REPORT_DATE was inappropriate as a condition to get you the desired rows. If I had been solving your problem, I would have written a program that used only LAST.R_YEAR without LAST.REPORT_DATE. I would NOT have captured every single FIRST. and LAST. variable. &lt;BR /&gt;
 &lt;BR /&gt;
  It seems to me that your fundamental problem lies with this statement/belief where you say:&lt;BR /&gt;
&lt;B&gt;&lt;BR /&gt;
- if SAS can distinguish between march vs may in 30/3/2000 and 30/5/2000, why the code using rep_date in last.rep_date not work? For the same fund id 2000004 and the same firm id 113563, we have different dates (different in months), so logically SAS can sort from oldest dates on top and latest dates on bottom for each of the securities and for each of the firms.&lt;BR /&gt;
&lt;/B&gt;&lt;BR /&gt;
&lt;BR /&gt;
 &lt;BR /&gt;
OK, let's go back to fundamentals. SAS dates. What are they. SAS handles dates as an offset from January 1, 1960. What does that mean? January 1, 1960 is internally stored as 0. 15Nov1950 was 3334 days BEFORE January 1, 1960 -- so 15Nov1950 is internally stored as -3334 in a SAS dataset. As you can see, internally, the number is just a number - -it only becomes meaningful as a date when you &lt;BR /&gt;
1) use it with a SAS date function that knows what the number represents or&lt;BR /&gt;
2) display it with a SAS date format.&lt;BR /&gt;
 &lt;BR /&gt;
But, internally, your date is stored as a single number, no matter how you read it in, no matter what format you use. That single number is what is used in comparisons and conditions. If you need a date-related piece out of that single number, you must use a SAS function to get that piece.&lt;BR /&gt;
&lt;BR /&gt;
Consider THIS program and data:&lt;BR /&gt;
[pre]&lt;BR /&gt;
data what_are_dates;&lt;BR /&gt;
   infile datalines;&lt;BR /&gt;
   input name $ bday_orig : mmddyy10.;&lt;BR /&gt;
                        &lt;BR /&gt;
   bday_unfmt = bday_orig;&lt;BR /&gt;
   bday_diff = bday_orig;&lt;BR /&gt;
        &lt;BR /&gt;
   ** Extract some date information from the numeric variable;&lt;BR /&gt;
   mm = month(bday_orig);&lt;BR /&gt;
   dd = day(bday_orig);&lt;BR /&gt;
   year = year(bday_orig);&lt;BR /&gt;
   week = week(bday_orig);&lt;BR /&gt;
   quarter = qtr(bday_orig);&lt;BR /&gt;
                    &lt;BR /&gt;
   putlog '   ***   ***   ***   ***   ***   ***   ***   ';&lt;BR /&gt;
   putlog _all_;&lt;BR /&gt;
   putlog name= bday_unfmt= bday_orig= mmddyy10. bday_diff= date9. mm= dd= year= week= quarter=;&lt;BR /&gt;
return;&lt;BR /&gt;
datalines;&lt;BR /&gt;
alan 11/15/1950&lt;BR /&gt;
barb 03/30/2000&lt;BR /&gt;
carl 05/30/2000&lt;BR /&gt;
dana 07/31/2001&lt;BR /&gt;
edna 11/15/2010&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
                             &lt;BR /&gt;
Note that the dates, in my input data are understandable as regular date values. But when I read the dates in my program with the mmddyy10. informat, I am instructing SAS to convert the dates from their conventional date representation into the internal number that represents that date's "distance" from January 1, 1960.&lt;BR /&gt;
 &lt;BR /&gt;
Here's the LOG from the above program:&lt;BR /&gt;
[pre]&lt;BR /&gt;
2041  data what_are_dates;&lt;BR /&gt;
2042     infile datalines;&lt;BR /&gt;
2043     input name $ bday_orig : mmddyy10.;&lt;BR /&gt;
2044     bday_unfmt = bday_orig;&lt;BR /&gt;
2045     bday_diff = bday_orig;&lt;BR /&gt;
2046     mm = month(bday_orig);&lt;BR /&gt;
2047     dd = day(bday_orig);&lt;BR /&gt;
2048     year = year(bday_orig);&lt;BR /&gt;
2049     week = week(bday_orig);&lt;BR /&gt;
2050     quarter = qtr(bday_orig);&lt;BR /&gt;
2051     putlog '   ***   ***   ***   ***   ***   ***   ***   ';&lt;BR /&gt;
2052     putlog _all_;&lt;BR /&gt;
2053     putlog name= bday_unfmt= bday_orig= mmddyy10. bday_diff= date9. mm= dd= year= week= quarter=;&lt;BR /&gt;
2054  return;&lt;BR /&gt;
2055  datalines;&lt;BR /&gt;
                       &lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=alan bday_orig=-3334 bday_unfmt=-3334 bday_diff=-3334 mm=11 dd=15 year=1950 week=46 quarter=4 _ERROR_=0 _N_=1&lt;BR /&gt;
name=alan bday_unfmt=-3334 bday_orig=11/15/1950 bday_diff=15NOV1950 mm=11 dd=15 year=1950 week=46 quarter=4&lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=barb bday_orig=14699 bday_unfmt=14699 bday_diff=14699 mm=3 dd=30 year=2000 week=13 quarter=1 _ERROR_=0 _N_=2&lt;BR /&gt;
name=barb bday_unfmt=14699 bday_orig=03/30/2000 bday_diff=30MAR2000 mm=3 dd=30 year=2000 week=13 quarter=1&lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=carl bday_orig=14760 bday_unfmt=14760 bday_diff=14760 mm=5 dd=30 year=2000 week=22 quarter=2 _ERROR_=0 _N_=3&lt;BR /&gt;
name=carl bday_unfmt=14760 bday_orig=05/30/2000 bday_diff=30MAY2000 mm=5 dd=30 year=2000 week=22 quarter=2&lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=dana bday_orig=15187 bday_unfmt=15187 bday_diff=15187 mm=7 dd=31 year=2001 week=30 quarter=3 _ERROR_=0 _N_=4&lt;BR /&gt;
name=dana bday_unfmt=15187 bday_orig=07/31/2001 bday_diff=31JUL2001 mm=7 dd=31 year=2001 week=30 quarter=3&lt;BR /&gt;
   ***   ***   ***   ***   ***   ***   ***&lt;BR /&gt;
name=edna bday_orig=18581 bday_unfmt=18581 bday_diff=18581 mm=11 dd=15 year=2010 week=46 quarter=4 _ERROR_=0 _N_=5&lt;BR /&gt;
name=edna bday_unfmt=18581 bday_orig=11/15/2010 bday_diff=15NOV2010 mm=11 dd=15 year=2010 week=46 quarter=4&lt;BR /&gt;
NOTE: The data set WORK.WHAT_ARE_DATES has 5 observations and 9 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.01 seconds&lt;BR /&gt;
      cpu time            0.01 seconds&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
From the PUTLOG statements, you can see that 30Mar2000 is internally stored as the single number 14699 and that 30May2000 is internally stored as the single number 14760. That's how SAS tells the 2 dates apart. But, when the number is stored as 14699 or 14760, SAS knows that it is a date, but a LAST.BDAY condition would be set to 1 for every date (if I used LAST.BDAY in the program, which I did NOT).&lt;BR /&gt;
 &lt;BR /&gt;
Note how the use of the date-related functions MONTH, DAY, YEAR, WEEK and QTR resulted in the extraction of "date" pieces from the program.&lt;BR /&gt;
 &lt;BR /&gt;
A PROC PRINT on the dataset can use different formats or no formats to display the internally stored number, but no matter what the number looks like in the PROC PRINT output -- all the date fields are INTERNALLY stored as a single number:&lt;BR /&gt;
Code:&lt;BR /&gt;
[pre]&lt;BR /&gt;
proc print data=what_are_dates;&lt;BR /&gt;
  title 'Show formatted values for some date fields';&lt;BR /&gt;
  format bday_orig weekdate. bday_diff date9.;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
               &lt;BR /&gt;
Output:&lt;BR /&gt;
[pre]&lt;BR /&gt;
Show formatted values for some date fields&lt;BR /&gt;
                                       &lt;BR /&gt;
                                                bday_&lt;BR /&gt;
Obs    name             bday_orig               unfmt    bday_diff    mm    dd    year    week    quarter&lt;BR /&gt;
                &lt;BR /&gt;
 1     alan    Wednesday, November 15, 1950     -3334    15NOV1950    11    15    1950     46        4&lt;BR /&gt;
 2     barb        Thursday, March 30, 2000     14699    30MAR2000     3    30    2000     13        1&lt;BR /&gt;
 3     carl           Tuesday, May 30, 2000     14760    30MAY2000     5    30    2000     22        2&lt;BR /&gt;
 4     dana          Tuesday, July 31, 2001     15187    31JUL2001     7    31    2001     30        3&lt;BR /&gt;
 5     edna       Monday, November 15, 2010     18581    15NOV2010    11    15    2010     46        4&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
Now, let's look at BY group processing on the above data First we have to sort:&lt;BR /&gt;
[pre]&lt;BR /&gt;
proc sort data=what_are_dates;&lt;BR /&gt;
  by descending year descending bday_orig ;&lt;BR /&gt;
run;&lt;BR /&gt;
                   &lt;BR /&gt;
proc print data=what_are_dates;&lt;BR /&gt;
  title 'After sorting -- use different formats for dates';&lt;BR /&gt;
  ** null FORMAT stmt removes all formats for display purposes;&lt;BR /&gt;
  format bday_orig ddmmyy10. bday_diff date9.;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
The newly sorted data (in descending order from highest to lowest) is:&lt;BR /&gt;
[pre]&lt;BR /&gt;
After sorting -- use different formats for dates&lt;BR /&gt;
            &lt;BR /&gt;
                              bday_&lt;BR /&gt;
Obs    name     bday_orig     unfmt    bday_diff    mm    dd    year    week    quarter&lt;BR /&gt;
                           &lt;BR /&gt;
 1     edna    15/11/2010     18581    15NOV2010    11    15    2010     46        4&lt;BR /&gt;
 2     dana    31/07/2001     15187    31JUL2001     7    31    2001     30        3&lt;BR /&gt;
 3     carl    30/05/2000     14760    30MAY2000     5    30    2000     22        2&lt;BR /&gt;
 4     barb    30/03/2000     14699    30MAR2000     3    30    2000     13        1&lt;BR /&gt;
 5     alan    15/11/1950     -3334    15NOV1950    11    15    1950     46        4&lt;BR /&gt;
[/pre]&lt;BR /&gt;
 &lt;BR /&gt;
OK..now we'll use FIRST. and LAST. with the above sorted data in this program:&lt;BR /&gt;
[pre]&lt;BR /&gt;
2202&lt;BR /&gt;
2203  data ck_first_last;&lt;BR /&gt;
2204    set what_are_dates;&lt;BR /&gt;
2205    by descending year descending bday_orig;&lt;BR /&gt;
2206    first_bd = first.bday_orig;&lt;BR /&gt;
2207    last_bd = last.bday_orig;&lt;BR /&gt;
2208    first_yr = first.year;&lt;BR /&gt;
2209    last_yr = last.year;&lt;BR /&gt;
2210&lt;BR /&gt;
2211    putlog _n_= name= bday_orig= first_bd= last_bd= year= first_yr= last_yr=;&lt;BR /&gt;
2212  run;&lt;BR /&gt;
                     &lt;BR /&gt;
_N_=1 name=edna bday_orig=18581 first_bd=1 last_bd=1 year=2010 first_yr=1 last_yr=1&lt;BR /&gt;
_N_=2 name=dana bday_orig=15187 first_bd=1 last_bd=1 year=2001 first_yr=1 last_yr=1&lt;BR /&gt;
_N_=3 name=carl bday_orig=14760 first_bd=1 last_bd=1 year=2000 first_yr=1 last_yr=0&lt;BR /&gt;
_N_=4 name=barb bday_orig=14699 first_bd=1 last_bd=1 year=2000 first_yr=0 last_yr=1&lt;BR /&gt;
_N_=5 name=alan bday_orig=-3334 first_bd=1 last_bd=1 year=1950 first_yr=1 last_yr=1&lt;BR /&gt;
NOTE: There were 5 observations read from the data set WORK.WHAT_ARE_DATES.&lt;BR /&gt;
NOTE: The data set WORK.CK_FIRST_LAST has 5 observations and 13 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.00 seconds&lt;BR /&gt;
      cpu time            0.00 seconds&lt;BR /&gt;
[/pre]&lt;BR /&gt;
               &lt;BR /&gt;
In the above program, BDAY_ORIG -- is showing the internally stored number for birthdate and the automatic variable LAST.BDAY_ORIG (captured into the LAST_BD variable) is set to 1 on EVERY row.&lt;BR /&gt;
 &lt;BR /&gt;
So that is a LONG answer to your question. The short answer to your question&lt;BR /&gt;
&lt;B&gt;- if SAS can distinguish between march vs may in 30/3/2000 and 30/5/2000, why the code using rep_date in last.rep_date not work? &lt;/B&gt; &lt;BR /&gt;
 &lt;BR /&gt;
is that SAS ONLY distinguishes between &lt;U&gt;14699&lt;/U&gt; and &lt;U&gt;14760&lt;/U&gt; internally -- it is your format that gives "date-meaning" to the numbers. When you used the YEAR function, you told SAS to figure out the date for the internally stored number and to extract the 4 digit year from the date into a separate variable. &lt;BR /&gt;
 &lt;BR /&gt;
cynthia</description>
      <pubDate>Sat, 04 Dec 2010 16:30:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43877#M11518</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2010-12-04T16:30:42Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43878#M11519</link>
      <description>Hello, SmilingMelbourne,&lt;BR /&gt;
&lt;BR /&gt;
"I still don't understand why:&lt;BR /&gt;
- if year=year(rep_date) works, i.e. it can tell the difference between e.g. 30/3/2000 and 30/5/2000 , then it means that SAS knows that these 2 dates have are different in months (march vs may) and it can sort in descending order or so"&lt;BR /&gt;
&lt;BR /&gt;
In reality SAS does NOT distinguish between these dates. It selects the one that happens to be the LAST in your initial dataset for the current sort group. If I placed these dates in the order: &lt;BR /&gt;
30/3/2000 &lt;BR /&gt;
30/5/2000 &lt;BR /&gt;
then I got 30/5/2000 as the result. If I exchanged their places I got 30/3/2000.&lt;BR /&gt;
&lt;BR /&gt;
If you would like to select the LAST date of the year you have to make a bit change in your program: &lt;BR /&gt;
[pre]&lt;BR /&gt;
data i;&lt;BR /&gt;
  input owner_id security_id shr_held value_held rep_date ddmmyy10.;&lt;BR /&gt;
  format rep_date date7.;&lt;BR /&gt;
datalines;&lt;BR /&gt;
2000004 29191 0 0 31/03/2004&lt;BR /&gt;
2000004 113563 666642 2266583 31/12/2004&lt;BR /&gt;
2000004 113563 666642 2119922 31/03/2005&lt;BR /&gt;
2000004 113563 0 0 31/05/2005&lt;BR /&gt;
2000004 113571 80801 1260382 30/09/2005&lt;BR /&gt;
2000004 113571 209629 3136985 31/12/2005&lt;BR /&gt;
2000004 113571 87283 1443286 31/03/2006&lt;BR /&gt;
2000004 113571 91624 1555180 30/06/2006&lt;BR /&gt;
2000004 113571 89039 1493031 30/09/2006&lt;BR /&gt;
2000004 113571 0 0 31/12/2006&lt;BR /&gt;
2000004 113571 0 0 31/05/2000&lt;BR /&gt;
2000004 113571 0 0 31/03/2000&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
data mf;&lt;BR /&gt;
  set i;&lt;BR /&gt;
  year=year(rep_If you would like to date);&lt;BR /&gt;
run;&lt;BR /&gt;
proc sort data=mf;&lt;BR /&gt;
by owner_id security_id year REP_DATE;&lt;BR /&gt;
data nodup;&lt;BR /&gt;
  set mf;&lt;BR /&gt;
  by owner_id security_id year;&lt;BR /&gt;
  if last.year;&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
Look at additional sorting by rep_date.&lt;BR /&gt;
Sincerely,&lt;BR /&gt;
SPR</description>
      <pubDate>Mon, 06 Dec 2010 17:39:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43878#M11519</guid>
      <dc:creator>SPR</dc:creator>
      <dc:date>2010-12-06T17:39:01Z</dc:date>
    </item>
    <item>
      <title>Re: A Proc Sort Mystery. Please help!</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43879#M11520</link>
      <description>Thank you all very much. It is now all clear to me.</description>
      <pubDate>Thu, 09 Dec 2010 04:22:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/A-Proc-Sort-Mystery-Please-help/m-p/43879#M11520</guid>
      <dc:creator>smilingmelbourne</dc:creator>
      <dc:date>2010-12-09T04:22:05Z</dc:date>
    </item>
  </channel>
</rss>

