BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LanMin
Fluorite | Level 6

Hi, all

I thought of using data set if last. to keep records for the most recent srcdate

Data have:


 

GVKEY


 

 

SID


 

 

SALES


 

 

datadate


 

 

srcdate


 

 

1034


 

 

6


 

 
  1. 60.806
 

 

19991231


 

 

19991231


 

 

1034


 

 

8


 

 
  1. 16.051
 

 

19991231


 

 

19991231


 

 

1034


 

 

5


 

 
  1. 303.253
 

 

19991231


 

 

19991231


 

 

1034


 

 

4


 

 
  1. 197.301
 

 

19991231


 

 

19991231


 

 

1034


 

 

7


 

 
  1. 169.194
 

 

19991231


 

 

19991231


 

 

1034


 

 

9


 

 

0


 

 

19991231


 

 

19991231


 

 

1034


 

 

99


 

 

-4.429


 

 

19991231


 

 

19991231


 

 

1034


 

 

9


 

 

0


 

 

19991231


 

 

20001231


 

 

1034


 

 

8


 

 
  1. 16.051
 

 

19991231


 

 

20001231


 

 

1034


 

 

4


 

 
  1. 197.301
 

 

19991231


 

 

20001231


 

 

1034


 

 

6


 

 
  1. 60.806
 

 

19991231


 

 

20001231


 

 

1034


 

 

7


 

 
  1. 169.194
 

 

19991231


 

 

20001231


 

 

1034


 

 

5


 

 
  1. 303.253
 

 

19991231


 

 

20001231


 

 

1034


 

 

99


 

 

-4.429


 

 

19991231


 

 

20001231


 

 

1035


 

 

9


 

 

0


 

 

20001231


 

 

20001231


 

 

1035


 

 

7


 

 
  1. 305.714
 

 

20001231


 

 

20001231


 

 

1035


 

 

5


 

 
  1. 309.296
 

 

20001231


 

 

20001231


 

 

1035


 

 

99


 

 

-5.09


 

 

20001231


 

 

20011231


 

 

1035


 

 

6


 

 
  1. 62.692
 

 

20001231


 

 

20011231


 

 

1035


 

 

8


 

 
  1. 13.903
 

 

20001231


 

 

20011231


 

Expected output:

GVKEYSIDSALESdatadatesrcdate
1034901999123120001231
1034816.0511999123120001231
10344197.3011999123120001231
1034660.8061999123120001231
10347169.1941999123120001231
10345303.2531999123120001231
103499-4.4291999123120001231
103599-5.092000123120011231
1035662.6922000123120011231
1035813.9032000123120011231

my code below did not do it. Can you help?

sort data=TestData;


gvkey srcdate;


;



want;


TestData;


gvkey srcdate;


last.gvkey;


;

Lan

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Does not sounds like you want LAST. processing at all. Instead you want to select all records that have the maximum value of SRCDATE.  I am still a little unclear on what your other grouping variables are.  It seems clear that GVKEY and DATADATE are important, but it is no clear whether STYPE is a grouping variable or not because it is a constant in your sample data.  This query should do what want.

proc sql noprint ;

create table want as

   select *

   from have

   group by gvkey, stype, datadate

   having srcdate = max(srcdate)

;

quit;

But it also selects the observation where GVKEY=1035 and SID=99 because it has the same DATADATE and SRCDATE as the one with SID=10.  Perhaps that was on oversight in your original listing of the expected output?  Or is there another rule that would exclude that observation.

View solution in original post

12 REPLIES 12
Reeza
Super User

I think your code got cutoff. Can you repost the code and your log.

Tom
Super User Tom
Super User

I thought of using data set if last. to keep records for the most recent srcdate

Variables: GVKEY SID SALES datadate srcdate

If you want the most recent SRCDATE then you probably want to test for LAST.DATADATE, or what ever the sort variable nested above SRCDATE.

LAST.SRCDATE would be used to keep only one record for each level of SRCDATE.

art297
Opal | Level 21

Your code did get cut off but, from your desired output, it appears like you were missing something anyway.  We can only guess but, from your desired output, I think you may have to add one field to your code.  Will the following do what you want?

proc sort data=TestData;

  by gvkey SID srcdate;

run;


data want;

  set TestData;

  by gvkey SID;

  if last.SID;

run;


LanMin
Fluorite | Level 6

Many Thanks to everyone who replied to my post.

My formatting was off and code was cut off. I repost my data below

dataset have : this contains company operating segment sales
, the last column srcdate(source year) indicates shows all historical reporting
data, i.e., companies could refile their operating segment data with SEC,
therefore, the data contains more than one srcdate for each firm.

I want to keep the latest source year, e.g., 

for gvkey 1034, datadate=19991231, there were 2 srcdate,
the older srcdate= 19991231 with 7 rows, and the newer srcdate=20001231 with 7 rows, I only want to keep the 7 rows for newer srcdate=20001231
.

    

for gvkey 1035 I would like the same methodology. For datadate=19991231, there are 2 rows with only 1 srcdate, I want to
keep the 2 rows. For datadate= 20001231, there are 2 srcdate, I want to keep the latest srcdate=
20011231, with corresponding 2 rows kept.

GVKEY STYPE SID SALES datadate srcdate

1034 OPSEG 4 197.301 19991231 19991231

1034 OPSEG 5 303.253 19991231 19991231

1034 OPSEG 6 60.806 19991231 19991231

1034 OPSEG 7 169.194 19991231 19991231

1034 OPSEG 8 16.051 19991231 19991231

1034 OPSEG 9 0 19991231 19991231

1034 OPSEG 99 -4.429 19991231 19991231

1034 OPSEG 4 197.301 19991231 20001231

1034 OPSEG 5 303.253 19991231 20001231

1034 OPSEG 6 60.806 19991231 20001231

1034 OPSEG 7 169.194 19991231 20001231

1034 OPSEG 8 16.051 19991231 20001231

1034 OPSEG 9 0 19991231 20001231

1034 OPSEG 99 -4.429 19991231 20001231

1035 OPSEG 10 159.079 19991231 20011231

1035 OPSEG 99 -4.429 19991231 20011231

1035 OPSEG 4 233.008 20001231 20001231

1035 OPSEG 5 309.296 20001231 20001231

1035 OPSEG 6 62.692 20001231 20001231

1035 OPSEG 7 305.714 20001231 20001231

1035 OPSEG 8 13.903 20001231 20011231

1035 OPSEG 9 0 20001231 20011231

expected results:

GVKEY STYPE SID SALES datadate srcdate

1034 OPSEG 4 197.301 19991231 20001231

1034 OPSEG 5 303.253 19991231 20001231

1034 OPSEG 6 60.806 19991231 20001231

1034 OPSEG 7 169.194 19991231 20001231

1034 OPSEG 8 16.051 19991231 20001231

1034 OPSEG 9 0 19991231 20001231

1034 OPSEG 99 -4.429 19991231 20001231

1035 OPSEG 10 159.079 19991231 20011231

1035 OPSEG 8 13.903 20001231 20011231

1035 OPSEG 9 0 20001231 20011231

I want to use something like
proc sort data=TestData;

by gvkey sid srcdate;
run;

data want;
set TestData;
by gvkey sid srcdate;
if last.???;

run;

Please help and Thanks!

Lan

art297
Opal | Level 21

You have to explain why so many of the 1035 records aren't kept.  Using if last.sid provides the desired results for the 1034 records.

LanMin
Fluorite | Level 6

Hi, Arthur and other community memebers

the reason many of of the 1035 records aren't kept is that, I want to cmpare datadate with srcdate, for each datadate, only rows of the latest srcdate rows are kept.

think of datadate as the historical year, datadate=19991231, firm 1035 has 2 operseg , and the dataserver recorded such information on  srcdate=20001231

for  firm 1035 datadate=20001231, this firm's records were recorded two times, once on  srcdate=20001231, second time, on  srcdate=20011231, which is an update. I want to keep only the updated information for each firm, keep the latest records.

Please help with coding!

Lan

art297
Opal | Level 21

Are you saying that you want the last record if, and only if, datadate is less than srcdate?

If so, then the following should work:

data want;

  set TestData;

  by gvkey sid;

  if last.sid and datadate lt srcdate;

run;

LanMin
Fluorite | Level 6

Arthur,

I want to keep the last records (based on srcdate) for each appearance of datadate

so if

obs datadate srcdate

1 19991231 20011231

2 19991231 20011231

3 20001231 20001231

4 20001231 20001231

5 20001231 20001231

6 20001231 20001231

7 20001231 20011231

8 20001231 20011231

I need tokeep obs 1,2, 7 and 8.

using my original posting data:

for firm 1035 I also want to keep these 2 rows:

1035 OPSEG 10 159.079 19991231 20011231

1035 OPSEG 99 -4.429 19991231 20011231

that is the data want should look like:

GVKEY STYPE SID SALES datadate srcdate

1034 OPSEG 4 197.301 19991231 20001231

1034 OPSEG 5 303.253 19991231 20001231

1034 OPSEG 6 60.806 19991231 20001231

1034 OPSEG 7 169.194 19991231 20001231

1034 OPSEG 8 16.051 19991231 20001231

1034 OPSEG 9 0 19991231 20001231

1034 OPSEG 99 -4.429 19991231 20001231

1035 OPSEG 10 159.079 19991231 20011231

1035 OPSEG 10 159.079 19991231 20011231

1035 OPSEG 99 -4.429 19991231 20011231

1035 OPSEG 8 13.903 20001231 20011231

1035 OPSEG 9 0 20001231 20011231

using your code, I will not have

1035 OPSEG 10 159.079 19991231 20011231

1035 OPSEG 99 -4.429 19991231 20011231

art297
Opal | Level 21

Actually, I think the code I originally suggested did give you those records, although possibly for the wrong reasons.  I still don't understand what you are trying to accomplish (i.e., I don't understand your rules) and your stated desired output has an extra record in it now (for gvkey 1035, sid 10).  Does the following do what you want?

proc sort data=TestData;

  by gvkey datadate srcdate;

run;

data want (drop=maxdate);

  do until(last.datadate);

    set TestData;

    format maxdate date9.;

    by gvkey datadate;

    if last.datadate then maxdate=srcdate;

  end;

  do until(last.datadate);

    set TestData;

    by gvkey datadate;

    if srcdate eq maxdate then output;

  end;

run;

Tom
Super User Tom
Super User

Does not sounds like you want LAST. processing at all. Instead you want to select all records that have the maximum value of SRCDATE.  I am still a little unclear on what your other grouping variables are.  It seems clear that GVKEY and DATADATE are important, but it is no clear whether STYPE is a grouping variable or not because it is a constant in your sample data.  This query should do what want.

proc sql noprint ;

create table want as

   select *

   from have

   group by gvkey, stype, datadate

   having srcdate = max(srcdate)

;

quit;

But it also selects the observation where GVKEY=1035 and SID=99 because it has the same DATADATE and SRCDATE as the one with SID=10.  Perhaps that was on oversight in your original listing of the expected output?  Or is there another rule that would exclude that observation.

LanMin
Fluorite | Level 6

Thanks a lot, Tom!

your code works great!  you are correct, stype is not a grouping variable. I am not proficient with SAS, therefore, the first thing that came to mind is to use last.group to get the latest scrdate records.

BTW, if anyone has recommendation for good sas programming book, please share. I read little sas book, but that is not enough.

Thanks to all of you,

Lan

DBailey
Lapis Lazuli | Level 10

You can actually accomplish this via sorting.

proc sort data=TestData;

  by gvkey datadate descending srcdate;

run;

proc sort in=TestData out=TestData_Srt nodupkey;

by gvkey datadate;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1491 views
  • 1 like
  • 5 in conversation