BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,

I have about 100,000 observations which are divided into groups of 3-10 obervations. Each obervation has a year variable. I need to delete all observations which do not have a corresponding observation in the same group whose year is five less.

The approach I want to take it to create a set for each group that includes all of the years in that group. Then I would delete those observations for whom their year is not five greater than some year in that set. I just don't know how to create and work with such a "set."

Help? Thoughts? Thanks.

NickG
5 REPLIES 5
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Sort the file based on your "group" identifier, then using a DATA step, define a BY statement for your "group" variable, and use IF / THEN construct(s) with FIRST. and/or LAST. to "RETAIN" some desired "year" variable value to test against each observation. Only output those observations that meet your year filter criteria.

Scott Barry
SBBWorks, Inc.

Recommended Google advanced search argument for this topic/post:

data step by group processing site:sas.com
Peter_C
Rhodochrosite | Level 12
Developing Scott's idea for a data step, you have an extra challenge, since the range of "year" in a "group" would be determined by a pass through all rows for a "group". Only after that can you filter the groups.
This kind of "double pass" is most often done in sql. Here is a data step approach
imagine your data are in the "group" order[pre]
data reduced ;
set original( in=checking ) original(in= filtering ) ;
by group ;
if first.group then do ;
y1=999999999; y2=0;
end;
if checking then do ;
* collect spread info ;
y1 = min( y1, year) ;
y2 = max( y2, year) ;
end ;
if filtering ;[/pre] * first time through is only for checking the min/max year in "group" now we do some "output-ing" ;[pre] if (y2-y1) GT 5 then output ;
retain y1 y2 ;
drop y1 y2 ;
run ;[/pre]
deleted_user
Not applicable
You can try with proc sql.



data original;
input id c_date MMDDYY10.;
datalines;
1 12/01/2007
2 01/10/2003
3 02/10/1993
4 05/11/1998
5 06/03/1988
;


proc sql;
delete from original
where year(c_date) not in (select year(c_date)+5 from original)
;
quit;


proc print
data=original;
format c_date MMDDYY10.;
run;



Also the following thread might help you:
http://support.sas.com/forums/thread.jspa?threadID=5281
deleted_user
Not applicable
Thanks everyone for the help. Unfortunately, I am not advanced enough to follow your arguments. Can someone recommend a good book? I need to understand the processing flow that SAS uses as well as a bunch of functions.

NickG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 622 views
  • 0 likes
  • 3 in conversation