Help using Base SAS procedures

Newbie Programming Question re. Cleaning Data

Reply
N/A
Posts: 0

Newbie Programming Question re. Cleaning Data

Hi,

I have about 100,000 observations which are divided into groups of 3-10 obervations. Each obervation has a year variable. I need to delete all observations which do not have a corresponding observation in the same group whose year is five less.

The approach I want to take it to create a set for each group that includes all of the years in that group. Then I would delete those observations for whom their year is not five greater than some year in that set. I just don't know how to create and work with such a "set."

Help? Thoughts? Thanks.

NickG
Super Contributor
Super Contributor
Posts: 3,174

Re: Newbie Programming Question re. Cleaning Data

Sort the file based on your "group" identifier, then using a DATA step, define a BY statement for your "group" variable, and use IF / THEN construct(s) with FIRST. and/or LAST. to "RETAIN" some desired "year" variable value to test against each observation. Only output those observations that meet your year filter criteria.

Scott Barry
SBBWorks, Inc.

Recommended Google advanced search argument for this topic/post:

data step by group processing site:sas.com
Valued Guide
Posts: 2,175

Re: Newbie Programming Question re. Cleaning Data

Developing Scott's idea for a data step, you have an extra challenge, since the range of "year" in a "group" would be determined by a pass through all rows for a "group". Only after that can you filter the groups.
This kind of "double pass" is most often done in sql. Here is a data step approach
imagine your data are in the "group" order[pre]
data reduced ;
set original( in=checking ) original(in= filtering ) ;
by group ;
if first.group then do ;
y1=999999999; y2=0;
end;
if checking then do ;
* collect spread info ;
y1 = min( y1, year) ;
y2 = max( y2, year) ;
end ;
if filtering ;[/pre] * first time through is only for checking the min/max year in "group" now we do some "output-ing" ;[pre] if (y2-y1) GT 5 then output ;
retain y1 y2 ;
drop y1 y2 ;
run ;[/pre]
N/A
Posts: 0

Re: Newbie Programming Question re. Cleaning Data

You can try with proc sql.



data original;
input id c_date MMDDYY10.;
datalines;
1 12/01/2007
2 01/10/2003
3 02/10/1993
4 05/11/1998
5 06/03/1988
;


proc sql;
delete from original
where year(c_date) not in (select year(c_date)+5 from original)
;
quit;


proc print
data=original;
format c_date MMDDYY10.;
run;



Also the following thread might help you:
http://support.sas.com/forums/thread.jspa?threadID=5281
N/A
Posts: 0

Re: Newbie Programming Question re. Cleaning Data

Thanks everyone for the help. Unfortunately, I am not advanced enough to follow your arguments. Can someone recommend a good book? I need to understand the processing flow that SAS uses as well as a bunch of functions.

NickG
N/A
Posts: 0

Re: Newbie Programming Question re. Cleaning Data

Ask a Question
Discussion stats
  • 5 replies
  • 125 views
  • 0 likes
  • 3 in conversation