<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Newbie Programming Question re. Cleaning Data in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48157#M13010</link>
    <description>Building on Scott's DATA step idea, there is an extra challenge: the range of "year" within a "group" can only be determined by a pass through all rows of that "group". Only after that pass can you filter the groups.&lt;BR /&gt;
This kind of "double pass" is most often done in SQL. Here is a DATA step approach,&lt;BR /&gt;
assuming your data are already sorted by "group":[pre]&lt;BR /&gt;
data reduced ;&lt;BR /&gt;
   set original( in=checking ) original(in= filtering ) ;&lt;BR /&gt;
   by group ;&lt;BR /&gt;
   if first.group then do ;&lt;BR /&gt;
      y1=999999999; y2=0;&lt;BR /&gt;
   end;&lt;BR /&gt;
   if checking then do ;&lt;BR /&gt;
      * collect spread info ;&lt;BR /&gt;
      y1 = min( y1, year) ;&lt;BR /&gt;
      y2 = max( y2, year) ;&lt;BR /&gt;
   end ;&lt;BR /&gt;
   if filtering ;&lt;BR /&gt;
   * the first pass only collects the min/max year of the group, the second pass does the output-ing ;&lt;BR /&gt;
   * GE rather than GT, so a group spanning exactly five years is kept ;&lt;BR /&gt;
   if (y2-y1) GE 5 then output ;&lt;BR /&gt;
   retain y1 y2 ;&lt;BR /&gt;
   drop   y1 y2 ;&lt;BR /&gt;
run ;[/pre]</description>
    <pubDate>Thu, 28 Jan 2010 11:28:31 GMT</pubDate>
    <dc:creator>Peter_C</dc:creator>
    <dc:date>2010-01-28T11:28:31Z</dc:date>
    <item>
      <title>Newbie Programming Question re. Cleaning Data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48154#M13007</link>
      <description>Hi,&lt;BR /&gt;
&lt;BR /&gt;
I have about 100,000 observations, which are divided into groups of 3-10 observations.  Each observation has a year variable.  I need to delete all observations that do not have a corresponding observation in the same group whose year is five less.&lt;BR /&gt;
&lt;BR /&gt;
The approach I want to take is to create a set for each group that includes all of the years in that group.  Then I would delete those observations whose year is not five greater than some year in that set.  I just don't know how to create and work with such a "set."&lt;BR /&gt;
&lt;BR /&gt;
Help?  Thoughts?  Thanks.&lt;BR /&gt;
&lt;BR /&gt;
NickG</description>
      <pubDate>Wed, 27 Jan 2010 21:55:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48154#M13007</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-01-27T21:55:23Z</dc:date>
    </item>
    <item>
      <title>Re: Newbie Programming Question re. Cleaning Data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48155#M13008</link>
      <description>Sort the file by your "group" identifier. Then, in a DATA step, add a BY statement for the "group" variable and use IF/THEN constructs with FIRST.varname and/or LAST.varname to RETAIN the "year" value you want to test each observation against. Output only those observations that meet your year filter criteria.&lt;BR /&gt;
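For illustration, here is a minimal sketch of this BY-group/RETAIN idea. It assumes hypothetical variables GRP (the group identifier) and YEAR, and, as in the question, no more than 10 rows per group. Sorting by YEAR within each group means that any row five years older has already been read. A _TEMPORARY_ array, whose values are retained automatically, remembers the years seen so far in the current group. The sample data are made up.&lt;BR /&gt;

```sas
/* Hypothetical sample data: GRP and YEAR are illustrative names, not from the thread */
data original;
  input grp year;
  datalines;
1 2000
1 2003
1 2005
2 1990
2 1994
2 1999
;
run;

/* Sort so that, within each group, older years are read first */
proc sort data=original out=sorted;
  by grp year;
run;

data reduced;
  set sorted;
  by grp;
  /* _TEMPORARY_ array elements keep their values across iterations;
     assumes at most 10 rows per group, as stated in the question */
  array seen[10] _temporary_;
  /* N_SEEN counts the years stored so far in this group;
     the sum statement below makes it retained, so reset it per group */
  if first.grp then n_seen = 0;
  /* keep the row only if an earlier row in this group is exactly five years older */
  found = 0;
  do i = 1 to n_seen;
    if seen[i] = year - 5 then found = 1;
  end;
  if found then output;
  /* remember this year for the rows that follow in the same group */
  n_seen + 1;
  seen[n_seen] = year;
  drop n_seen found i;
run;
```

With this sample, REDUCED keeps the rows (1, 2005) and (2, 1999).&lt;BR /&gt;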
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.&lt;BR /&gt;
&lt;BR /&gt;
Recommended Google advanced search argument for this topic/post:&lt;BR /&gt;
&lt;BR /&gt;
data step by group processing site:sas.com</description>
      <pubDate>Wed, 27 Jan 2010 23:01:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48155#M13008</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2010-01-27T23:01:47Z</dc:date>
    </item>
    <item>
      <title>Re: Newbie Programming Question re. Cleaning Data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48156#M13009</link>
      <description>You can try PROC SQL.&lt;BR /&gt;
&lt;BR /&gt;
data original;&lt;BR /&gt;
	input id c_date :MMDDYY10.;&lt;BR /&gt;
	datalines;&lt;BR /&gt;
1 12/01/2007&lt;BR /&gt;
2 01/10/2003&lt;BR /&gt;
3 02/10/1993&lt;BR /&gt;
4 05/11/1998&lt;BR /&gt;
5 06/03/1988&lt;BR /&gt;
;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
	delete from original&lt;BR /&gt;
		where year(c_date) not in (select year(c_date)+5 from original)&lt;BR /&gt;
;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
proc print&lt;BR /&gt;
	data=original;&lt;BR /&gt;
	format c_date MMDDYY10.;&lt;BR /&gt;
run; &lt;BR /&gt;
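&lt;BR /&gt;
The DELETE above ignores groups. It keeps any row whose year is five more than some year anywhere in the table. Below is a minimal group-aware sketch that uses a correlated subquery. It assumes hypothetical variables GRP (the group identifier, named GRP rather than GROUP to avoid any clash with the SQL keyword) and a numeric YEAR, with made-up sample data. It writes a new table instead of deleting from the table the subquery reads; SAS may warn when a DELETE recursively references its own target table.&lt;BR /&gt;

```sas
/* Hypothetical sample data: GRP and YEAR are illustrative names, not from the thread */
data original;
  input grp year;
  datalines;
1 2000
1 2003
1 2005
2 1990
2 1994
2 1999
;
run;

/* Keep a row only if the SAME group has a row dated exactly five years earlier */
proc sql;
  create table reduced as
    select a.*
    from original as a
    where exists (select *
                  from original as b
                  where b.grp = a.grp
                    and b.year = a.year - 5)
    order by a.grp, a.year;
quit;
```

With this sample, REDUCED keeps the rows (1, 2005) and (2, 1999).&lt;BR /&gt;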
&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
Also the following thread might help you:&lt;BR /&gt;
&lt;A href="http://support.sas.com/forums/thread.jspa?threadID=5281" target="_blank"&gt;http://support.sas.com/forums/thread.jspa?threadID=5281&lt;/A&gt;</description>
      <pubDate>Wed, 27 Jan 2010 23:30:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48156#M13009</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-01-27T23:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: Newbie Programming Question re. Cleaning Data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48157#M13010</link>
      <description>Building on Scott's DATA step idea, there is an extra challenge: the range of "year" within a "group" can only be determined by a pass through all rows of that "group". Only after that pass can you filter the groups.&lt;BR /&gt;
This kind of "double pass" is most often done in SQL. Here is a DATA step approach,&lt;BR /&gt;
assuming your data are already sorted by "group":[pre]&lt;BR /&gt;
data reduced ;&lt;BR /&gt;
   set original( in=checking ) original(in= filtering ) ;&lt;BR /&gt;
   by group ;&lt;BR /&gt;
   if first.group then do ;&lt;BR /&gt;
      y1=999999999; y2=0;&lt;BR /&gt;
   end;&lt;BR /&gt;
   if checking then do ;&lt;BR /&gt;
      * collect spread info ;&lt;BR /&gt;
      y1 = min( y1, year) ;&lt;BR /&gt;
      y2 = max( y2, year) ;&lt;BR /&gt;
   end ;&lt;BR /&gt;
   if filtering ;&lt;BR /&gt;
   * the first pass only collects the min/max year of the group, the second pass does the output-ing ;&lt;BR /&gt;
   * GE rather than GT, so a group spanning exactly five years is kept ;&lt;BR /&gt;
   if (y2-y1) GE 5 then output ;&lt;BR /&gt;
   retain y1 y2 ;&lt;BR /&gt;
   drop   y1 y2 ;&lt;BR /&gt;
run ;[/pre]</description>
      <pubDate>Thu, 28 Jan 2010 11:28:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48157#M13010</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2010-01-28T11:28:31Z</dc:date>
    </item>
    <item>
      <title>Re: Newbie Programming Question re. Cleaning Data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48158#M13011</link>
      <description>Thanks everyone for the help.  Unfortunately, I am not advanced enough to follow your arguments.  Can someone recommend a good book?  I need to understand the processing flow that SAS uses as well as a bunch of functions.&lt;BR /&gt;
&lt;BR /&gt;
NickG</description>
      <pubDate>Thu, 28 Jan 2010 20:30:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48158#M13011</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-01-28T20:30:34Z</dc:date>
    </item>
    <item>
      <title>Re: Newbie Programming Question re. Cleaning Data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48159#M13012</link>
      <description>You might want to read this:&lt;BR /&gt;
&lt;A href="http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&amp;amp;pc=61860" target="_blank"&gt;http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&amp;amp;pc=61860&lt;/A&gt;</description>
      <pubDate>Thu, 28 Jan 2010 21:26:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Newbie-Programming-Question-re-Cleaning-Data/m-p/48159#M13012</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-01-28T21:26:25Z</dc:date>
    </item>
  </channel>
</rss>

