DATA Step, Macro, Functions and more

Selecting the right record...

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Selecting the right record...

Hi guys,

I have a problem and I hope I can get a solution here :-)


I have to output a record base on a date.  The thing is that for the same h_postal, which is the key, I have 2 records.


Capture.JPG

It would be easy if I could say...

if ((&date >= birth_da) and (&date < ret_date)) then output;

else delete;


But there is a "gap" between the dates.  So what appends if my date is "20020101"...  I loose records!


What I would like to do is, example, 

if date = "20040101" then output the second record. 

if date = "20020101" then output the first record. (the one that falls into the gap)



I hope that is clear enough.  English is not my primary language :-(


Mylene


Accepted Solutions
Solution
‎08-22-2012 03:38 PM
Super User
Super User
Posts: 7,076

Re: Selecting the right record...

That's because it should using LAST.H_POSTAL instead of last.birth_da.

Here is a test.

data have;

  input (geo h_postal birth_da ret_date) ($);

cards;

1 1 19830401 20010601

1 1 20030501 19000001

run;

%let refmonth=200201;


data want ;

  set have;

  where birth_da <=: "&refmonth" ;

  by geo h_postal birth_da;

  if last.h_postal;

  put (_all_) (=);

run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,076

Re: Selecting the right record...

Partly it depends on what the dates mean.

If the dates in the database represent a interval of time and your are selecting records based on whether the date in the macro variable falls within the interval then you probably DO want to lose those records.

Also it looks like your date variables are not actual dates. 19000001 is not a valid date because you cannot have month number zero.  I suspect that '19000001' is actually an indication that the interval is open ended.

So perhaps you just need to add that fact to your logic?

if ((&date >= birth_da) and (&date < ret_date or ret_date = '19000001')) then output;

Contributor
Posts: 30

Re: Selecting the right record...

Assumptions:

(1) no two records for the same key have overlapping dates (i.e. if birth_da for record 1 < birth_da for record 2 on the same key, then ret_date for record 1 < birth_da for record 2)

(2) birth_da is nonmissing

(3) missing value for ret_date indicates not yet retired (so can only occur in the last record)

So for each key, you want to return the latest birth_da that is not greater than the date?

I think the easiest way to do this in simple SAS is to reverse-sort and use LAG to find the next date:

%let date='15Jul2001'd;

data have;

input recordid h_postal $ birth_da ret_date;

informat birth_da ret_date ANYDTDTE20.;

format birth_da ret_date DATE9.;

cards;

1 A 01Jan2000 01Jun2001

2 A 01Jan2002 12Feb2003

3 B 01Jan2000 01Jun2001

4 B 01Jul2001 12Feb2003

5 C 01Jan1999 31Dec1999

6 C 01Jan2001 01Feb2001

7 C 01Jan2003 12feb2003

8 D 01Jan2005 01Jan2007

9 D 01Jan2009 .

10 E 01Jan2001 .

;

run;

proc sort data=have out=have_reverse;

by h_postal descending birth_da;

quit;

data want;

set have_reverse;

by h_postal;

next_birth_date=lag(birth_da);

if first.h_postal then next_birth_date=.;

/* because we have sorted in reverse date order, first.h_postal indicates that this is the most recent entry for this unit

(highest value of birth_da) */

if (next_birth_date<=ret_date OR missing(ret_date)) AND not missing(next_birth_date) then put "ERROR: dates overlap";

if (birth_da<=&date) AND ((next_birth_date>&date) OR missing(next_birth_date)) then output;

drop next_birth_date;

run;

Edit: before using this you'll need to convert to SAS date variables (which you should be using anyway for this sort of work) and as per Tom's comment, if a date of "190000001" indicates "not yet retired" this should probably be interpreted as missing.

Message was edited by: Geoffrey Brent

Occasional Contributor
Posts: 8

Re: Selecting the right record...

Posted in reply to GeoffreyBrent

I am so sorry guys!!  I knew it was not describe properly...  let’s try again J

I also just realized that I had to change the value of “infinity” for this to work.

Let’s look at my real code.  “date” is a cycle entered to process data.  In my program, it is reference to as “refmonth”, which is a char variable, length 6 (ex.:201108).

        data sg&georef (drop= birth retired);

        length birth $6 retired $6;

            set sgcme.sg&georef;

            birth = substr(birth_da,1,6);

                if ret_date = "19000001" then retired = "999999";

                else retired = substr(ret_date,1,6);

        if ((&refmonth >= birth) and (&refmonth < retired)) then output; else delete;

        run;

Based on my client’s specifications, if we are processing refmonth = 200201 with a postal code = “A0A2V0, he wants me to output the values of the first record.  With the piece of code above, I lose those records.

Super User
Super User
Posts: 7,076

Re: Selecting the right record...

So when the specified month falls between intervals they want the last interval before the specified month.

Try this simplified logic. 

I am assuming that your ID variables are GEO and H_POSTAL based on the sample data you posted.

I am also using : modifier on the <= operator so that the comparison will only be made up to shorter string. In this case the macro variable.  So a birth_dt of 10JAN2010 ('20100110') would be considered less than or equal to January 2010 ('201001').

data sg&georef ;

  set sgcme.sg&georef;

   where birth_da <=: &refmonth ;

   by geo h_postal birth_da;

   if last.birth_da;

run;

Occasional Contributor
Posts: 8

Re: Selecting the right record...

Still not working!!  If my refmonth = 200401, it has to pick up the second record!!

Please also keep in mind that the "birth" and "retired" variables are character, so are "refmonth",  "birth_da" and "ret_date"...  and there is nothing I can do about it!!

Solution
‎08-22-2012 03:38 PM
Super User
Super User
Posts: 7,076

Re: Selecting the right record...

That's because it should using LAST.H_POSTAL instead of last.birth_da.

Here is a test.

data have;

  input (geo h_postal birth_da ret_date) ($);

cards;

1 1 19830401 20010601

1 1 20030501 19000001

run;

%let refmonth=200201;


data want ;

  set have;

  where birth_da <=: "&refmonth" ;

  by geo h_postal birth_da;

  if last.h_postal;

  put (_all_) (=);

run;

Occasional Contributor
Posts: 8

Re: Selecting the right record...

Tom....  you are a genius!!!  Thanks a lot!!

And sorry for the other answers I got, that I haven't tried.  Maybe there were good answers, but this one is working.  Thanks to all of you anyway :-)

Super User
Posts: 5,516

Re: Selecting the right record...

Does this make sense?  If you get to the final observation for GEO/H_POSTAL, and nothing has been output yet, then output that last one regardless of the dates.  That would be relatively easy to code:

data selected;

   set have;

   by geo h_postal;

   if first.h_postal then flag=0;

   length birth retired $ 6;

   birth = birth_da;  /* substr is unnecessary when length is already set */

   if ret_date='19000001' then retired='999999';

   else retired = ret_date;

   if (birth <= &refmonth < retired) then do;

      output;

      flag + 1;

   end;

   if last.h_postal and flag=0 then output;

run;

Setting the rules is most of the battle here.

Super User
Posts: 10,044

Re: Selecting the right record...

You didn't post enough sampe data which make your explanation vague .

data have;
input recordid h_postal $ birth_da ret_date;
informat birth_da ret_date ANYDTDTE20.;
format birth_da ret_date DATE9.;
cards;
1 1 19830401 20010601
1 1 20030501 19000001
2 A 01Jan2002 12Feb2003
3 B 01Jan2000 01Jun2001
;
run;

 

%let refmonth='01jan2004'd ;

data want(drop=_: flag);
 merge have have(firstobs=2 keep=recordid birth_da rename=(recordid=_recordid birth_da=_birth_da)) ;
 retain flag;
 if  recordid ne lag(recordid) then flag=0;
 if  (recordid eq _recordid) and (&refmonth lt _birth_da) then do;flag=1;output;end;
  else if not flag and (recordid ne _recordid) then output;
run;




Ksharp

Super User
Posts: 10,044

Re: Selecting the right record...

You didn't post enough sampe data which make your explanation vague .

data have;
input recordid h_postal $ birth_da ret_date;
informat birth_da ret_date ANYDTDTE20.;
format birth_da ret_date DATE9.;
cards;
1 1 19830401 20010601
1 1 20030501 19000001
2 A 01Jan2002 12Feb2003
3 B 01Jan2000 01Jun2001
;
run;

 

%let refmonth='01jan2004'd ;

data want(drop=_: flag);
 merge have have(firstobs=2 keep=recordid birth_da rename=(recordid=_recordid birth_da=_birth_da)) ;
 retain flag;
 if  recordid ne lag(recordid) then flag=0;
 if  (recordid eq _recordid) and (&refmonth lt _birth_da) then do;flag=1;output;end;
  else if not flag and (recordid ne _recordid) then output;
run;




Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 450 views
  • 0 likes
  • 5 in conversation