Hi guys,
I have a problem and I hope I can get a solution here 🙂
I have to output a record base on a date. The thing is that for the same h_postal, which is the key, I have 2 records.
It would be easy if I could say...
if ((&date >= birth_da) and (&date < ret_date)) then output;
else delete;
But there is a "gap" between the dates. So what appends if my date is "20020101"... I loose records!
What I would like to do is, example,
if date = "20040101" then output the second record.
if date = "20020101" then output the first record. (the one that falls into the gap)
I hope that is clear enough. English is not my primary language 😞
Mylene
That's because it should using LAST.H_POSTAL instead of last.birth_da.
Here is a test.
data have;
input (geo h_postal birth_da ret_date) ($);
cards;
1 1 19830401 20010601
1 1 20030501 19000001
run;
%let refmonth=200201;
data want ;
set have;
where birth_da <=: "&refmonth" ;
by geo h_postal birth_da;
if last.h_postal;
put (_all_) (=);
run;
Partly it depends on what the dates mean.
If the dates in the database represent a interval of time and your are selecting records based on whether the date in the macro variable falls within the interval then you probably DO want to lose those records.
Also it looks like your date variables are not actual dates. 19000001 is not a valid date because you cannot have month number zero. I suspect that '19000001' is actually an indication that the interval is open ended.
So perhaps you just need to add that fact to your logic?
if ((&date >= birth_da) and (&date < ret_date or ret_date = '19000001')) then output;
Assumptions:
(1) no two records for the same key have overlapping dates (i.e. if birth_da for record 1 < birth_da for record 2 on the same key, then ret_date for record 1 < birth_da for record 2)
(2) birth_da is nonmissing
(3) missing value for ret_date indicates not yet retired (so can only occur in the last record)
So for each key, you want to return the latest birth_da that is not greater than the date?
I think the easiest way to do this in simple SAS is to reverse-sort and use LAG to find the next date:
%let date='15Jul2001'd;
data have;
input recordid h_postal $ birth_da ret_date;
informat birth_da ret_date ANYDTDTE20.;
format birth_da ret_date DATE9.;
cards;
1 A 01Jan2000 01Jun2001
2 A 01Jan2002 12Feb2003
3 B 01Jan2000 01Jun2001
4 B 01Jul2001 12Feb2003
5 C 01Jan1999 31Dec1999
6 C 01Jan2001 01Feb2001
7 C 01Jan2003 12feb2003
8 D 01Jan2005 01Jan2007
9 D 01Jan2009 .
10 E 01Jan2001 .
;
run;
proc sort data=have out=have_reverse;
by h_postal descending birth_da;
quit;
data want;
set have_reverse;
by h_postal;
next_birth_date=lag(birth_da);
if first.h_postal then next_birth_date=.;
/* because we have sorted in reverse date order, first.h_postal indicates that this is the most recent entry for this unit
(highest value of birth_da) */
if (next_birth_date<=ret_date OR missing(ret_date)) AND not missing(next_birth_date) then put "ERROR: dates overlap";
if (birth_da<=&date) AND ((next_birth_date>&date) OR missing(next_birth_date)) then output;
drop next_birth_date;
run;
Edit: before using this you'll need to convert to SAS date variables (which you should be using anyway for this sort of work) and as per Tom's comment, if a date of "190000001" indicates "not yet retired" this should probably be interpreted as missing.
Message was edited by: Geoffrey Brent
I am so sorry guys!! I knew it was not describe properly... let’s try again J
I also just realized that I had to change the value of “infinity” for this to work.
Let’s look at my real code. “date” is a cycle entered to process data. In my program, it is reference to as “refmonth”, which is a char variable, length 6 (ex.:201108).
data sg&georef (drop= birth retired);
length birth $6 retired $6;
set sgcme.sg&georef;
birth = substr(birth_da,1,6);
if ret_date = "19000001" then retired = "999999";
else retired = substr(ret_date,1,6);
if ((&refmonth >= birth) and (&refmonth < retired)) then output; else delete;
run;
Based on my client’s specifications, if we are processing refmonth = 200201 with a postal code = “A0A2V0”, he wants me to output the values of the first record. With the piece of code above, I lose those records.
So when the specified month falls between intervals they want the last interval before the specified month.
Try this simplified logic.
I am assuming that your ID variables are GEO and H_POSTAL based on the sample data you posted.
I am also using : modifier on the <= operator so that the comparison will only be made up to shorter string. In this case the macro variable. So a birth_dt of 10JAN2010 ('20100110') would be considered less than or equal to January 2010 ('201001').
data sg&georef ;
set sgcme.sg&georef;
where birth_da <=: &refmonth ;
by geo h_postal birth_da;
if last.birth_da;
run;
Still not working!! If my refmonth = 200401, it has to pick up the second record!!
Please also keep in mind that the "birth" and "retired" variables are character, so are "refmonth", "birth_da" and "ret_date"... and there is nothing I can do about it!!
That's because it should using LAST.H_POSTAL instead of last.birth_da.
Here is a test.
data have;
input (geo h_postal birth_da ret_date) ($);
cards;
1 1 19830401 20010601
1 1 20030501 19000001
run;
%let refmonth=200201;
data want ;
set have;
where birth_da <=: "&refmonth" ;
by geo h_postal birth_da;
if last.h_postal;
put (_all_) (=);
run;
Tom.... you are a genius!!! Thanks a lot!!
And sorry for the other answers I got, that I haven't tried. Maybe there were good answers, but this one is working. Thanks to all of you anyway 🙂
Does this make sense? If you get to the final observation for GEO/H_POSTAL, and nothing has been output yet, then output that last one regardless of the dates. That would be relatively easy to code:
data selected;
set have;
by geo h_postal;
if first.h_postal then flag=0;
length birth retired $ 6;
birth = birth_da; /* substr is unnecessary when length is already set */
if ret_date='19000001' then retired='999999';
else retired = ret_date;
if (birth <= &refmonth < retired) then do;
output;
flag + 1;
end;
if last.h_postal and flag=0 then output;
run;
Setting the rules is most of the battle here.
You didn't post enough sampe data which make your explanation vague .
data have; input recordid h_postal $ birth_da ret_date; informat birth_da ret_date ANYDTDTE20.; format birth_da ret_date DATE9.; cards; 1 1 19830401 20010601 1 1 20030501 19000001 2 A 01Jan2002 12Feb2003 3 B 01Jan2000 01Jun2001 ; run; %let refmonth='01jan2004'd ; data want(drop=_: flag); merge have have(firstobs=2 keep=recordid birth_da rename=(recordid=_recordid birth_da=_birth_da)) ; retain flag; if recordid ne lag(recordid) then flag=0; if (recordid eq _recordid) and (&refmonth lt _birth_da) then do;flag=1;output;end; else if not flag and (recordid ne _recordid) then output; run;
Ksharp
You didn't post enough sampe data which make your explanation vague .
data have; input recordid h_postal $ birth_da ret_date; informat birth_da ret_date ANYDTDTE20.; format birth_da ret_date DATE9.; cards; 1 1 19830401 20010601 1 1 20030501 19000001 2 A 01Jan2002 12Feb2003 3 B 01Jan2000 01Jun2001 ; run; %let refmonth='01jan2004'd ; data want(drop=_: flag); merge have have(firstobs=2 keep=recordid birth_da rename=(recordid=_recordid birth_da=_birth_da)) ; retain flag; if recordid ne lag(recordid) then flag=0; if (recordid eq _recordid) and (&refmonth lt _birth_da) then do;flag=1;output;end; else if not flag and (recordid ne _recordid) then output; run;
Ksharp
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.