Dear,
I need to create a variable in data set 'ONE' by comparing DATE variable from dataset 'ONE' and DATE and VALUE variables from data set 'TWO' .
If there is a record in dataset TWO with VALUE='N' and DATE greater than DATE in dataset ONE, then i need to create FLAG='Y'. If no such record then FLAG is "N"
Only for id=2, there is a record in data set TWO with DATE greater than date in ONE dataset and with value ="N'. Hence only ID=2 populated with flag='Y'. Please suggest in my code. I am getting all possible records. Thank you
output need;
id date flag
1 2019-01-18 N
2 2019-02-12 Y
3 2019-03-15 N
data one;
input id date yymmdd10.;
format date yymmdd10.;
datalines;
1 2019-01-18
2 2019-02-12
3 2019-03-15
;
data two;
input id date yymmdd10. value $;
format date yymmdd10.;
datalines;
1 2019-01-01 N
1 2019-01-11 Y
1 2019-01-18 N
2 2019-02-02 N
2 2019-02-22 N
3 2019-03-05 Y
3 2019-03-25 Y
;
proc sql;
create table three as
select *,
case
when a.date lt b.date and b.value='N' then 'Y'
else 'N' end as flag
from one as a left join two as b
on a.id=b.id
order by a.id,a.date;
quit;
Are you trying to produce one record per ID with flag representing the presence of one qualifying obs from dataset two? Or do you want the traditional "one left join two" results, with one record per qualifying join?
First, thank you for providing sample data in the form of a data step. It's much easier to test proposed solutions.
Here's mine: (edited to require date from two to be greated, not just greater or equal to date from one):
data one;
input id date yymmdd10.;
format date yymmdd10.;
datalines;
1 2019-01-18
2 2019-02-12
3 2019-03-15
;
data two;
input id date yymmdd10. value $;
format date yymmdd10.;
datalines;
1 2019-01-01 N
1 2019-01-11 Y
1 2019-01-18 N
2 2019-02-02 N
2 2019-02-22 N
3 2019-03-05 Y
3 2019-03-25 Y
;
data three (keep=id flag);
set one two (where=(value='N') in=intwo);
by id date;
if last.id;
if intwo=1 and first.date=1 then flag='Y'; /*Added "first.date=1" to properly satisfy date requirement*/
else flag='N';
run;
Give the 2 datasets are sorted by ID/DATE, it's quite easy to produce what you want in a DATA step:
An sql solution
proc sql;
create table threeb as
select a.id,
case when b.date>a.date then 'Y'
else 'N'
end as flag
from one as a
left join two (where=(value='N')) as b
on a.id=b.id
group by a.id
having b.date=max(b.date) ;
quit;
Thank you very much for sql code. It gives me the output i need after adding a.date in select statement. As i need to keep DATE variable from dataset ONE.
But in the data step code, i got the correct FLAG variable values but I am not able to keep same DATE variable values from dataset ONE. After keeping DATE variable in Keep statement, I am getting DATE values from dataset TWO for id=2.
code worked
proc sql;
create table threeb as
select a.id, a.date
case when b.date>a.date then 'Y'
else 'N'
end as flag
from one as a
left join two (where=(value='N')) as b
on a.id=b.id
group by a.id
having b.date=max(b.date) ;
quit;
Translate your requirement directly as a correlated subquery:
proc sql;
create table three as
select
id,
date,
case when exists (
select * from two where id=a.id and value="N" and date > a.date)
then "Y" else "N" end as flag
from one as a;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.