Dear All,
Can I please seek your help on this by using sql?
I have a dataset containing id, date_1, date_2 and indication.
I would like to derive get, keep, date_3 and group.
Thanking you in advance.
id date_1 date_2 indication keep date_3 group 1 01-02-2020 20-02-2020 Y 1 01-02-2018 03-03-2018 Y 1 01-02-1990 20-02-1990 N Y 20-02-1990 Gen 1 01-02-1990 20-02-1990 Y Y 20-02-1990 Gen 1 01-02-1990 20-02-1990 Y Y 20-02-1990 Gen 1 01-02-1990 20-02-1990 N Y 20-02-1990 Gen 2 01-06-2009 25-08-2010 Y Y Y 2 01-06-2009 25-08-2010 N Y N 2 01-06-2009 25-08-2010 N Y N 2 01-06-2009 25-08-2010 Y Y Y 2 01-05-2018 08-06-2019 Y 2 01-05-2018 04-12-2019 Y
Please post the data in usable form and explain why you have to use proc sql. Is this homework?
Thanks @andreas_lds for your response.
It is not homework.
It is a claim data with >1billion data rows.
I was advised to use sql because it will run faster but I am not good at sql.
Thanks.
If the data is stored in a database, i agree: using proc sql with passthrough is a good idea, but if the data is stored in a dataset, i would avoid using proc sql, especially if it is large. But maybe @ChrisNZ can share his thoughts about performance.
Yes, you are correct. The data is stored in a database.
Thanks @andreas_lds for pointing out.
It is always a good idea to pass the SQL to the remote database.
The reason is not that the database is faster than SAS (It could be. Or not), but that the data transfer is always very slow, especially for the volumes mentioned.
The code I provided can be adapted:
- Write the SQL as pass-through
- Change the date-comparison test to a syntax appropriate to the database rather than using function INTNX()
- Change the IFC() function to a CASE clause
- Combine both statements into a single statement using a subclause
Merging the original data to the flagged data will also require some other logic as presumably the row number cannot be used.
This query gives the same result as the previous query, and is easier to translate :
proc sql;
select *
, month(DATE_1)=month(DATE_2) and year(DATE_1)=year(DATE_2) as EQ
, KEEP='Y' as K
, case when calculated K and calculated EQ=1 then 'Gen'
when calculated K and calculated EQ=0 then INDICATION
else ' ' end as GROUP
from HAVE
group by ID, KEEP
order by ROW;
Sorry your need is unclear.
Which date do we consider to assess same month and year?
What happens is same month but not same year, or same year but not same month?
This should get you started:
data HAVE;
ROW=_N_;
infile cards missover;
input ID DATE_1 ddmmyy10. DATE_2 : ddmmyy10. INDICATION $ KEEP $;
format DATE_1 ddmmyy10. DATE_2 : ddmmyy10.;
cards;
1 01-02-2020 20-02-2020 Y
1 01-02-2018 03-03-2018 Y
1 01-02-1990 20-02-1990 N Y 20-02-1990
1 01-02-1990 20-02-1990 Y Y 20-02-1990
1 01-02-1990 20-02-1990 Y Y 20-02-1990
1 01-02-1990 20-02-1990 N Y 20-02-1990
2 01-06-2009 25-08-2010 Y Y
2 01-06-2009 25-08-2010 N Y
2 01-06-2009 25-08-2010 N Y
2 01-06-2009 25-08-2010 Y Y
2 01-05-2018 08-06-2019 Y
2 01-05-2018 04-12-2019 Y
run;
proc sql;
create table KEEP_Y as
select *, intnx('month',min(DATE_1),0) = intnx('month',min(DATE_2),0) as EQ
from HAVE
where KEEP='Y'
group by ID;
proc sql;
create table WANT as
select h.*, ifc(EQ=1 & h.KEEP='Y','Gen',ifc(EQ=0 & h.KEEP='Y',h.INDICATION,' ')) as GROUP
from HAVE h
left join
KEEP_Y k
on h.ROW=k.ROW
order by h.ROW;
ROW | ID | DATE_1 | DATE_2 | INDICATION | KEEP | GROUP |
---|---|---|---|---|---|---|
1 | 1 | 01/02/2020 | 20/02/2020 | Y | ||
2 | 1 | 01/02/2018 | 03/03/2018 | Y | ||
3 | 1 | 01/02/1990 | 20/02/1990 | N | Y | Gen |
4 | 1 | 01/02/1990 | 20/02/1990 | Y | Y | Gen |
5 | 1 | 01/02/1990 | 20/02/1990 | Y | Y | Gen |
6 | 1 | 01/02/1990 | 20/02/1990 | N | Y | Gen |
7 | 2 | 01/06/2009 | 25/08/2010 | Y | Y | Y |
8 | 2 | 01/06/2009 | 25/08/2010 | N | Y | N |
9 | 2 | 01/06/2009 | 25/08/2010 | N | Y | N |
10 | 2 | 01/06/2009 | 25/08/2010 | Y | Y | Y |
11 | 2 | 01/05/2018 | 08/06/2019 | Y | ||
12 | 2 | 01/05/2018 | 04/12/2019 | Y |
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.