BookmarkSubscribeRSS Feed
Miracle
Barite | Level 11

Dear All,

Can I please seek your help on this by using sql?

I have a dataset containing id, date_1, date_2 and indication.

I would like to derive get, keep, date_3 and group.

  1. The rationale is to flag keep=Y for the all the data rows with the earliest dates of both date_1 and date_2 for all the ids
  2. If the data rows of keep=Y having both the same month and year i.e. id=1, then date_3=date_2 and group=Gen
  3. If the data rows of keep=Y having different month and year i.e. id=2, then group=Indication.

Thanking you in advance.

id	date_1	        date_2	       indication	keep	date_3	       group
1	01-02-2020	20-02-2020	Y			
1	01-02-2018	03-03-2018	Y			
1	01-02-1990	20-02-1990	N	        Y	       20-02-1990	Gen
1	01-02-1990	20-02-1990	Y	        Y	       20-02-1990	Gen
1	01-02-1990	20-02-1990	Y	        Y	       20-02-1990	Gen
1	01-02-1990	20-02-1990	N	        Y	       20-02-1990	Gen
2	01-06-2009	25-08-2010	Y	        Y		                        Y
2	01-06-2009	25-08-2010	N	        Y		                        N
2	01-06-2009	25-08-2010	N	        Y		                        N
2	01-06-2009	25-08-2010	Y	        Y		                        Y
2	01-05-2018	08-06-2019	Y			
2	01-05-2018	04-12-2019	Y			

 

7 REPLIES 7
andreas_lds
Jade | Level 19

Please post the data in usable form and explain why you have to use proc sql. Is this homework?

Miracle
Barite | Level 11

Thanks @andreas_lds  for your response.

It is not homework.

It is a claim data with >1billion data rows.

I was advised to use sql because it will run faster but I am not good at sql.
Thanks.

andreas_lds
Jade | Level 19

If the data is stored in a database, i agree: using proc sql with passthrough is a good idea, but if the data is stored in a dataset, i would avoid using proc sql, especially if it is large. But maybe @ChrisNZ can share his thoughts about performance.

 

Miracle
Barite | Level 11

Yes, you are correct. The data is stored in a database.
Thanks @andreas_lds  for pointing out.

ChrisNZ
Tourmaline | Level 20

It is always a good idea to pass the SQL to the remote database.

The reason is not that the database is faster than SAS (It could be. Or not), but that the data transfer is always very slow, especially for the volumes mentioned.

The code I provided can be adapted:

- Write the SQL as pass-through

- Change the date-comparison test to a syntax appropriate to the database rather than using function INTNX() 

- Change the IFC() function to a CASE clause

- Combine both statements into a single statement using a subclause

Merging the original data to the flagged data will also require some other logic as presumably the row number cannot be used. 

 

ChrisNZ
Tourmaline | Level 20

This query gives the same result as the previous query, and is easier to translate :

proc sql;
  select *
       , month(DATE_1)=month(DATE_2) and year(DATE_1)=year(DATE_2) as EQ
       , KEEP='Y' as K
       , case when calculated K and calculated EQ=1 then 'Gen'
              when calculated K and calculated EQ=0 then INDICATION
              else                                       ' '         end as GROUP
  from HAVE        
  group by ID, KEEP
  order by ROW;

 

ChrisNZ
Tourmaline | Level 20

Sorry your need is unclear.

Which date do we consider to assess same month and year?

What happens is same month but not same year, or same year but not same month?

This should get you started:

data HAVE;   
  ROW=_N_;
  infile cards missover;
  input ID DATE_1 ddmmyy10. DATE_2 : ddmmyy10.  INDICATION $ KEEP $;
  format   DATE_1 ddmmyy10. DATE_2 : ddmmyy10.;
cards;
1 01-02-2020 20-02-2020 Y   
1 01-02-2018 03-03-2018 Y   
1 01-02-1990 20-02-1990 N         Y        20-02-1990 
1 01-02-1990 20-02-1990 Y         Y        20-02-1990 
1 01-02-1990 20-02-1990 Y         Y        20-02-1990 
1 01-02-1990 20-02-1990 N         Y        20-02-1990 
2 01-06-2009 25-08-2010 Y         Y                   
2 01-06-2009 25-08-2010 N         Y                   
2 01-06-2009 25-08-2010 N         Y                   
2 01-06-2009 25-08-2010 Y         Y                   
2 01-05-2018 08-06-2019 Y   
2 01-05-2018 04-12-2019 Y   
run;
 
proc sql;    
  create table KEEP_Y as
  select *, intnx('month',min(DATE_1),0) =  intnx('month',min(DATE_2),0) as EQ  
  from HAVE  
  where KEEP='Y'  
  group by ID;
proc sql;
  create table WANT as 
  select h.*, ifc(EQ=1 & h.KEEP='Y','Gen',ifc(EQ=0 & h.KEEP='Y',h.INDICATION,' ')) as GROUP
  from HAVE        h
         left join
       KEEP_Y      k
         on  h.ROW=k.ROW
  order by h.ROW;
ROW ID DATE_1 DATE_2 INDICATION KEEP GROUP
1 1 01/02/2020 20/02/2020 Y    
2 1 01/02/2018 03/03/2018 Y    
3 1 01/02/1990 20/02/1990 N Y Gen
4 1 01/02/1990 20/02/1990 Y Y Gen
5 1 01/02/1990 20/02/1990 Y Y Gen
6 1 01/02/1990 20/02/1990 N Y Gen
7 2 01/06/2009 25/08/2010 Y Y Y
8 2 01/06/2009 25/08/2010 N Y N
9 2 01/06/2009 25/08/2010 N Y N
10 2 01/06/2009 25/08/2010 Y Y Y
11 2 01/05/2018 08/06/2019 Y    
12 2 01/05/2018 04/12/2019 Y    

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 841 views
  • 4 likes
  • 3 in conversation