BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello
I have a data set with 60 million rows and 850 variables.
In this data set each customerID has 24 rows (Each row represent information of specific month).
I want to select customers that meet the condition that in month Nov 2021 they belong to population 7.
Then ,for this customer list I want to check for each customer if he/she failed to pay loans in months Jul2022 till Oct2022.
Then I want to calculate how many customers failed to pay loans in months jul2022 till oct2022.( if customer fail at least in one month ).
I created sql query that calculate it.
Since the data set is very big then I created a small data set (1 million rows) and run the query and in 1 minute I got the result.
However, when I run it on full data set then I waited long time and finally got  a message that it was terminated.

Then I run it again but didnt recieve any result .

 

Here is the code I run.
Any recommendation how to run code to receive the desired result?
 

proc sql;
create table want as
select count(*) as nr
from 
(select lakoach
from  csp.twoyrs_3
where VALID_FROM_DTTM='01NOV2022'd and  pop_lomeforat_wppr=7 ) as a
inner join
(select lakoach,
       max(ind_shlili_6) as calc_ind_shlili_1_5  
from  csp.twoyrs_3
where VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd)
group by lakoach
having ind_shlili_6=1) as b
on a.lakoach=b.lakoach
;
quit;

Here is the Log when I run and waited long time and didnt get any result

 

1                                                          The SAS System                           13:07 Tuesday, September 3, 2024

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Program';
4          %LET _CLIENTPROCESSFLOWNAME='IV dist';
5          %LET _CLIENTPROJECTPATH='K:\aתחום מודלים\פיתוח מודל CS\פיתוח מודלים\עוסקים_7\Project_Models_Oskim_Ron_Gibuy.egp';
6          %LET _CLIENTPROJECTPATHHOST='VSK1H103A2014';
7          %LET _CLIENTPROJECTNAME='Project_Models_Oskim_Ron_Gibuy.egp';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=PNG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         FILENAME EGSR TEMP;
15         ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16             STYLE=HTMLBlue
17             STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")
18             NOGTITLE
19             NOGFOOTNOTE
20             GPATH=&sasworklocation
21             ENCODING=UTF8
22             options(rolap="on")
23         ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         proc sql;
27         create table ttt2a as
28         select count(*) as nr
29         from
30         (select lakoach
31         from  csp.twoyrs_3
32         where VALID_FROM_DTTM='01NOV2022'd and  pop_lomeforat_wppr=7 ) as a
33         inner join
34         (select lakoach,
35                max(ind_shlili_6) as calc_ind_shlili_1_5
36         from  csp.twoyrs_3
37         where VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd)
38         group by lakoach
39         having ind_shlili_6=1) as b
40         on a.lakoach=b.lakoach
41         ;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Compression was disabled for data set WORK.TTT2A because compression overhead would increase the size of the data set.
NOTE: SAS threaded sort was used.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So you have two selection criteria.

 

I want to select customers that meet the condition that in month Nov 2021 they belong to population 7.
Then ,for this customer list I want to check for each customer if he/she failed to pay loans in months Jul2022 till Oct2022.
Then I want to calculate how many customers failed to pay loans in months jul2022 till oct2022.( if customer fail at least in one month ).

From the code you posted it looks like you have data that uses LAKOACH and VALID_FROM_DTTM as the key variables.  

 

So assuming the data is already sorted (and if it is not sorted then that might help explain why the SQL code takes so long since SQL will have to sort the data first to do the grouping you asked for) then you could do this in one pass thru the data.

 

You can wrap the SET statement in a DO loop so that all of the observations for one customer are processed in one iteration of the data step.  When you see the appropriate observations for the months in question you can then set some flags to indicate if the condition is met. 

 

If you just want to count instead of the individual customer ids then you can use SUM statements to increment counters and output only one observation.

 

data want;
pop7=0;
failure=0;
do until(last.LAKOACH)
  set csp.twoyrs_3 end=eof;
  by LAKOACH;
  if (VALID_FROM_DTTM='01NOV2022'd and pop_lomeforat_wppr=7) then POP7=1;
  if (VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd)
    and ind_shlili_6 ne 1 then FAILURE=1;
end;
  totalN+1;
  totalpop7 + POP7;
  totalpop7failure + (pop7 and failure);
  keep total: ;
  if eof;
run;

 

View solution in original post

12 REPLIES 12
Tom
Super User Tom
Super User

Is the data in SAS dataset(s)? Or are you connecting to a remote database?  What type of libname engine did you use when defining CSP libref?

 

If the data is in a dataset then a data step solution would probably be much faster.

If the data is in a remote database then you probably do not what to write SQL code that takes advantage of SAS's ability to automatically remerge since that might cause SAS to not push the query into the database.

 

Also if the data is in a remote database then ask the admin for the database to analyze the query and see if there are tricks you can use in that database to make the query run more quickly.

Ronein
Meteorite | Level 14
It is a sas data set located in a permanent library in sas.
Can you please show data step or proc step code that perform it well?

As I understand if data set is very big then not recommended use proc sql?
Why in log it is said that have remerge?
I don't see renerge in my query
Tom
Super User Tom
Super User

Most likely the remerge is being triggered by the HAVING clause in this subquery.

select lakoach
     , max(ind_shlili_6) as calc_ind_shlili_1_5  
  from csp.twoyrs_3
  where VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd)
  group by lakoach
  having ind_shlili_6=1
Ronein
Meteorite | Level 14
Can you show please alternate code that work better?
Ronein
Meteorite | Level 14

I can change the proc sql to this one -maybe it is better because no remerge?

Anyway I would like to see data step/proc step way to do it ?

is it also one step or two steps?

 

proc sql;
create table want as
select sum(calc_ind_shlili_1_5) as nr
from 
(select lakoach
from  csp.twoyrs_3
where VALID_FROM_DTTM='01NOV2022'd and  pop_lomeforat_wppr=7 ) as a
inner join
(select lakoach,
       max(ind_shlili_6) as calc_ind_shlili_1_5  
from  csp.twoyrs_3
where VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd)
group by lakoach) as b
on a.lakoach=b.lakoach
;
quit;
Ronein
Meteorite | Level 14

I removed the having statement but still long run and didnt finish  run

Tom
Super User Tom
Super User

So you have two selection criteria.

 

I want to select customers that meet the condition that in month Nov 2021 they belong to population 7.
Then ,for this customer list I want to check for each customer if he/she failed to pay loans in months Jul2022 till Oct2022.
Then I want to calculate how many customers failed to pay loans in months jul2022 till oct2022.( if customer fail at least in one month ).

From the code you posted it looks like you have data that uses LAKOACH and VALID_FROM_DTTM as the key variables.  

 

So assuming the data is already sorted (and if it is not sorted then that might help explain why the SQL code takes so long since SQL will have to sort the data first to do the grouping you asked for) then you could do this in one pass thru the data.

 

You can wrap the SET statement in a DO loop so that all of the observations for one customer are processed in one iteration of the data step.  When you see the appropriate observations for the months in question you can then set some flags to indicate if the condition is met. 

 

If you just want to count instead of the individual customer ids then you can use SUM statements to increment counters and output only one observation.

 

data want;
pop7=0;
failure=0;
do until(last.LAKOACH)
  set csp.twoyrs_3 end=eof;
  by LAKOACH;
  if (VALID_FROM_DTTM='01NOV2022'd and pop_lomeforat_wppr=7) then POP7=1;
  if (VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd)
    and ind_shlili_6 ne 1 then FAILURE=1;
end;
  totalN+1;
  totalpop7 + POP7;
  totalpop7failure + (pop7 and failure);
  keep total: ;
  if eof;
run;

 

Ronein
Meteorite | Level 14

I run the code and have an error

I also added ; after do until(last.LAKOACH)

 

data want;
pop7=0;
failure=0;
do until(last.LAKOACH);
set csp.twoyrs_3 end=eof;
by LAKOACH;
if (VALID_FROM_DTTM='01NOV2022'd and pop_lomeforat_wppr=7) then POP7=1;
if (VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd) and ind_shlili_6=1 then FAILURE=1;
end;
totalN+1;
totalpop7 + POP7;
totalpop7failure + (pop7 and failure);
keep total: ;
if eof;
run;

 

1                                                          The SAS System                           13:07 Tuesday, September 3, 2024

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Program';
4          %LET _CLIENTPROCESSFLOWNAME='IV dist';
5          %LET _CLIENTPROJECTPATH='K:\aתחום מודלים\פיתוח מודל CS\פיתוח מודלים\עוסקים_7\Project_Models_Oskim_Ron_Gibuy.egp';
6          %LET _CLIENTPROJECTPATHHOST='VSK1H103A2014';
7          %LET _CLIENTPROJECTNAME='Project_Models_Oskim_Ron_Gibuy.egp';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=PNG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         FILENAME EGSR TEMP;
15         ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16             STYLE=HTMLBlue
17             STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")
18             NOGTITLE
19             NOGFOOTNOTE
20             GPATH=&sasworklocation
21             ENCODING=UTF8
22             options(rolap="on")
23         ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         data want;
27         pop7=0;
28         failure=0;
29         do until(last.LAKOACH);
30         set csp.twoyrs_3 end=eof;
31         by LAKOACH;
32         if (VALID_FROM_DTTM='01NOV2022'd and pop_lomeforat_wppr=7) then POP7=1;
33         if (VALID_FROM_DTTM in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd) and ind_shlili_6=1 then
                                                                                                                         ____
_                                                                                                                                   
                                                                                                                         388
79
                                                                                                                         202
33       ! FAILURE=1;
ERROR 388-185: Expecting an arithmetic operator.

ERROR 79-322: Expecting a ).

ERROR 202-322: The option or parameter is not recognized and will be ignored.

34         end;
35         totalN+1;
36         totalpop7 + POP7;
37         totalpop7failure + (pop7 and failure);
38         keep total: ;
39         if eof;
40         run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 4 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
2                                                          The SAS System                           13:07 Tuesday, September 3, 2024

      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              2237.40k
      OS Memory           30892.00k
      Timestamp           09/04/2024 07:42:12 PM
      Step Count                        2025  Switch Count  0
      Page Faults                       0
      Page Reclaims                     247
      Page Swaps                        0
      Voluntary Context Switches        12
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           0
      

41         
42         GOPTIONS NOACCESSIBLE;
43         %LET _CLIENTTASKLABEL=;
44         %LET _CLIENTPROCESSFLOWNAME=;
45         %LET _CLIENTPROJECTPATH=;
46         %LET _CLIENTPROJECTPATHHOST=;
47         %LET _CLIENTPROJECTNAME=;
48         %LET _SASPROGRAMFILE=;
49         %LET _SASPROGRAMFILEHOST=;
50         
51         ;*';*";*/;quit;run;
52         ODS _ALL_ CLOSE;
53         
54         
55         QUIT; RUN;
56         
Tom
Super User Tom
Super User

You are missing a close parentheses in the second IF statement.

 

Hint: Do not type lines of code that are longer than 70 bytes.  They are too long for humans to scan easily.  There is a reason why newspapers have multiple columns.

Ronein
Meteorite | Level 14

IF a customer fail a few times during the 5 months period (JUN2022-OCT2022) then you count is as one or more?We need to count it as one.

 totalpop7failure + (pop7 and failure);

 

Ronein
Meteorite | Level 14

Your code is working perfect 100%.

Can you please explain why your code totalpop7failure + (pop7 and failure) count only once when a customer fail a few times?

It is working perfect but I want to understand please

data have;
format ddate date9.;
input CustID ddate :date9.  pop Ind_Fail;
cards;
1 '01APR2022'd 7 0
1 '01MAY2022'd 7 0
1 '01JUN2022'd 7 0
1 '01JUL2022'd 7 0
1 '01AUG2022'd 7 0
1 '01SEP2022'd 7 0
1 '01OCT2022'd 7 0
1 '01NOV2022'd 7 0
2 '01APR2022'd 7 0
2 '01MAY2022'd 7 0
2 '01JUN2022'd 7 1
2 '01JUL2022'd 7 0
2 '01AUG2022'd 7 0
2 '01SEP2022'd 7 0
2 '01OCT2022'd 7 0
2 '01NOV2022'd 7 0
3 '01APR2022'd 7 0
3 '01MAY2022'd 7 0
3 '01JUN2022'd 7 1
3 '01JUL2022'd 7 1
3 '01AUG2022'd 7 1
3 '01SEP2022'd 7 1
3 '01OCT2022'd 7 0
3 '01NOV2022'd 7 0
4 '01APR2022'd 8 0
4 '01MAY2022'd 8 0
4 '01JUN2022'd 8 1
4 '01JUL2022'd 8 0
4 '01AUG2022'd 8 0
4 '01SEP2022'd 8 0
4 '01OCT2022'd 8 0
4 '01NOV2022'd 8 0
5 '01APR2022'd 7 0
5 '01MAY2022'd 7 0
5 '01JUN2022'd 7 1
5 '01JUL2022'd 7 0
5 '01AUG2022'd 7 0
5 '01SEP2022'd 7 0
5 '01OCT2022'd 7 0
5 '01NOV2022'd 8 0
;
Run;

 
data want;
pop7=0;
failure=0;
do until(last.CustID);
set have end=eof;
by CustID;
if (ddate='01NOV2022'd and pop=7) then POP7=1;
if ddate in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd) and Ind_Fail=1 then FAILURE=1;
end;
totalN+1;  /***Total customers----5 customers****/
totalpop7 + POP7;  /****Base pop that is determined by belonging to pop 7 in month Nov 2022---3 customers***/
totalpop7failure + (pop7 and failure);/***number of customers from the base pop totalpop7 that fail at least in one onth----2 customers***/
keep total: ;
if eof;
run;

 

Tom
Super User Tom
Super User

Because it can only execute that statement once per customer because of the DO loop.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 994 views
  • 1 like
  • 2 in conversation