<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS big data how to make the code faster in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816665#M322369</link>
    <description>&lt;P&gt;It will take longer to process 60 million records than 100 records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please explain what you are trying to do.&lt;/P&gt;
&lt;P&gt;It looks from your code like you have more than just the one dataset to deal with.&lt;/P&gt;
&lt;P&gt;What are the datasets? How large are they?&lt;/P&gt;
&lt;P&gt;Why do you need to combine the datasets?&lt;/P&gt;
&lt;P&gt;What are the key variables? Do you have unique keys?&lt;/P&gt;
&lt;P&gt;Are there any non-key variables that have the same name in two or more datasets?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In general, a SAS data step will run faster than SQL code, when it is appropriate, since it just processes the records in order. With SQL joins you run the risk of generating huge intermediate data when you don't have one-to-one matching.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you are trying to gather aggregate statistics, it usually helps to use the procedures designed for that, like PROC SUMMARY (aka PROC MEANS).&lt;/P&gt;</description>
    <pubDate>Mon, 06 Jun 2022 16:15:03 GMT</pubDate>
    <dc:creator>Tom</dc:creator>
    <dc:date>2022-06-06T16:15:03Z</dc:date>
    <item>
      <title>SAS big data how to make the code faster</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816662#M322366</link>
      <description>&lt;P&gt;Hi Everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large table with 60,000,000 rows. The code is very slow; I tried indexing some columns, but it is still very slow.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any advice? Should I use PROC SQL or a data step for big tables?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select A.*, B.Fr_var&lt;BR /&gt;from ses_id.INFORCE_AJUST_1 as A&lt;BR /&gt;left join ses_id.FR_VARIABLE as B&lt;BR /&gt;ON A.no_garantie = B.no_garantie;&lt;BR /&gt;quit;&lt;BR /&gt;proc sql;&lt;BR /&gt;create index no_garantie&lt;BR /&gt;on adherant (no_garantie);&lt;BR /&gt;Quit;&lt;/P&gt;&lt;P&gt;/* Add the fixed administrative fees */&lt;/P&gt;&lt;P&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select * from adherant as A&lt;BR /&gt;left join ses_id.FR_Fixe as B&lt;BR /&gt;ON A.no_garantie=B.no_garantie;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;/* Add the economic scenarios table */&lt;BR /&gt;/* Start by adding an id from 1 to 1000, corresponding to the scenario IDs */&lt;/P&gt;&lt;P&gt;data adherant ;&lt;BR /&gt;set adherant;&lt;BR /&gt;do scn_id=1 to 1000;&lt;BR /&gt;output;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;BR /&gt;proc sql;&lt;BR /&gt;create index scn_id&lt;BR /&gt;on adherant (scn_id);&lt;BR /&gt;Quit;&lt;BR /&gt;/* Add the economic scenarios */&lt;BR /&gt;Proc sql;&lt;BR /&gt;create table adherant (drop=scn_id) as&lt;BR /&gt;select A.*&lt;BR /&gt;,B.RendDEX as rend1&lt;BR /&gt;,B.RendMM as rend2&lt;BR /&gt;,B.RendTSX as rend3&lt;BR /&gt;,B.RendSP500 as rend4&lt;BR /&gt;,B.RendEAFE as rend5&lt;BR /&gt;,B.TaskNum&lt;BR /&gt;,B.Time&lt;/P&gt;&lt;P&gt;from adherant as A&lt;BR /&gt;left join ses_id.Scenarios_ECN as B&lt;BR /&gt;ON A.scn_id = B.TaskNum&lt;BR /&gt;where Time&amp;gt;0 ;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;/* Compute the age by month of economic scenario; the age must be an integer and only increment by 1 at the 12th month */&lt;/P&gt;&lt;P&gt;data adherant2 ;&lt;/P&gt;&lt;P&gt;set adherant;&lt;/P&gt;&lt;P&gt;age_adh = age_adh_ori + int((time-1)/12) ;&lt;BR /&gt;Evaluation_dt = INTNX('month', &amp;amp;date_eval., (time-1), 'END');&lt;BR /&gt;duree_IE = min(max(int(YRDIF(date_acq_moyenne_IE,Evaluation_dt,'AGE'))+1,1),6);&lt;BR /&gt;lapse_duree = min(max(int(YRDIF(Issue_Date,Evaluation_dt,'AGE'))+1,1),10);&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;/* Add the forward rates table */&lt;/P&gt;&lt;P&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select A.* ,&lt;BR /&gt;B.facteur_act as fct_tx_fwrd&lt;BR /&gt;from adherant as A&lt;BR /&gt;left join ses_id.Scn_tx_forward as B&lt;BR /&gt;ON (A.TaskNum = B.TaskNum and A.Time = B.Time) ;&lt;BR /&gt;quit;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jun 2022 16:03:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816662#M322366</guid>
      <dc:creator>afsand</dc:creator>
      <dc:date>2022-06-06T16:03:34Z</dc:date>
    </item>
    <item>
      <title>Re: SAS big data how to make the code faster</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816665#M322369</link>
      <description>&lt;P&gt;It will take longer to process 60 million records than 100 records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please explain what you are trying to do.&lt;/P&gt;
&lt;P&gt;It looks from your code like you have more than just the one dataset to deal with.&lt;/P&gt;
&lt;P&gt;What are the datasets? How large are they?&lt;/P&gt;
&lt;P&gt;Why do you need to combine the datasets?&lt;/P&gt;
&lt;P&gt;What are the key variables? Do you have unique keys?&lt;/P&gt;
&lt;P&gt;Are there any non-key variables that have the same name in two or more datasets?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In general, a SAS data step will run faster than SQL code, when it is appropriate, since it just processes the records in order. With SQL joins you run the risk of generating huge intermediate data when you don't have one-to-one matching.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you are trying to gather aggregate statistics, it usually helps to use the procedures designed for that, like PROC SUMMARY (aka PROC MEANS).&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jun 2022 16:15:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816665#M322369</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-06-06T16:15:03Z</dc:date>
    </item>
    <item>
      <title>Re: SAS big data how to make the code faster</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816684#M322377</link>
      <description>&lt;P&gt;Two suggestions: 1. Try &lt;U&gt;PROC FEDSQL.&lt;/U&gt; 2. Only keep the necessary variables in your SQL.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jun 2022 17:21:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816684#M322377</guid>
      <dc:creator>Jerrya00</dc:creator>
      <dc:date>2022-06-06T17:21:14Z</dc:date>
    </item>
    <item>
      <title>Re: SAS big data how to make the code faster</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816685#M322378</link>
      <description>&lt;P&gt;Hi Tom,&lt;/P&gt;&lt;P&gt;I have a big table called Adherants that has 60 million records. I have attached the table.&lt;/P&gt;&lt;P&gt;I want to join some variables from six other tables onto the Adherants table; I have also attached those tables.&amp;nbsp;&lt;/P&gt;&lt;P&gt;It takes a very long time to calculate new variables in the adherant table and to add the variables from the six other tables.&lt;/P&gt;&lt;P&gt;You have the tables attached, and you can also see the complete code below. Please let me know if you need anything else.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data adherant ;&lt;/P&gt;&lt;P&gt;set adherant;&lt;/P&gt;&lt;P&gt;age_adh = age_adh_ori + int((time-1)/12) ;&lt;BR /&gt;Evaluation_dt = INTNX('month', &amp;amp;date_eval., (time-1), 'END');&lt;BR /&gt;duree_IE = min(max(int(YRDIF(date_acq_moyenne_IE,Evaluation_dt,'AGE'))+1,1),6);&lt;BR /&gt;lapse_duree = min(max(int(YRDIF(Issue_Date,Evaluation_dt,'AGE'))+1,1),10);&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;/*Add table1*/&lt;/P&gt;&lt;P&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select A.* ,&lt;BR /&gt;B.facteur_act as fct_tx_fwrd&lt;BR /&gt;from adherant as A&lt;BR /&gt;left join ses_id.Scn_tx_forward as B&lt;BR /&gt;ON (A.TaskNum = B.TaskNum and A.Time = B.Time) ;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;/*Add table2 */&lt;/P&gt;&lt;P&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select A.* ,&lt;BR /&gt;B.M as annuit_fct_M,&lt;BR /&gt;B.F as annuit_fct_F&lt;BR /&gt;from adherant as A&lt;BR /&gt;left join ses_id.Annuitization as B&lt;BR /&gt;ON (A.age_adh = B.age) ;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;/* Add table3*/&lt;BR /&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select A.* ,&lt;BR /&gt;B.reset,&lt;BR /&gt;B.indReset&lt;BR /&gt;from adherant as A&lt;BR /&gt;left join ses_id.Renouv_reset as B&lt;BR /&gt;ON (A.no_garantie = B.no_garantie and A.age_adh = B.age) ;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;/*Add table4 */&lt;/P&gt;&lt;P&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select A.* ,&lt;BR /&gt;B.Min_ferr&lt;BR /&gt;from adherant as A&lt;BR /&gt;left join ses_id.Min_FERR as B&lt;BR /&gt;ON (A.age_adh = B.age) ;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;/*Add table5*/&lt;BR /&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select A.* ,&lt;BR /&gt;B.penalite as rachat_penalite&lt;BR /&gt;from adherant as A&lt;BR /&gt;left join ses_id.Penalite_rachat_bl as B&lt;BR /&gt;ON (A.duree_IE = B.annee) ;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;/*Add table6 */&lt;/P&gt;&lt;P&gt;Proc sql;&lt;BR /&gt;create table adherant as&lt;BR /&gt;select A.* ,&lt;BR /&gt;B.tx_lapse_base&lt;BR /&gt;from adherant as A&lt;BR /&gt;left join ses_id.Lapse_tx_base as B&lt;BR /&gt;ON (A.no_garantie = B.no_garantie and A.lapse_duree = B.duree) ;&lt;BR /&gt;quit;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jun 2022 17:40:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816685#M322378</guid>
      <dc:creator>afsand</dc:creator>
      <dc:date>2022-06-06T17:40:03Z</dc:date>
    </item>
    <item>
      <title>Re: SAS big data how to make the code faster</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816695#M322385</link>
      <description>&lt;P&gt;So it looks like you are using different variables to match to each of the other datasets, so perhaps you are trying to do some type of lookup?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do the other tables have unique keys?&amp;nbsp; How many variables are you trying to pull from them? How large are they?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you are pulling just one variable from each, perhaps you can convert the lookup table into a format (or informat) and then generate the target variable using a PUT() or INPUT() function in your first data step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
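&lt;P&gt;For example, a single-key, single-value lookup like your Min_FERR table (age to Min_ferr) could be loaded into a numeric format with the CNTLIN= option. This is only a sketch against assumed column names; the OTHER range maps unmatched ages to missing:&lt;/P&gt;
&lt;P&gt;data minferr_fmt;&lt;BR /&gt;length hlo $1;&lt;BR /&gt;retain fmtname 'minferr' type 'N';&lt;BR /&gt;set ses_id.Min_FERR(rename=(age=start)) end=last;&lt;BR /&gt;label = strip(put(Min_ferr, best32.));&lt;BR /&gt;output;&lt;BR /&gt;if last then do;&lt;BR /&gt;hlo = 'O';&lt;BR /&gt;label = '.';&lt;BR /&gt;output;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;BR /&gt;proc format cntlin=minferr_fmt;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;/* then, inside the existing data step, replacing one of the SQL joins: */&lt;BR /&gt;Min_ferr = input(put(age_adh, minferr.), 32.);&lt;/P&gt;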
&lt;P&gt;If the other tables are small enough to fit in memory, use hash objects to match them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
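&lt;P&gt;For example, the Scn_tx_forward lookup (keyed on TaskNum and Time) could be done with a hash object; you can declare one hash per lookup table in the same step, so the 60-million-row table is read only once. A sketch only, not tested against your data:&lt;/P&gt;
&lt;P&gt;data adherant2;&lt;BR /&gt;if _n_ = 1 then do;&lt;BR /&gt;/* define facteur_act in the PDV without reading any rows */&lt;BR /&gt;if 0 then set ses_id.Scn_tx_forward(keep=TaskNum Time facteur_act);&lt;BR /&gt;declare hash h(dataset: 'ses_id.Scn_tx_forward');&lt;BR /&gt;h.definekey('TaskNum', 'Time');&lt;BR /&gt;h.definedata('facteur_act');&lt;BR /&gt;h.definedone();&lt;BR /&gt;end;&lt;BR /&gt;set adherant;&lt;BR /&gt;/* left-join behavior: missing when no key match */&lt;BR /&gt;if h.find() ne 0 then call missing(facteur_act);&lt;BR /&gt;rename facteur_act = fct_tx_fwrd;&lt;BR /&gt;run;&lt;/P&gt;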
&lt;P&gt;If the other tables are too large to load into memory as hash objects, then perhaps you can improve performance by creating indexes on the key variables.&amp;nbsp; That might improve the SQL performance, as PROC SQL might use the index instead of doing a full cross join.&amp;nbsp; You might even be able to join using a data step with the KEY= option of the SET statement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
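&lt;P&gt;For example, your first join (pulling Fr_var from FR_VARIABLE) could be written as an indexed lookup. This sketch assumes an index named no_garantie exists on ses_id.FR_VARIABLE and that no_garantie is unique there:&lt;/P&gt;
&lt;P&gt;data adherant;&lt;BR /&gt;set ses_id.INFORCE_AJUST_1;&lt;BR /&gt;set ses_id.FR_VARIABLE(keep=no_garantie Fr_var) key=no_garantie / unique;&lt;BR /&gt;/* left-join behavior: clear the error flag and the looked-up value when no match */&lt;BR /&gt;if _iorc_ ne 0 then do;&lt;BR /&gt;_error_ = 0;&lt;BR /&gt;call missing(Fr_var);&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;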
&lt;P&gt;If you really are forced to use PROC SQL, you might try collapsing your multiple steps into a single query; then at least you do not have to process the large dataset multiple times to match it with the other datasets.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jun 2022 18:21:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-big-data-how-to-make-the-code-faster/m-p/816695#M322385</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-06-06T18:21:13Z</dc:date>
    </item>
  </channel>
</rss>

