BookmarkSubscribeRSS Feed
dipalp
Calcite | Level 5

Hello,

For the scenario below, i am trying to use a combination of macro and sql statement to run 1 year or more of data (Daily feeds)

I have daily tables and trying to extracts the dates changes were made to specific field in a table.

Day1:5-1-2017

AccountNumber Date     Field1

123456               5-1-2017            A

123425               5-1-2017            A

123426               5-1-2017            A

123427               5-1-2017            A

Day2:5-2-2017

123456              5-2-2017            A

123425               5-2-2017            A

123426               5-2-2017            B

123427               5-2-2017            A

Day3:5-3-2017

123456              5-3-2017            C

123425               5-3-2017            A

123426               5-3-2017            B

123427               5-3-2017            A

Day4:5-4-2017

123456              5-3-2017            A

123425               5-3-2017            A

123426               5-3-2017            B

123427               5-3-2017            A

 

Result:

5-2-2017           123426               5-2-2017            B

5-3-2017            123456              5-3-2017            C

5-4-2017            123456              5-3-2017            A

 

Below is what i have so far.

 

%macro date_loop(start,end);

%let start=%sysfunc(inputn(&start,anydtdte9.));

%let end=%sysfunc(inputn(&end,anydtdte9.));

%let dif=%sysfunc(intck(day,&start,&end));

%do i=0 %to &dif;

%let date=%sysfunc(intnx(day,&start,&i,b),date9.);

%put &date;

%end;

%mend date_loop;

%date_loop(01may2017,05may2017);

proc sql;

create table xyz as

(select * from table1

where FILE_SEND_DT = &start + 1

except

select * from table1

where FILE_SEND_DT = &start);

 quit;

 

Thanks in advance for your help.

 

4 REPLIES 4
mkeintz
PROC Star

If each of your daily tables are sorted by id, then it is a very simple data step.  Assuming all your daily files are named DAILYxxx (where xxx is used to specify the date), then:

 

data want;
  set daily: ;
  by id date;
  if lag(field1)^=field1 and first.id=0;
run;

 

 

But your sample does not show the daily tables as sorted.  You could sort each of them and run the above program.  But sorting a year's worth of daily files can be expensive .  In that case, a hash object which carries the most recent value of FIELD1 to compare against the current version can be used:

 

data want (drop=rc);
  set daily: ;
  by date;
  if _n_=1 then do;
    if 0 then set daily1 (rename=(field1=oldfield1));
    declare hash h (dataset:'daily1 (rename=(field1=oldfield1))');
      h.definekey('id');
      h.definedata('oldfield1');
      h.definedone();
  end;
  call missing(oldfield1);
  rc=h.find();
  if rc^=0 then h.replace(key:id,data:field1);
  if rc=0 and field1^=oldfield1;
  rc=h.replace(key:id,data:field1);
run;

 

 The dataset DAILY1 is the name  I use for the earliest daily  data set.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
dipalp
Calcite | Level 5

Hello,

 

Thank you for the response. Each of the daily files is 1.5-1.9 million rows, therefore, i was using PROC SQL along with a macro rather than a data step (that has to get all that data into SAS).

 

For the SAS code below, if i were to run a month worth of data, are you suggesting extracting the files, naming them daily1, daily2, daily3....daily30? and then running the below code?

 

data want (drop=rc);
  set daily: ;
  by date;
  if _n_=1 then do;
    if 0 then set daily1 (rename=(field1=oldfield1));
    declare hash h (dataset:'daily1 (rename=(field1=oldfield1))');
      h.definekey('id');
      h.definedata('oldfield1');
      h.definedone();
  end;
  call missing(oldfield1);
  rc=h.find();
  if rc^=0 then h.replace(key:id,data:field1);
  if rc=0 and field1^=oldfield1;
  rc=h.replace(key:id,data:field1);
run;

Thanks!

mkeintz
PROC Star

What do you mean "extract the files"?  I had the impression that you already have a collection of daily data sets, which means no extraction is necessary.  If so what are their names?  Use those names. 

 

But I had also assumed presumed you have a naming convention in place, like   data_20170103 data_20170104 .... data_20171231, which would permit a compact way of generating a dataset name list, as opposed to entering every dataset name.

 

For example, using the convention in the above paragraph you could request all the 2017 July datasets using the expression data_201707:.

 

What naming convention, if any do your daily datasets have?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
dipalp
Calcite | Level 5

I apologize, i need to clarify the problem. The data is in a table  in the data warehouse (it has over a million rows per load date). In order to prevent extracting over 45 million rows of data per month, i wanted to use the approach below.  

 

I used a macro to create dates (within the dates mentioned), then go through 1 iteration of PROC SQL statement per date to identify field changes for entire year. I hope that clarified your question.

 

%macro date_loop(start,end);

%let start=%sysfunc(inputn(&start,anydtdte9.));

%let end=%sysfunc(inputn(&end,anydtdte9.));

%let dif=%sysfunc(intck(day,&start,&end));

%do i=0 %to &dif;

%let date=%sysfunc(intnx(day,&start,&i,b),date9.);

%put &date;

%end;

%mend date_loop;

%date_loop(01may2017,05may2017);

proc sql;

create table xyz as

(select * from table1

where FILE_SEND_DT = &start + 1

except

select * from table1

where FILE_SEND_DT = &start);

 quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 866 views
  • 0 likes
  • 2 in conversation