Hello,
For the scenario below, i am trying to use a combination of macro and sql statement to run 1 year or more of data (Daily feeds)
I have daily tables and trying to extracts the dates changes were made to specific field in a table.
Day1:5-1-2017
AccountNumber Date Field1
123456 5-1-2017 A
123425 5-1-2017 A
123426 5-1-2017 A
123427 5-1-2017 A
Day2:5-2-2017
123456 5-2-2017 A
123425 5-2-2017 A
123426 5-2-2017 B
123427 5-2-2017 A
Day3:5-3-2017
123456 5-3-2017 C
123425 5-3-2017 A
123426 5-3-2017 B
123427 5-3-2017 A
Day4:5-4-2017
123456 5-3-2017 A
123425 5-3-2017 A
123426 5-3-2017 B
123427 5-3-2017 A
Result:
5-2-2017 123426 5-2-2017 B
5-3-2017 123456 5-3-2017 C
5-4-2017 123456 5-3-2017 A
Below is what i have so far.
%macro date_loop(start,end);
%let start=%sysfunc(inputn(&start,anydtdte9.));
%let end=%sysfunc(inputn(&end,anydtdte9.));
%let dif=%sysfunc(intck(day,&start,&end));
%do i=0 %to &dif;
%let date=%sysfunc(intnx(day,&start,&i,b),date9.);
%put &date;
%end;
%mend date_loop;
%date_loop(01may2017,05may2017);
proc sql;
create table xyz as
(select * from table1
where FILE_SEND_DT = &start + 1
except
select * from table1
where FILE_SEND_DT = &start);
quit;
Thanks in advance for your help.
If each of your daily tables are sorted by id, then it is a very simple data step. Assuming all your daily files are named DAILYxxx (where xxx is used to specify the date), then:
data want;
set daily: ;
by id date;
if lag(field1)^=field1 and first.id=0;
run;
But your sample does not show the daily tables as sorted. You could sort each of them and run the above program. But sorting a year's worth of daily files can be expensive . In that case, a hash object which carries the most recent value of FIELD1 to compare against the current version can be used:
data want (drop=rc);
set daily: ;
by date;
if _n_=1 then do;
if 0 then set daily1 (rename=(field1=oldfield1));
declare hash h (dataset:'daily1 (rename=(field1=oldfield1))');
h.definekey('id');
h.definedata('oldfield1');
h.definedone();
end;
call missing(oldfield1);
rc=h.find();
if rc^=0 then h.replace(key:id,data:field1);
if rc=0 and field1^=oldfield1;
rc=h.replace(key:id,data:field1);
run;
The dataset DAILY1 is the name I use for the earliest daily data set.
Hello,
Thank you for the response. Each of the daily files is 1.5-1.9 million rows, therefore, i was using PROC SQL along with a macro rather than a data step (that has to get all that data into SAS).
For the SAS code below, if i were to run a month worth of data, are you suggesting extracting the files, naming them daily1, daily2, daily3....daily30? and then running the below code?
data want (drop=rc);
set daily: ;
by date;
if _n_=1 then do;
if 0 then set daily1 (rename=(field1=oldfield1));
declare hash h (dataset:'daily1 (rename=(field1=oldfield1))');
h.definekey('id');
h.definedata('oldfield1');
h.definedone();
end;
call missing(oldfield1);
rc=h.find();
if rc^=0 then h.replace(key:id,data:field1);
if rc=0 and field1^=oldfield1;
rc=h.replace(key:id,data:field1);
run;
Thanks!
What do you mean "extract the files"? I had the impression that you already have a collection of daily data sets, which means no extraction is necessary. If so what are their names? Use those names.
But I had also assumed presumed you have a naming convention in place, like data_20170103 data_20170104 .... data_20171231, which would permit a compact way of generating a dataset name list, as opposed to entering every dataset name.
For example, using the convention in the above paragraph you could request all the 2017 July datasets using the expression data_201707:.
What naming convention, if any do your daily datasets have?
I apologize, i need to clarify the problem. The data is in a table in the data warehouse (it has over a million rows per load date). In order to prevent extracting over 45 million rows of data per month, i wanted to use the approach below.
I used a macro to create dates (within the dates mentioned), then go through 1 iteration of PROC SQL statement per date to identify field changes for entire year. I hope that clarified your question.
%macro date_loop(start,end);
%let start=%sysfunc(inputn(&start,anydtdte9.));
%let end=%sysfunc(inputn(&end,anydtdte9.));
%let dif=%sysfunc(intck(day,&start,&end));
%do i=0 %to &dif;
%let date=%sysfunc(intnx(day,&start,&i,b),date9.);
%put &date;
%end;
%mend date_loop;
%date_loop(01may2017,05may2017);
proc sql;
create table xyz as
(select * from table1
where FILE_SEND_DT = &start + 1
except
select * from table1
where FILE_SEND_DT = &start);
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.