01-25-2016 06:10 PM
Hello. I have written the following program which is essentially the creation of 12+datasets querying the same table once for each of the last N periods. I would like to create a dynamic macro that will loop through the "datelist" dataset values (which are the top N distinct statp values from table1 STATP="YYYYMM") and create N datasets with the same keep statement (except for the first iteration: see bellow "current"). Then I would like to use that macro loop, or an additional loop, to left join each of these to my base table, renaming the EDMM-from the STATP to (three letter abreviation of the month)ED"YY" (see base0 table).
I know some will say this can be done in proc report but I found this method to be infinitely faster than the across proc report. However, I would like to be able to semiautimate it with a macro so I don't have to painstakingly edit all the prefix aliases and so that I can use the technique in different applications. The attached article on page 5 seemed to document a similar concept, and I have pasted my attempt at the bottom, but it does not compile due to syntax errors and I'm not exactly sure wheather it is the output I desire. Please help provide code as YOU would do this, as I am still trying to learn and am not confident what I was trying to do would accomplish my objectives efficiently. It would also help me learn, as I am new to sas and proc sql, but have exposure to object oriented programming in java and python. I've marked commentary about what I'm trying to in the code with ****************** to make it stand out.
%currentStat=201511; libname ****; data editrej; set folder1.editrej; Uniquekey=cat(ID,ITEM); keep ID Item Evntflg Error; run; data notes; set folder1.rnotes; keep ID STATP Item notesrce; where statp='201511'; run; Data Who; set metrics.Table2; keep ID _Who; where substr(PRODCYCLE,5,2)= '00' and STATP="¤t"; run; data datelist; set metrics.Table1; Keep STATP Where STATP>'201400'; run; proc sort data=datelist noduplicates; by descending STATP; run; ****I'd like to have 12 observations from this list without having to do STATP>201400****** *********STATP is a string with date YYYYMM and this is the only thing that will change between data sets below***** *******All of the following are the same except the most recent period which will have RPDATA and EDDATA************** %Let standard=SURVEY STATP ID ITEM EDDATA; Data current; *******Only the current month (top of sorted datelist with nodups but not actually current calandar month) has RPDATA and DATAFLAG***** Set metrics.Table1; Keep &standard RPDATA Dataflag; Where STATP= "¤tStat" AND substr(PRODCYCLE,5,2)= '00'; run; Data monthly10; Set metrics.Table1; Where STATP= '201510' AND substr(PRODCYCLE,5,2)= '00'; KEEP &standard; Rename EDDATA=ED10; run; *****For everything except the "current" data set I want the EDDATA column renamed to be EDMM from the YYYYMM of STATP*** run; . .******Right now I have an individual datastep for each period but want to replace all of this with a looped macro!!!!***** . Data monthly1114; Set metrics.Table1; Where STATP= '201411' AND substr(PRODCYCLE,5,2)= '00';/*Loop stat period month, data name, and rename of EDDATA*/ KEEP SURVEY STATP ID ITEM EDDATA;*******Or use &standard Rename EDDATA=NovED14; ****Next I do a proc sort of each created dataset but ideally this would be part of the same looped macro as the creation ****Of the datasets PROC SORT DATA=current NODUPLICATES;/*loop this with do while &i at the end of stat period in macro*/ BY _all_; run; ******************************************Two Other tables part of the join are omitted here to cut down length********* data prods; set metrics.Table1; keep ITEM; where dataflag not in ('S','D') and Prodcycle='201500' and Statp>'201500' and substr(ID,1,3) ne '999' and Substr(Item,1,2) not in ('SH','UL','GR','BS'); run; data IDs; set metrics.Table1; keep ID; where statP="¤tStat"; run;
proc sort data=prods noduplicates; by _all_; run; proc sort data=IDs noduplicates; by _all_; ******Cartesianing all reportable ID and Product combinations as base table to left join to*************** proc sql noprint; create table base as select * from IDs,Prods; run; *******************Left Joining all renamed EDDATA columns from each monthly dataset created in the loop *******************And left join of each of the 4 other tables that are distinct and only need to run on the first iteration *******************Or can be left outside of the loop; *******************Then after all joins are done I'm filtering out any observations that are completely blank across all periods ******I'd like my loop, or another loop if required, to produce the Select statment "A.ID,A.Item,N.NovED14 as Nov14... ******Essentially I want the EDMM column that I was creating in the datasets to be (three char Month abr)EDYY ******Corresponding to the YYYYMM of the STATP variable associated with each monthly table proc sql noprint; create table base0 as select A.ID,A.Item,N.NovED14 as Nov14,M.DecED14 as Dec14,L.ED01 as Jan15,K.ED02 as Feb15,J.ED03 as Mar15,I.ED04 as Apr15, H.ED05 as May15, G.ED06 as Jun15,F.ED07 as Jul15, E.ED08 as Aug15,D.ED09 as Sep15,C.ED10 as Oct15,B.EDDATA,B.RPDATA,p.EDDATA as MFWSUB,q.EDDATA as MFWPUB,o.Error,s.notesrce,r._who as who from base a left join current b on A.id=B.id and A.item=B.item left join monthly10 c on A.id=c.id and A.item=c.item left join monthly09 d on A.id=d.id and A.item=d.item left join monthly08 e on A.id=e.id and A.item=e.item left join monthly07 f on A.id=f.id and A.item=f.item left join monthly06 g on A.id=g.id and A.item=g.item left join monthly05 h on A.id=h.id and A.item=h.item left join monthly04 i on A.id=i.id and A.item=i.item left join monthly03 j on a.id=j.id and a.item=j.item left join monthly02 k on A.id=k.id and A.item=k.item left join monthly01 l on A.id=l.id and A.item=l.item left join monthly1214 m on A.id=m.id and A.item=m.item left join monthly1114 n on a.id=n.id and a.item=n.item left join editrej o on A.id=o.id and A.item=o.item left join sub p on a.id=p.id and a.item=p.prod left join pub q on a.id=q.id and a.item=q.prod left join Who r on a.id=r.id left join notes s on a.id=s.id and a.item=s.item Where M.DecED14 ne . and m.DecED14 ne 0 or n.NovED14 ne . and n.NovED14 ne 0 or l.ED01 ne . and l.ED01 ne 0 or k.ED02 ne . and k.ED02 ne 0 or j.ED03 ne . and j.ED03 ne 0 or i.ED04 ne . and i.ED04 ne 0 or h.ED05 ne . and h.ED05 ne 0 or g.ED06 ne . and g.ED06 ne 0 or f.ED07 ne . and f.ED07 ne 0 or e.ED08 ne . and e.ED08 ne 0 or d.ED09 ne . and d.ED09 ne 0 or c.ED10 ne . and c.ED10 ne 0 or b.EDDATA ne . and b.EDDATA ne 0; Quit; %Let prcntch=NovfromOct; proc sql; Create Table Report815 as Select ID,ITEM,Nov14,Dec14,Jan15,Feb15,Mar15,Apr15,May15,Jun15,Jul15,Aug15,Sep15,Oct15,EDDATA,RPDATA,MFWPUB,MFWSUB,round(Mean(EDDATA,Oct15,Sep15,Aug15,Jul15,Jun15,May15,Apr15,Mar15,Feb15,Jan15,Dec14,Nov14)) as Avg, abs(EDDATA-MFWPUB) as DiffMFW, (case when EDDATA=Oct15 and EDDATA not eq . then 'true' else 'false' end) as dupe, abs(round((EDDATA-Oct15)/Oct15))*100 as Prcnt_change_&prcntch, /*,(case when (EDDATA>(round(Mean(EDDATA,Oct15,Sep15,Aug15,Jul15,Jun15,May15,Apr15,Mar15,Feb15,Jan15,Dec14,Nov14))+2*abs(STD(Jan15,Feb15,Mar15,Apr15,May15,Jun15,Jul15,Aug15,Sep15,Oct15,EDDATA)))) or EDDATA<(round(Mean(EDDATA,Oct15,Sep15,Aug15,Jul15,Jun15,May15,Apr15,Mar15,Feb15,Jan15,Dec14,Nov14))-2*abs(STD(Jan15,Feb15,Mar15,Apr15,May15,Jun15,Jul15,Aug15,Sep15,Oct15,EDDATA)))then "outlier" end) as outlier*/ ERROR,notesrce,Who from base0; quit; /*Deleting everything but base0 so it only outputs one thing*/ proc sort data=Report815; by Decending error Prcnt_change_&prcntch; run; proc datasets nolist library=Work; Save Report815; run;
My attempt but not entirely sure that it would output everything i want:
%Macro Datasets; data Temp&I; set metrics.table1; Where STATP="&&Macvar&I" AND substr(PRODCYCLE,5,2)= '00'; KEEP SURVEY STATP ID ITEM EDDATA; Rename EDDATA=substr("ED&&Macvar&I",6,4); run; %Mend Datasets; Data _Null_; SET datelist; call Symput('Macvar'||Left(Trim(_N_)), STATP); Call symput('Totobs',_N_); run;
01-25-2016 06:39 PM
You should provide some example data and what the output for that example data would be. Otherwise there is a lot of guessing as to what you intent may be. And without input cannot test anything related to the code you show.
Also, when I see a value like 201511 I suspect this may be a year/month value. At which point I start thinking that instead of character values like '201511' being hardcoded that the values should be SAS date values as then it is possible to "loop" or possibly even just use a FORMAT to create groups of values. And then seeing data sets named "monthly" that practically confirms that dates should be used.
01-25-2016 08:08 PM
Is there an efficient way to format the STATP column as a datetime when All the table has is a YYYYMM string for each report period? Mind you we are talking Trillions of rows because this is monthly survey data going back over 20 years. Formatting the table is not an option at my access level and would take an insanely long processing time. I'm able to quickly query the last N reporting periods in the database and want to loop through these creating N distinct data sets. Then I want to left join the EDDATA columns as I've done the long way in my code, based on the ID and ITEM columns matching, but I want to do this using a looped macro. See ***** in my code block for explanations of what I'm trying to do and comments about exactly which queries I'm trying to loop through.
As for the table data I cannot post it but I've already written working queries. I just need to know where to put the loop, and how to write the macro.
The table data input is essentially trillions of lines with the following variables:
ID (10 digit ID) ITEM (AlphaNumeric) EDDATA (int) RPDATA (INT) STATP (‘YYYYMM’ string incompatible with SAS datetime format)
I’m trying to group the top n by distinct STATP. N is usually 12 months of data.
01-25-2016 08:38 PM
1. You need to generate your month variables through a loop. I recommend using a data step and then call execute to invoke your macro with required parameters
2. Write your macro to work based on the monthly value
Assume your macro is called
%macro monthly_reports(month=); *code to repeat here; %mend; *loops through the 12 months of reporting; data want; start_date="01Jan2014"d; do i=1 to 12; *12 months; month=intnx("month", start_date, i-1, 'b'); month_var=put(month, yymmn6.); Str= catt('%monthly_reports(month=', month_var, ');'); call execute(str); output; end; run;
01-26-2016 01:15 AM
Just a minor issue:
data editrej; set folder1.editrej; Uniquekey=cat(ID,ITEM); keep ID Item Evntflg Error; run;
In this step you create a new variable, but don't include it in the keep list. Either you missed putting it in the keep list (which will cause you ".... is unitialized" messages and peculiar fails later in the chain), or it is not needed, in which case you should delete the statement
or at least comment it out.