11-06-2014 02:56 PM
I have a large set of data called HIST. Each record in the HIST dataset has a period_date. I have another set of data, a table of 33 period_dates called PDATES, associated with the HIST data.
I would like to create a macro that looks at the first date in the PDATES table and creates a separate dataset from the HIST dataset where the period_Date in the HIST table is equal to the period_Date in the PDATES table. The goal is to have 33 datasets when the macro completes.
I've looked at various articles online about %DO, etc, and just can't seem to translate my thoughts to code.
I appreciate any assistance!
11-06-2014 03:22 PM
And you want the new files to only contain the dates specified in the other file? Or, are you looking for ranges of dates on or between the dates specified in the one file?
11-06-2014 03:39 PM
Each data set would contain all the values from the HIST table where the period_dates are equal.
For example the output of period_date 03/21/2014 could have 5345 rows (and associated columns), period_date 04/04/2014 could have 1255 rows (and associated columns), etc.
Like if I was to write
if period_date = 03/21/2014;
11-06-2014 03:18 PM
If you have a "large data set", why would you design a program that reads the data set 33 times? SAS can produce all 33 data sets in a single DATA step. Wouldn't that be a better approach?
11-06-2014 04:17 PM
Another approach is to write the sas code to write the code needed to split the HIST data using the dates dataset PDATES. Automates the writing of the code Ody suggests.
set pdates ;
file "c:\SASCodeToRun" ;
line="data "||compress("hist_"||put(datefield,yymmddn8.))||" ; " ;
line =" set HIST (where=(datefield='"||put(datefield,date9.)||"'d )) ; " ;
line = "run; " ;
I suggest you review the code written in the fie before you run it ... I did not test my code .... but I use this technique a lot.
11-06-2014 04:23 PM
Its generally not recommended to split your file up. SAS has BY group processing that is quite efficient in most circumstances. Can you explain what the next step in your process would be that you would want this separated for?
Here's how you'd get all data from in one file
create table selected_dates as
from table1 where date in (select date from table2)
order by date;
11-07-2014 08:43 AM
I need to loop the data for each period_date through an existing macro, chronologically. I'm certain it's possible to do without breaking the data into chunks by just using the HIST table but I'm not familiar enough with how to write the logic to do it and haven't had much success researching it online. I figured if i had an example of breaking it out into separate datasets I could adapt that to my needs.
11-07-2014 07:37 PM
Can you post an excerpt from the macro with the relevant code?
Given that you're constrained by the functionality of your existing macro, views may be a good option.
Here's a wild guess as to what your macro could be doing, and a possible approach. Pretend sashelp.zipcode is a large dataset, so you want to use views for better performance. The below code should work as is on your system, so just cut-and-paste and see if it can be modified to suit your needs.
BUT, if your macro can be modified to use BY processing, that would probably perform better.
%* pretend this is your exising macro with some complex logic ;
%* and for some unknown reason cannot use BY processing ;
proc print data=&data (obs=20);
%* this is a "helper" macro to create the view and call my_existing_macro ;
data myview / view=myview;
%* create a list of items to loop over ;
proc sql noprint;
select distinct statecode into :mylist separated by " " from sashelp.zipcode
where statecode <= "FL"; %* to limit the output for testing ;
%* loop over each item in the list ;
%* by default the loop macro calls the macro "%code" ;
%* the output should be a invocation of proc print ;
%* using the myview view, which filters for each ;
%* item in the list. during each iteration of %loop, ;
%* that item is in the reserved macro variable &word ;
11-10-2014 09:13 AM
Thanks all for the many replies.
The reason I require the data to be separated is due to how it needs to be processed. The HIST data, as you might guess, is historical data. It needs to be added chronologically by period_date to an Oracle table.
The existing macro for current data is as follows:
%let X = %sysfunc(fileexist(&dir2)); %put &X;
%if &X = 0 %then %do;
%put ERROR: File Missing. Contact Department;
OUT = Employee
DATAFILE = &dir2
DBMS = xlsx Replace;
SHEET = "Test";
GETNAMES = YES;
DELETE FROM odbc AMM.AMM_Table;
INSERT INTO odbcAMM.AMM_Table
(sasdatefmt = (
PERIOD_DT = 'MMDDYY10.',
ACTION_DT = 'MMDDYY10.',
EFFECTIVE_DT = 'MMDDYY10.',
TERM_DT = 'MMDDYY10.',
Load_DATE = 'MMDDYY10.' ) )
connect to odbc(dsn = &dsnName);
execute (call AMM.UPSERT_Table()) by odbc;
%ImportData(dir2= file location);
Due to the logic in the call procedure [AMM.UPSERT_Table], I need to add the HIST data by period, chronologically. I won't go into the details of that but creating a macro to take the HIST data and loop through the period_dates so that I can fit it into the existing macro would save me a lot of time.That's my overall plan. If I can not figure out a way to do that I will parse out the data in the HIST table period_date by period_date and run it through this macro a section at a time. It's less elegant for sure, but would still achieve my objective.
11-07-2014 07:08 PM
Yes I agree with Reeza. I (and others) have posted approaches to split your dataset. What we (and you) should be asking is "why am I splitting my dataset, and do I want to do so?'
11-07-2014 09:15 AM
I think you can accomplish what you want with something like:
proc sql noprint;
select catt('if period_date=',period_date,' then output _',
into : selects separated by ' ',
utlist separated by ' '
11-07-2014 07:05 PM
Download the %loop macro from SAS/loop.sas at master · scottbass/SAS · GitHub
Download the %parmv macro from SAS/parmv.sas at master · scottbass/SAS · GitHub (or else comment out the calls to %parmv in %loop)
Download the %hash_split_dataset from SAS/hash_split_dataset.sas at master · scottbass/SAS · GitHub (optional, see below)
Read the macro headers - there are lots of use cases in each macro.
If you're SURE PDATES has all periods, then:
proc sql noprint;
select distinct period_dates format=yymmddn8. into : periods separated by " " from pdates;
%put &periods; * should be your 33 periods ;
Otherwise, use your source data to ensure you cover all the periods:
proc sql noprint;
select distinct period_dates format=yymmddn8. into : periods separated by " " from hist;
%put &periods; * should be your 33 periods but may differ if pdates is not in sync ;
if period_date=input(&word,yymmdd8.) then output mylib.data_&word;
Pay attention to the absence of semi-colon in the dsnames macro.
If your large HIST dataset will fit into memory, then review the macro header for %hash_split_dataset for a way to split a dataset via the hash object.