Hello all,
I have a large set of data called HIST. Each record in the HIST dataset has a period_date. I have another set of data, a table of 33 period_dates called PDATES, associated with the HIST data.
I would like to create a macro that looks at the first date in the PDATES table and creates a separate dataset from the HIST dataset where the period_Date in the HIST table is equal to the period_Date in the PDATES table. The goal is to have 33 datasets when the macro completes.
I've looked at various articles online about %DO, etc, and just can't seem to translate my thoughts to code.
I appreciate any assistance!
It would help if you'd provide an example of what you call the period_dates that exist in each file.
The period dates are MMDDYYYYs10. format.
Example:
03/21/2014
04/04/2014
07/18/2014
And you want the new files to only contain the dates specified in the other file? Or, are you looking for ranges of dates on or between the dates specified in the one file?
Each data set would contain all the values from the HIST table where the period_dates are equal.
For example the output of period_date 03/21/2014 could have 5345 rows (and associated columns), period_date 04/04/2014 could have 1255 rows (and associated columns), etc.
Like if I was to write
data EXAMPLE;
set HIST;
if period_date = 03/21/2014;
run;
If you have a "large data set", why would you design a program that reads the data set 33 times? SAS can produce all 33 data sets in a single DATA step. Wouldn't that be a better approach?
Another approach is to write the sas code to write the code needed to split the HIST data using the dates dataset PDATES. Automates the writing of the code Ody suggests.
data _null_;
set pdates ;
file "c:\SASCodeToRun" ;
line="data "||compress("hist_"||put(datefield,yymmddn8.))||" ; " ;
put line;
line =" set HIST (where=(datefield='"||put(datefield,date9.)||"'d )) ; " ;
put line;
line = "run; " ;
put line;
run;
%include("c:\SASCodeToRun");
I suggest you review the code written in the fie before you run it ... I did not test my code .... but I use this technique a lot.
Its generally not recommended to split your file up. SAS has BY group processing that is quite efficient in most circumstances. Can you explain what the next step in your process would be that you would want this separated for?
Here's how you'd get all data from in one file
proc sql;
create table selected_dates as
select *
from table1 where date in (select date from table2)
order by date;
quit;
I need to loop the data for each period_date through an existing macro, chronologically. I'm certain it's possible to do without breaking the data into chunks by just using the HIST table but I'm not familiar enough with how to write the logic to do it and haven't had much success researching it online. I figured if i had an example of breaking it out into separate datasets I could adapt that to my needs.
Can you post an excerpt from the macro with the relevant code?
Given that you're constrained by the functionality of your existing macro, views may be a good option.
Here's a wild guess as to what your macro could be doing, and a possible approach. Pretend sashelp.zipcode is a large dataset, so you want to use views for better performance. The below code should work as is on your system, so just cut-and-paste and see if it can be modified to suit your needs.
BUT, if your macro can be modified to use BY processing, that would probably perform better.
%macro my_existing_macro(data=);
%* pretend this is your exising macro with some complex logic ;
%* and for some unknown reason cannot use BY processing ;
proc print data=&data (obs=20);
run;
%mend;
%macro code;
%* this is a "helper" macro to create the view and call my_existing_macro ;
data myview / view=myview;
set sashelp.zipcode;
where statecode="&word";
run;
%my_existing_macro(data=myview);
%mend;
%* create a list of items to loop over ;
proc sql noprint;
select distinct statecode into :mylist separated by " " from sashelp.zipcode
where statecode <= "FL"; %* to limit the output for testing ;
quit;
%put &mylist;
%* loop over each item in the list ;
%* by default the loop macro calls the macro "%code" ;
options mprint;
%loop(&mylist)
%* the output should be a invocation of proc print ;
%* using the myview view, which filters for each ;
%* item in the list. during each iteration of %loop, ;
%* that item is in the reserved macro variable &word ;
Thanks all for the many replies.
The reason I require the data to be separated is due to how it needs to be processed. The HIST data, as you might guess, is historical data. It needs to be added chronologically by period_date to an Oracle table.
The existing macro for current data is as follows:
%macro ImportData(dir2=);
%let X = %sysfunc(fileexist(&dir2)); %put &X;
%if &X = 0 %then %do;
%put ERROR: File Missing. Contact Department;
%goto exit;
%end;
%else %do;
PROC IMPORT
OUT = Employee
DATAFILE = &dir2
DBMS = xlsx Replace;
SHEET = "Test";
GETNAMES = YES;
PROC SQL;
DELETE FROM odbc AMM.AMM_Table;
PROC SQL;
INSERT INTO odbcAMM.AMM_Table
(sasdatefmt = (
PERIOD_DT = 'MMDDYY10.',
ACTION_DT = 'MMDDYY10.',
EFFECTIVE_DT = 'MMDDYY10.',
TERM_DT = 'MMDDYY10.',
Load_DATE = 'MMDDYY10.' ) )
Select
MANY VARIABLES.....
FROM Employee;
PROC SQL;
connect to odbc(dsn = &dsnName);
execute (call AMM.UPSERT_Table()) by odbc;
%end;
%exit: %mend;
%ImportData(dir2= file location);
Due to the logic in the call procedure [AMM.UPSERT_Table], I need to add the HIST data by period, chronologically. I won't go into the details of that but creating a macro to take the HIST data and loop through the period_dates so that I can fit it into the existing macro would save me a lot of time.That's my overall plan. If I can not figure out a way to do that I will parse out the data in the HIST table period_date by period_date and run it through this macro a section at a time. It's less elegant for sure, but would still achieve my objective.
Yes I agree with Reeza. I (and others) have posted approaches to split your dataset. What we (and you) should be asking is "why am I splitting my dataset, and do I want to do so?'
I think you can accomplish what you want with something like:
proc sql noprint;
select catt('if period_date=',period_date,' then output _',
put(period_date,date9.),';'),
catt('_',put(period_date,date9.))
into : selects separated by ' ',
:outlist separated by ' '
from pdates
;
quit;
data &outlist.;
set hist;
&selects.;
run;
Download the %loop macro from SAS/loop.sas at master · scottbass/SAS · GitHub
Download the %parmv macro from SAS/parmv.sas at master · scottbass/SAS · GitHub (or else comment out the calls to %parmv in %loop)
Download the %hash_split_dataset from SAS/hash_split_dataset.sas at master · scottbass/SAS · GitHub (optional, see below)
Read the macro headers - there are lots of use cases in each macro.
If you're SURE PDATES has all periods, then:
proc sql noprint;
select distinct period_dates format=yymmddn8. into : periods separated by " " from pdates;
quit;
%put &periods; * should be your 33 periods ;
Otherwise, use your source data to ensure you cover all the periods:
proc sql noprint;
select distinct period_dates format=yymmddn8. into : periods separated by " " from hist;
quit;
%put &periods; * should be your 33 periods but may differ if pdates is not in sync ;
%macro dsnames;
mylib.data_&word
%mend;
%macro output;
if period_date=input(&word,yymmdd8.) then output mylib.data_&word;
%mend;
options mprint;
data
%loop(&periods,mname=dsnames)
;
set hist;
%loop(&periods,mname=output)
run;
Pay attention to the absence of semi-colon in the dsnames macro.
If your large HIST dataset will fit into memory, then review the macro header for %hash_split_dataset for a way to split a dataset via the hash object.
HTH...
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.