BookmarkSubscribeRSS Feed
Ody
Quartz | Level 8 Ody
Quartz | Level 8

Hello all,

I have a large set of data called HIST. Each record in the HIST dataset has a period_date. I have another set of data, a table of 33 period_dates called PDATES, associated with the HIST data.

I would like to create a macro that looks at the first date in the PDATES table and creates a separate dataset from the HIST dataset where the period_Date in the HIST table is equal to the period_Date in the PDATES table. The goal is to have 33 datasets when the macro completes.

I've looked at various articles online about %DO, etc, and just can't seem to translate my thoughts to code.

I appreciate any assistance!

13 REPLIES 13
art297
Opal | Level 21

It would help if you'd provide an example of what you call the period_dates that exist in each file.

Ody
Quartz | Level 8 Ody
Quartz | Level 8

The period dates are MMDDYYYYs10. format.

Example:

03/21/2014

04/04/2014

07/18/2014

art297
Opal | Level 21

And you want the new files to only contain the dates specified in the other file? Or, are you looking for ranges of dates on or between the dates specified in the one file?

Ody
Quartz | Level 8 Ody
Quartz | Level 8

Each data set would contain all the values from the HIST table where the period_dates are equal.

For example the output of period_date  03/21/2014 could have 5345 rows (and associated columns), period_date 04/04/2014 could have 1255 rows (and associated columns), etc.


Like if I was to write


data EXAMPLE;

set HIST;

if period_date = 03/21/2014;

run;

Astounding
PROC Star

If you have a "large data set", why would you design a program that reads the data set 33 times?  SAS can produce all 33 data sets in a single DATA step.  Wouldn't that be a better approach?

MumSquared
Calcite | Level 5

Another approach is to write the sas code to write the code needed to split the HIST  data using the dates dataset PDATES. Automates the writing of the code Ody suggests.

data _null_;

     set pdates ;

     file "c:\SASCodeToRun" ;

     line="data "||compress("hist_"||put(datefield,yymmddn8.))||" ; " ;

     put line;

     line ="   set HIST (where=(datefield='"||put(datefield,date9.)||"'d )) ; " ;

     put line;

     line = "run; " ;

     put line;

run;


%include("c:\SASCodeToRun");


I suggest you review the code written in the fie before you run it ... I did not test my code .... but I use this technique a lot.

Reeza
Super User

Its generally not recommended to split your file up. SAS has BY group processing that is quite efficient in most circumstances. Can you explain what the next step in your process would be that you would want this separated for?

Here's how you'd get all data from in one file

proc sql;

     create table selected_dates as

     select *

     from table1 where date in (select date from table2)

     order by date;

quit;

Ody
Quartz | Level 8 Ody
Quartz | Level 8

I need to loop the data for each period_date through an existing macro, chronologically. I'm certain it's possible to do without breaking the data into chunks by just using the HIST table but I'm not familiar enough with how to write the logic to do it and haven't had much success researching it online. I figured if i had an example of breaking it out into separate datasets I could adapt that to my needs.

ScottBass
Rhodochrosite | Level 12

Can you post an excerpt from the macro with the relevant code?

Given that you're constrained by the functionality of your existing macro, views may be a good option.

Here's a wild guess as to what your macro could be doing, and a possible approach.  Pretend sashelp.zipcode is a large dataset, so you want to use views for better performance.  The below code should work as is on your system, so just cut-and-paste and see if it can be modified to suit your needs.

BUT, if your macro can be modified to use BY processing, that would probably perform better.

%macro my_existing_macro(data=);

  %* pretend this is your exising macro with some complex logic ;

  %* and for some unknown reason cannot use BY processing ;

  proc print data=&data (obs=20);

  run;

%mend;

%macro code;

  %* this is a "helper" macro to create the view and call my_existing_macro ;

  data myview / view=myview;

    set sashelp.zipcode;

    where statecode="&word";

  run;

  %my_existing_macro(data=myview);

%mend;

%* create a list of items to loop over ;

proc sql noprint;

  select distinct statecode into :mylist separated by " " from sashelp.zipcode

  where statecode <= "FL";  %* to limit the output for testing ;

quit;

%put &mylist;

%* loop over each item in the list ;

%* by default the loop macro calls the macro "%code" ;

options mprint;

%loop(&mylist)

%* the output should be a invocation of proc print ;

%* using the myview view, which filters for each ;

%* item in the list.  during each iteration of %loop, ;

%* that item is in the reserved macro variable &word ;


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Ody
Quartz | Level 8 Ody
Quartz | Level 8

Thanks all for the many replies.

The reason I require the data to be separated is due to how it needs to be processed. The HIST data, as you might guess, is historical data. It needs to be added chronologically by period_date to an Oracle table.

The existing macro for current data is as follows:

%macro ImportData(dir2=);

%let X = %sysfunc(fileexist(&dir2)); %put &X;

%if &X = 0 %then %do;

  %put ERROR: File Missing. Contact Department;

  %goto exit;

  %end;

  %else %do;

  PROC IMPORT

  OUT = Employee

  DATAFILE = &dir2

  DBMS = xlsx Replace;

  SHEET = "Test";

  GETNAMES = YES;

  PROC SQL;

  DELETE FROM odbc AMM.AMM_Table;

  PROC SQL;

  INSERT INTO odbcAMM.AMM_Table

  (sasdatefmt =  (

  PERIOD_DT = 'MMDDYY10.',

  ACTION_DT = 'MMDDYY10.',

  EFFECTIVE_DT = 'MMDDYY10.',

  TERM_DT = 'MMDDYY10.',

  Load_DATE = 'MMDDYY10.'  )  )

  Select

  MANY VARIABLES.....

  FROM Employee;

  PROC SQL;

  connect to odbc(dsn = &dsnName);

  execute (call AMM.UPSERT_Table()) by odbc;

  %end;

%exit: %mend;

%ImportData(dir2= file location);

Due to the logic in the call procedure [AMM.UPSERT_Table], I need to add the HIST data by period, chronologically. I won't go into the details of that but creating a macro to take the HIST data and loop through the period_dates so that I can fit it into the existing macro would save me a lot of time.That's my overall plan. If I can not figure out a way to do that I will parse out the data in the HIST table period_date by period_date and run it through this macro a section at a time. It's less elegant for sure, but would still achieve my objective.

ScottBass
Rhodochrosite | Level 12

Yes I agree with Reeza.  I (and others) have posted approaches to split your dataset.  What we (and you) should be asking is "why am I splitting my dataset, and do I want to do so?'


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
art297
Opal | Level 21

I think you can accomplish what you want with something like:

proc sql noprint;

  select catt('if period_date=',period_date,' then output _',

              put(period_date,date9.),';'),

         catt('_',put(period_date,date9.))

    into : selects separated by ' ',

         :outlist separated by ' '

      from pdates

  ;

quit;

data &outlist.;

  set hist;

  &selects.;

run;

ScottBass
Rhodochrosite | Level 12

Download the %loop macro from SAS/loop.sas at master · scottbass/SAS · GitHub

Download the %parmv macro from SAS/parmv.sas at master · scottbass/SAS · GitHub (or else comment out the calls to %parmv in %loop)

Download the %hash_split_dataset from SAS/hash_split_dataset.sas at master · scottbass/SAS · GitHub (optional, see below)

Read the macro headers - there are lots of use cases in each macro.

If you're SURE PDATES has all periods, then:

proc sql noprint;

  select distinct period_dates format=yymmddn8. into : periods separated by " " from pdates;

quit;

%put &periods;  * should be your 33 periods ;

Otherwise, use your source data to ensure you cover all the periods:

proc sql noprint;

  select distinct period_dates format=yymmddn8. into : periods separated by " " from hist;

quit;

%put &periods;  * should be your 33 periods but may differ if pdates is not in sync ;

%macro dsnames;

mylib.data_&word

%mend;

%macro output;

if period_date=input(&word,yymmdd8.) then output mylib.data_&word;

%mend;

options mprint;

data

%loop(&periods,mname=dsnames)

;

set hist;

%loop(&periods,mname=output)

run;


Pay attention to the absence of semi-colon in the dsnames macro.


If your large HIST dataset will fit into memory, then review the macro header for %hash_split_dataset for a way to split a dataset via the hash object.


HTH...



Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2131 views
  • 0 likes
  • 6 in conversation