Re: Trying to create a macro to get different dataset created

NitishA0505 · Posted 02-18-2024 04:34 AM

Hell everyone,

I am trying to create a macro to get different datasets created by the make variable

/* Obtain distinct make list */

proc sql noprint;

select distinct make into :make_list separated by ',' from sashelp.cars;

quit;

/* Count the number of distinct makes */

%let make_count = %sysfunc(countw(&make_list));

/* Loop through each distinct make */

%do i = 1 %to &make_count

; %let make = %scan(&make_list, &i, ,);

data &make.;

set sashelp.cars;

where make="&make.";

run;

%end;

giving me error: ERROR: The function COUNTW referenced by the %SYSFUNC or %QSYSFUNC macro function has too many arguments.

Patrick · Posted 02-18-2024 05:07 AM

That's because your macro variable &make_list uses commas as separators that the countw() function then interpretes as separators for arguments.

Create either a string with different word separators or use a macro quoting function.

/* option 1: quoting */
proc sql noprint;
  select distinct make into :make_list separated by ',' 
  from sashelp.cars;
quit;

%let make_count = %sysfunc(countw(%nrbquote(&make_list)));
%put &=make_count;

/* option 2: blank as word separator */
proc sql noprint;
  select distinct make into :make_list separated by ' ' 
  from sashelp.cars;
quit;

%let make_count = %sysfunc(countw(%nrbquote(&make_list)));
%put &=make_count;

Tom · Posted 02-18-2024 09:51 AM

There is no need to re-count, SQL will do that for you.

But it will be easier to use a different delimiter. And you should use NLITERAL() function to make sure your values are SAS names.

proc sql noprint;
 select distinct make into :make_list separated by '|' from sashelp.cars;
%let make_count=&sqlobs;
quit;
%do i = 1 %to &make_count;
  %let make = %qscan(&make_list, &i, |);
data %sysfunc(nliteral(&make.));
  set sashelp.cars;
  where make="&make";
run;
%end;

But that will not fix Mercedes-Benz since even with VALIDMEMNAME=EXTEND you cannot have - in member name.

So you will need to work a little harder to make valid member names.

Do the names really need to be related to the content? You can put the model value into the LABEL of the dataset.

data make&i (label="&make");

Amir · Posted 02-18-2024 05:17 AM

Hi,

Try changing the:

separated by ','

to (a space in quotes):

separated by ' '

This is so that make_list does not end up with multiple commas which causes the error when invoking the countw() function, as it can only accept a maximum of 3 comma separated arguments according to its documentation.

Thanks & kind regards,

Amir.

PaigeMiller · Posted 02-18-2024 05:51 AM

@NitishA0505 It is rarely necessary in SAS to split a data set like this. Whatever your intentions, most SAS procedures have a BY statement so you can do analyses on all of the different groups. In addition, if you want to perform an analysis on just one group, let's say you want to analyze only the Mazda data, you can do it with a WHERE statement, like this example:

proc means data=sashelp.cars;
    where make='Mazda';
    /* optionally, other PROC MEANS statements */
run;

So unless you are in one of those very rare cases where you do need separate data sets, the best solution here, as well as the easiest solution, is to NOT split the data set into smaller data sets by variable MAKE (or by any other variable).

--
Paige Miller

yabwon · Posted 02-18-2024 06:00 AM

First to all, your idea will fail because values like "Mercedes-Benz", which will blow up your processing:

NOTE: Line generated by the macro variable "MAKE".
1     Mercedes-Benz
              -
              22
              200

ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, /, ;, _DATA_, _LAST_, _NULL_.

ERROR 200-322: The symbol is not recognized and will be ignored.

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.MERCEDES may be incomplete.  When this step was stopped there were 0 observations and 15 variables.
WARNING: Data set WORK.MERCEDES was not replaced because this step was stopped.
WARNING: The data set WORK.BENZ may be incomplete.  When this step was stopped there were 0 observations and 15 variables.
WARNING: Data set WORK.BENZ was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

To handle the problem you have to replace all "no-alpha, no-numeric, and no-underscore" characters with something acceptable by SAS, e.g. by an "_".

A lazy programmer would do it, for example, this way:

proc sql noprint;
  create table temp as
  select distinct 
    make as makeVal /*value*/
  , compress(make,"_","KAD") as makeVar /*variable*/
  from 
    sashelp.cars
  ;
quit;
proc print;
run;

Of course here we are working on and example data, but it easy to imagine that the real data could have values like "AB-C" and "A-BC" which will both be "compressed" to "ABC", so I would add one more step to make data sets names "unique":

data temp;
  length makeVar $ 32;
  set temp;
  makeVar = cats("_",_N_,"_",makeVar);
run;
proc print;
run;

Next thing is that looping over your way will force SAS to read data as many times as there are iterations in the loop, more I/O-efficient would be to use multiple "output" statement inside data step's "select" clause, e.g.:

DATA _1_A _2_B _3_C;
  set inputData;

  select(variable);
    when ("A") output _1_A;
    when ("B") output _1_B;
    when ("C") output _1_C;
    otherwise put "ERROR: Unknown value:" variable;
  end;
run;

Now you can play with macro loops which you have to embed inside macros, but me personally, I would use the MacroArray SAS Package and do it this way:

filename packages "C:\SAS_WORK\SAS_PACKAGES";
%include packages(SPFinit);

%loadPackage(MacroArray);

%array(ds=temp, vars=makeVal makeVar, macArray=Y);

/* just for preview */
%put %do_over(makeVal);
%put %do_over(makeVar);

resetline;
data %do_over(makeVar);

  set sashelp.CARS;

  select(MAKE);
    %do_over(makeVar, phrase=%nrstr(
      when ("%makeVal(&_i_.)") output %makeVar(&_i_.);
    ))
    otherwise put "ERROR: Unknown value:" MAKE;
  end;
run;

%deleteMacArray(makeVal makeVar, macArray=Y)

Log from the data step:

1    data %do_over(makeVar);
2
3      set sashelp.CARS;
4
5      select(MAKE);
6        %do_over(makeVar, phrase=%nrstr(
7          when ("%makeVal(&_i_.)") output %makeVar(&_i_.);
8        ))
9        otherwise put "ERROR: Unknown value:" MAKE;
10     end;
11   run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set WORK._1_ACURA has 7 observations and 15 variables.
NOTE: The data set WORK._2_AUDI has 19 observations and 15 variables.
NOTE: The data set WORK._3_BMW has 20 observations and 15 variables.
NOTE: The data set WORK._4_BUICK has 9 observations and 15 variables.
NOTE: The data set WORK._5_CADILLAC has 8 observations and 15 variables.
NOTE: The data set WORK._6_CHEVROLET has 27 observations and 15 variables.
NOTE: The data set WORK._7_CHRYSLER has 15 observations and 15 variables.
NOTE: The data set WORK._8_DODGE has 13 observations and 15 variables.
NOTE: The data set WORK._9_FORD has 23 observations and 15 variables.
NOTE: The data set WORK._10_GMC has 8 observations and 15 variables.
NOTE: The data set WORK._11_HONDA has 17 observations and 15 variables.
NOTE: The data set WORK._12_HUMMER has 1 observations and 15 variables.
NOTE: The data set WORK._13_HYUNDAI has 12 observations and 15 variables.
NOTE: The data set WORK._14_INFINITI has 8 observations and 15 variables.
NOTE: The data set WORK._15_ISUZU has 2 observations and 15 variables.
NOTE: The data set WORK._16_JAGUAR has 12 observations and 15 variables.
NOTE: The data set WORK._17_JEEP has 3 observations and 15 variables.
NOTE: The data set WORK._18_KIA has 11 observations and 15 variables.
NOTE: The data set WORK._19_LANDROVER has 3 observations and 15 variables.
NOTE: The data set WORK._20_LEXUS has 11 observations and 15 variables.
NOTE: The data set WORK._21_LINCOLN has 9 observations and 15 variables.
NOTE: The data set WORK._22_MINI has 2 observations and 15 variables.
NOTE: The data set WORK._23_MAZDA has 11 observations and 15 variables.
NOTE: The data set WORK._24_MERCEDESBENZ has 26 observations and 15 variables.
NOTE: The data set WORK._25_MERCURY has 9 observations and 15 variables.
NOTE: The data set WORK._26_MITSUBISHI has 13 observations and 15 variables.
NOTE: The data set WORK._27_NISSAN has 17 observations and 15 variables.
NOTE: The data set WORK._28_OLDSMOBILE has 3 observations and 15 variables.
NOTE: The data set WORK._29_PONTIAC has 11 observations and 15 variables.
NOTE: The data set WORK._30_PORSCHE has 7 observations and 15 variables.
NOTE: The data set WORK._31_SAAB has 7 observations and 15 variables.
NOTE: The data set WORK._32_SATURN has 8 observations and 15 variables.
NOTE: The data set WORK._33_SCION has 2 observations and 15 variables.
NOTE: The data set WORK._34_SUBARU has 11 observations and 15 variables.
NOTE: The data set WORK._35_SUZUKI has 8 observations and 15 variables.
NOTE: The data set WORK._36_TOYOTA has 28 observations and 15 variables.
NOTE: The data set WORK._37_VOLKSWAGEN has 15 observations and 15 variables.
NOTE: The data set WORK._38_VOLVO has 12 observations and 15 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

Bart

P.S. To install and use macroArray package do:

Enable the framework [first time only]:

filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit; /* enable the framework */

Install the framework and the package on your machine in the folder you created:

filename packages "</your/directory/for/packages/>"; 
%installPackage(SPFinit macroArray)

From now on, in your SAS session just run it like this:

filename packages "</your/directory/for/packages/>";
%include packages(SPFinit.sas);

%loadPackage(packageName)

Link to details.

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

PaigeMiller · Posted 02-18-2024 06:51 AM

@yabwon and @NitishA0505 and others

In this example, the values of MAKE are less than 32 characters. In some other real world case, the value of the variable to split the data sets on could be more than 32 characters. What then? Modify the macro to handle this additional complication — or simply not split the data sets and use the method I proposed above, which requires much less programming than splitting the data set into many smaller data sets, and works even with a character value of the split value that is >32 characters.

@NitishA0505 please explain in words why you need separate data sets.

--
Paige Miller

yabwon · Posted 02-18-2024 08:19 AM

If it's about splitting the data, the rule "don't need it - don't do it" I 100% agree.

About the >32 character case, this part in my code:

data temp;
  length makeVar $ 32;
  set temp;
  makeVar = cats("_",_N_,"_",makeVar);
run;

already handles it.

[EDIT:]

Of course we can make it:

proc sql noprint;
  create table temp as
  select distinct 
    make as makeVal /*value*/
  , put(sha256(make),$hex16.) as makeVar /*variable*/
  from 
    sashelp.cars
  ;
quit;

to ensure 1-to-1 correspondence between the value and the dataset name.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Kurt_Bremser · Posted 02-18-2024 09:07 AM

To give you some perspective: in my whole professional work with SAS, which spanned more than two decades, I did something like this exactly twice.

Reason for #1 was that we had to to split the data along regional lines, per company policy. Insurance company sales politics at work: "you" must not see "our" customers.

Reason for #2 was that I wanted a piece of code to be runnable for testing by the "normal" users, who had restrictions about their WORK and UTILLOC space which the user under which the scheduled jobs ran did not have.

The particular piece of code was a PROC SORT that cracked their UTILLOC quota. So I split the dataset along the top-level BY variable, sorted each part, and created an ordered concatenated view for the next step. The whole operation was slightly less performant than the do-it-all-at-once method possible for the scheduler user, so we tolerated it.

But, as already said, once in 20+ years in a heap of more than 1000 jobs, with approx. a million LOC all in all.

So don't expect to do this often.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 02-18-2024 10:22 AM

Thanks, @Kurt_Bremser . The only situation I have run into in my experience where you might want to split the data into portions using some identifying variable as requested, was here in the forums, where someone wanted to use a SAS PROC that did not have a BY statement. (What PROC, you ask? PROC OPTMODEL) So there are a small number of PROCs where maybe splitting the data is the right thing to do, but then you would write the macro very differently, and all of these issues about strings longer than 32 characters and issues about characters that cannot be in a data set name go away (you create a data set using a WHERE statement and use that in PROC OPTMODEL, the name of this data set can be any legal SAS name unrelated to the value of the split variable; and can even be the same name through each iteration of the loop). But context is everything, and even in the case where you are using PROC OPTMODEL, a very different macro is called for, and @NitishA0505 has provided no context, and again I ask him/her to provide some understanding of why splitting the data this is necessary.

--
Paige Miller

RobPratt · Posted 02-19-2024 11:52 AM

You might be interested in this SAS Note:

https://support.sas.com/kb/42/332.html

yabwon · Posted 02-18-2024 12:09 PM

@Kurt_Bremser to your two examples (which btw. are almost exactly the same I had in my practice) I would add: #3 when I'm showing the method to my students when teaching macro programming.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

mkeintz · Posted 02-18-2024 01:22 PM

@Kurt_Bremser

On the other hand ...

Splitting datasets was a constant advantage in work I did for about 15 years. Every market trading day there is a new data set for stock trades (millions of obs/day for about 8,900 stock tickers) and for stock quotes (billions daily). If you regularly need to examine particular subsets of those tickers (say near market opening and market close, or for complimentary groups of stocks), there's a great advantage to dividing the dataset.

With splitting, the resource savings of accessing one (or a few) of the smaller data sets, easily outstrips the benefits of managing only one data set. Of course, the benefit of splitting is more likely to occur when historical records don't change, as in my situation above.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Kurt_Bremser · Posted 02-18-2024 02:08 PM

We did create a lot of "split" data in the first place, either for the same reason (lots of transaction data, e.g. from accounting, or to keep a series of states for comparisons).

Which means most of our datasets were timestamped in their names.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 02-18-2024 03:28 PM

If splitting is truly desirable, then there are ways to do it very efficiently. And you most certainly do not need to do a complete pass of the original data set for each of the splits.

You can use a data step and a hash object. One of the advantages of the hash object is that you can use the "output" method (i.e. write data) to a dynamically named data set, so you can use the value of a variable (i.e. "make") to name the data set.

In the case of sashelp.cars, which is sorted by make, you need only one hash object:

data _null_;
  set sashelp.cars;
  by make;
  if _n_=1 then do;
    declare hash h (dataset:'sashelp.cars (obs=0)',multidata:'Y');

    h.definekey('make');
    h.definedata(all:'yes');
    h.definedone();
  end;
  h.add();

  if last.make;
  length dsname $60;
  dsname='work.' || translate(trim(make),'_',' ','_','-');
  h.output(dataset:dsname);
  h.clear();
run;

The hash object is populated with all obs for the current value of make. At the last obs for the make, the output method (h.output) writes the hash contents to a dataset with a customized name. The object is then cleared for use with the next make. The translate function replaces all internal blanks and all dashes with '_' for the data set name. So "Land Rover" yields data set "LAND_ROVER" and "Mercedez-Benz" yields "MERCEDEZ_BENZ".

If the dataset is not sorted, then you need a hash object for each make, and you have to hold them all until the end of the data. At the end of the data, the individual hash objects are accessed one by one via a "hash of hashes" (hoh), and each is output to a unique name, as below:

data _null_;
  set sashelp.cars end=end_of_data;

  if _n_=1 then do;
    declare hash one_make_h;  /*Reserve this name for hashes, not variables*/

    declare hash hoh ();
      hoh.definekey('make');
      hoh.definedata('make','one_make_h');
      hoh.definedone();
    declare hiter hohi ('hoh'); /*To iterate over HOH, to access each make and its hash*/
  end;

  if hoh.find()^=0 then do;  /* New Make? */
    one_make_h=_new_ hash (dataset:'sashelp.cars (obs=0)',multidata:'Y');
    one_make_h.definekey('make');
    one_make_h.definedata(all:'Y');
    one_make_h.definedone();

    hoh.add();
  end;
  one_make_h.add();

  if end_of_data then do while (hohi.next()=0);
    length dsname $60;
    dsname='work.' || translate(trim(make),'_',' ','_','-');
    one_make_h.output(dataset:dsname);
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Classroom Training Available!