Macros for creating datasets based on variable values

Reply
Super Contributor
Posts: 426

Macros for creating datasets based on variable values

Appreciate if someone help me with the following interview questions which I came across yesterday.

 

I've a dataset called countries and it has variable 'Country'. It has values as follows.

 

Country

 

US

US

UK

UK

IND

IND

JAP

JAP

 

Now I need to create a dataset for each country. In this case I need to create a datasets like Country_US , Country_UK , Country_IND and Country_JAP. Tomorrow other countries like CHN , RUS can be added to the country variable. Then I need to create datasets like Country_CHN, Country_RUS. Since it is a production system, I need to write one macro which would suite for 'n' number of countries. I should not hard code any of the country values in my macro.

 

Thanks in advance for any help you offer me.

Super Contributor
Posts: 406

Re: Macros for creating datasets based on variable values

[ Edited ]

Of course there is more than one way to do this. Here is one approach, using SQL:

 

%macro createm;
proc sql noprint;
    select distinct country into :country1-:country99
    from countries;
quit;
%let cnum=&sqlobs;
%do i=1 %to &cnum;
%let dsname=country_&&country&i;
%if not %dsexists(dsname) %then %do; 
    data country_&&country&i;
        /* furtther statements */
    run;
    %end;
%end;
%mend;
%createm

The macro dsexists is not given but is easy to do. Ask if you want help here.

 

There is a shortcut for :country1-:country999 in 9.4 but I would have to look it up.

 

Hope this helps,

- Jan.

 

EDIT: %dsexist could be replaced with

 

 %sysfunc(exist(&dsn)) 
Valued Guide
Posts: 505

Re: Macros for creating datasets based on variable values

Macros for creating datasets based on variable values [ New ]

https://goo.gl/qFJSsL
https://communities.sas.com/t5/General-SAS-Programming/Macros-for-creating-datasets-based-on-variable-values/m-p/292969

HAVE

Up to 40 obs from class total obs=19

Obs    NAME       SEX    AGE    HEIGHT    WEIGHT

  1    Alice       F      13     56.5       84.0
  2    Barbara     F      13     65.3       98.0
  3    Carol       F      14     62.8      102.5
  4    Jane        F      12     59.8       84.5
  5    Janet       F      15     62.5      112.5
  6    Joyce       F      11     51.3       50.5
  7    Judy        F      14     64.3       90.0
  8    Louise      F      12     56.3       77.0
  9    Mary        F      15     66.5      112.0
 10    Alfred      M      14     69.0      112.5
 11    Henry       M      14     63.5      102.5
 12    James       M      12     57.3       83.0
 13    Jeffrey     M      13     62.5       84.0
 14    John        M      12     59.0       99.5
 15    Philip      M      16     72.0      150.0
 16    Robert      M      12     64.8      128.0
 17    Ronald      M      15     67.0      133.0
 18    Thomas      M      11     57.5       85.0
 19    William     M      15     66.5      112.0

WANT


Up to 40 obs from M total obs=10

Obs    NAME       SEX    AGE    HEIGHT    WEIGHT

  1    Alfred      M      14     69.0      112.5
  2    Henry       M      14     63.5      102.5
  3    James       M      12     57.3       83.0
  4    Jeffrey     M      13     62.5       84.0
  5    John        M      12     59.0       99.5
  6    Philip      M      16     72.0      150.0
  7    Robert      M      12     64.8      128.0
  8    Ronald      M      15     67.0      133.0
  9    Thomas      M      11     57.5       85.0
 10    William     M      15     66.5      112.0


Up to 40 obs from F total obs=9

Obs    NAME       SEX    AGE    HEIGHT    WEIGHT

 1     Alice       F      13     56.5       84.0
 2     Barbara     F      13     65.3       98.0
 3     Carol       F      14     62.8      102.5
 4     Jane        F      12     59.8       84.5
 5     Janet       F      15     62.5      112.5
 6     Joyce       F      11     51.3       50.5
 7     Judy        F      14     64.3       90.0
 8     Louise      F      12     56.3       77.0
 9     Mary        F      15     66.5      112.0


SOLUTION

%symdel SEX;
%put &=sex;

* Probably not the most efficient method;

proc sort data=sashelp.class out=class;
by sex;
;run;quit;

data _null_;
  set class;
  by sex;
  if first.sex then do;
    call symputx('sex',sex);
     rc=dosubl(resolve('
      data &sex;
        set class(where=(sex="&sex"));
      run;quit;
    '));
  end;
;run;quit;


Esteemed Advisor
Posts: 6,734

Re: Macros for creating datasets based on variable values

First collect the countries into a dataset, then create the tables dynamically from that:

proc sort
  data=countries (keep=country)
  out=cnt
  nodupkey
;
by country;
run;

data _null_;
set cnt;
call execute('data country_' !! trim(country) !! '; set countries; where country="' !! trim(country) !! '";run;');
run;

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 426

Re: Macros for creating datasets based on variable values

Could you please tell me why you've placed double quotes and single quotes in where clause?

Esteemed Advisor
Posts: 6,734

Re: Macros for creating datasets based on variable values


Babloo wrote:

Could you please tell me why you've placed double quotes and single quotes in where clause?


When you look at it closely, you will find that the argument to call execute is one string, built from character literals and the country name in the variable.

Since within the where condition in the call execute the country name is a literal, it needs to be enclosed in quotes. The best method to put (literal) quotes in a string variable is to use one type (single or double) on the "outside" and the other in the "inside".

Which quotes you use on the outside depends on the presence of macro triggers. If you want to mask them (what is often the case with call execute), you have to use single quotes on the outside; if you need them to be active, use double ones, as they also disable the macro masking effect of the single quotes you will use in the inside.

 

Another method would be to use functions to create the quotes inside the string, but that reduces readability IMO.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Esteemed Advisor
Esteemed Advisor
Posts: 7,262

Re: Macros for creating datasets based on variable values

You have your answers above, data _null_ call execute would be my choice.  However there would also be a few questions I would ask before even setting down to do any code.  

Firstly what is the reasoning behind having a dataset for each country, to me this doesn't make any sense, SAS i built with by group processing in mind, so keeping all the data in one dataset and then processing using by groups would be far easier to work with and more efficient.

Secondly, why is there two rows per country?  That would worry me, that I am missing something from the question.

Thirdly what happens on dirty data in that list?  You could have USA/US, or maybe CH - does this mean Switzerland or China?

Personally my main question would be, give me the detailed sotware development lifecycle plan on this piece of work and then I can suggest some techinical implementation.  Unfortunately most of the people giving interviews either do no know about these, or have a set of fixed nonsense questions designed to see if you can remember some syntax rather than how to properly develop software - which is probably why every job I go into has gigabytes of garbage macro code lying about the place with no documentation or plan.

(Sorry, that ended up as a bit of a rant Smiley Surprised)

Trusted Advisor
Posts: 1,216

Re: Macros for creating datasets based on variable values

I'm a big fan of using hash tables for such dynamic splitting (recognizing that it's a rarely a good idea to split such data, as others have mentioned).

 

Below uses a DOW loop and a hash table.  You read all the records for one country into a hash table, then output the data in the hash table to a dataset.  And the hash output() method can use the value of a datastep variable to determine the name of the output datasets.  This still feels like magic, after so many years of being required to name datasets during DATA step compilation time.

 

I learned this approach from one of Paul Dorfman's many many excellent papers on hash, perhaps:

http://www.lexjansen.com/nesug/nesug09/hw/HW04.pdf

 

 

137  proc sort data=sashelp.prdsale out=sale;
138    by country;
139  run;

NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set WORK.SALE has 1440 observations and 10 variables.

140
141  data _null_ ;
142    if _n_=1 then do;
143      declare hash h () ;
144      h.definekey ("_n_") ;
145      h.definedata ("ACTUAL","PREDICT","COUNTRY","REGION");
146      h.definedone () ;
147    end;
148    do _n_ = 1 by 1 until ( last.country ) ;
149      set sale ;
150      by country ;
151      h.add() ;
152    end ;
153    h.output (dataset: cats("OUT_",compress(country,".")) ) ;
154    h.clear() ;
155  run ;

NOTE: The data set WORK.OUT_CANADA has 480 observations and 4 variables.
NOTE: The data set WORK.OUT_GERMANY has 480 observations and 4 variables.
NOTE: The data set WORK.OUT_USA has 480 observations and 4 variables.
NOTE: There were 1440 observations read from the data set WORK.SALE.

 

For an interview question, if they ask for a macro solution I might show them a macro solution , and also show a non-macro alternative like this. I think it's fair for a potential employer to say "write a little macro" just to see if you can do it, and your style, etc.  Just about any macro question has a non-macro solution.

 

I also would raise the points mentioned by @RW9.  When interviewing as a candidate, pushing back a bit on the problem definition is often useful.  Often such problems are designed with mistakes/poor specifications/etc, to see how you would respond.  And even if this was not defined as such, it's helpful to see how your potential employers react to constructive feedback.

Valued Guide
Posts: 505

Re: Macros for creating datasets based on variable values

NEW SOLUTION
* if you know just the upper limit on the number of levels;
data _null_;
  retain index 1;
  array BeenThere[32] $1 _temporary_ ;
  set sashelp.class;
  if whichc(sex,of BeenThere[*])=0 then do;
       BeenThere[index]=sex;
       if index=32 then do;
          putlog "More than 32 levels increase the dimension of BeenThere";
          stop;
       end;
       Index=Index+1;
       call symputx('sex',sex);
       rc=dosubl(resolve('
        data &sex;
          set sashelp.class(where=(sex="&sex"));
        run;quit;
        '));
  end;
;run;quit;
Valued Guide
Posts: 505

Re: Macros for creating datasets based on variable values

Minimum code?

It fun to build on others ideas, sometimes the last post is the best. Not here though.

Variation of an earlier post. I am not a huge fan of call execute. At one time it did not scale.
Also the quoting can give you a headache.

Dosubl executes and finishes between interations of the parent datastep.
Code is not stacked and run after the dataset. DOSUBL can also post back to
the parent(see some of my old SAS-L posts)
A hash without a sort would be interesting.

proc sort data=sashelp.class(keep=sex) out=unique nodupkey; by sex; ;run;quit; data _null_; set unique; call symputx('sex',sex); rc=dosubl(resolve(' data &sex; set class(where=(sex="&sex")); run;quit; ')); ;run;quit;
Ask a Question
Discussion stats
  • 9 replies
  • 1314 views
  • 4 likes
  • 6 in conversation