DATA Step, Macro, Functions and more

different table names based on variables.

Reply
Regular Contributor
Posts: 215

different table names based on variables.

Hi All,

I have the following code which generate a table call 'employee'. I want to create country specific table where the table name will be the country name itself. In this case there will be 3 tables (Asia, Europe and America).  

 

My real database has 90 countries and I don't want to use IF, Else IF statement.

 

Can someone help me with a better/shot and efficient code to solve my problem. 

 

Thank You,

 

 

DATA Employee;
INPUT Country_Name $ Country_Code Region $;
DATALINES;
Japan 107 Asia
USA 109 America
France 103 Europe
Canada 111 America
India 105 Asia
Nepal 115 Asia
Italy 102 Europe
;
RUN;

Output Table:

 

 

Table Name: Asia

Japan 107 Asia
India 105 Asia
Nepal 115 Asia

 

Table Name: America

USA 109 America
Canada 111 America

 

Table Name: Europe

France 103 Europe
Italy 102 Europe

Super User
Posts: 17,774

Re: different table names based on variables.

Please try and search your questions before you post.

 

 

Don't.
This is what BY group processing is for, if absolutely required split the dataset at the last step.

That being said, this is an FAQ since it's how people tend to think if they haven't worked with SAS a lot.
Here's several posts with sample code and examples on how to automate this so you don't need to know the number of levels prior to running the program.

http://blogs.sas.com/content/sasdummy/2015/01/26/how-to-split-one-data-set-into-many/
http://www.sascommunity.org/wiki/Split_Data_into_Subsets
https://gist.github.com/statgeek/4bfb7574713bedf4e011

For info on how BY group processing works:
https://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n138da4gme3zb7n1nif...

Valued Guide
Posts: 505

Re: different table names based on variables.

Create four region datasets from one master dataset


1. Hash method  Martin, Vincent (STATCAN) via listserv.uga.edu
2. Dosubl method

3 Collection of other solutions;

HAVE
====

Up to 40 obs WORK.EMPLOYEE total obs=7

       COUNTRY_
Obs      NAME      COUNTRY_CODE    REGION

 1     Japan                107    Asia
 2     USA                  109    America
 3     France               103    Europe
 4     Canada               111    America
 5     India                105    Asia
 6     Nepal                115    Asia
 7     Italy                102    Europe


WANT
===

             Member   Obs, Entries                  File
#  Name      Type      or Indexes   Vars  Label     Size  Last Modified

1  AMERICA   DATA           2        3            131072  02/02/2017 12:43:52
2  ASIA      DATA           3        3            131072  02/02/2017 12:43:52
3  EMPLOYEE  DATA           7        3            131072  02/02/2017 12:43:52
4  EUROPE    DATA           2        3            131072  02/02/2017 12:43:52

WORKING CODE
============

HASH     do while(hi.next()=0);
            h.output(dataset: region||"(where=(region="||quote(region)||"))" );
         end;

DOSUBL   call symputx('region',region);
           rc=dosubl('
             data &region;
               set employee(where=(region="&region"));
             run;quit;
          ');


HASH SOLUTION
=============

* create some data;
DATA Employee;
INPUT Country_Name $ Country_Code Region $;
cards4;
Japan 107 Asia
USA 109 America
France 103 Europe
Canada 111 America
India 105 Asia
Nepal 115 Asia
Italy 102 Europe
;;;;
run;quit;


Martin, Vincent (STATCAN) via listserv.uga.edu

data _null_;
      if 0 then set employee;

            declare hash h(dataset: "work.employee");
            h.definekey("country_name");
            h.definedata(all: "y");
            h.definedone();

            declare hash outctrl(dataset: "work.employee(keep=region)");
            declare hiter hi("outctrl");
            outctrl.definekey("region");
            outctrl.definedata("region");
            outctrl.definedone();

      do while(hi.next()=0);
            h.output(dataset: region||"(where=(region="||quote(region)||"))" );
      end;

run;

DOSUBL SOLUTION
===============

proc datasets lib=work kill;
run;quit;
proc sort data=employee out=have;
by region;
run;quit;

data _null_;

  set have;
  by region;

  if last.region then do;
     call symputx('region',region);
     rc=dosubl('
        data &region;
           set employee(where=(region="&region"));
        run;quit;
     ');
  end;

run;quit;

proc contents data=work._all_;
run;quit;


             Member   Obs, Entries                  File
#  Name      Type      or Indexes   Vars  Label     Size  Last Modified

1  AMERICA   DATA           2        3            131072  02/02/2017 12:43:52
2  ASIA      DATA           3        3            131072  02/02/2017 12:43:52
3  EMPLOYEE  DATA           7        3            131072  02/02/2017 12:43:52
4  EUROPE    DATA           2        3            131072  02/02/2017 12:43:52



/* T007960 SAS and R solutions for Splitting datasets

SAS and R solutions for Splitting sashelp.class into
two datasets, one with females and one with males.
The code below will work for any dataset and
splitting variable.

LARGE DATASET AND SMALL DATASET SOLUTIONS ARE GIVEN

1. R soluton is one line of code. This splits class
   into dataframes res$F and res$F where res is a list

   res<-split(class, list(class$SEX), drop = TRUE);

2. SAS HASH solution. Like R this is an in memory solution
   good for smaller datasets. First the female records
   are loaded into memory and then sent out to the
   F dataset, then the males areloaded and output.

3. This is a solution for large datasets.
   First we create a macro string containing all
   the levels. Just male and female for outr example.
   The we use a datastep to build the following program.

   Only one line has to changed on the program to do another dataset

   data F M;
      set sashelp.class;
      select (sex);
         when ('F')  output F;
         when ('M')  output M;
         otherwise;
        end;
   run;

   /* change dataset and sex for other data */
   sas=';set sashelp.class; select (sex);';

*********************************
*********************************
* LARGE DATASET SOLUTION   *
*********************************
*********************************;

* put the levels of sex into &sec = "F" "M";
proc sql;
  select distinct quote(strip(sex)) into :sex separated by ' ' from sashelp.class
;quit;

%put &=sex;    /* the levels "F" "M";
%put &=sqlobs; /* then number of levels */

data _null_;
   length sas $200;
   array arysex[&sqlobs] $32 (&sex);
   sas='data ';
   call execute(sas);

   do i=1 to &sqlobs;
      sas=arysex[i];
      call execute(sas);
   end;

   /* change dataset and sex for other data */
   sas=';set sashelp.class; select (sex);';
   call execute(sas);

   do i=1 to &sqlobs;
      sas=catx(' ',cats("when ('",arysex[i],"')"),'output',arysex[i],';');
      call execute(sas);
      put sas;
   end;

   sas='otherwise;end;run;quit;';
   call execute(sas);
run;quit;



/* Generated code
NOTE: CALL EXECUTE generated line.
1    + data
2    + F
3    + M
4    + ;set sashelp.class; select (sex);
5    + when ('F') output F ;
6    + when ('M') output M ;
7    + otherwise;end;run;
*.

*********************************
*********************************
* SMALL DATASET HASH SOLUTION   *
*********************************
*********************************;

* for small datasets this should be a very fast way to split
  a dataset. However the data needs to be indexed or
  sortted on the split variable;

proc sort data=sashelp.class out=class noequals;
by sex;
run;quit;

data _null_;

  * establist the PDV fr both males and females;
  if 0 then set class;

  declare hash h();

    * name is the primary key;
    * if you do not have a primary key you can;
    * use the method below;

    h.definekey('name');
    h.definedata('name','sex','age','height','weight');
    h.definedone();

    * note this is loading the records into memory -
      but can consume memory;
    do until ( last.sex ) ;
      set class;
      by sex ;
      h.add () ;
    end ;

    * output dataset will change with next sex;
    h.output (dataset: sex) ;

run ;quit;

**************
**************
* R SOLUTION *
**************
**************

data "c:\temp\class.sas7bdat"(outrep=WINDOWS_32 compress=no);
  set sashelp.class;
run;quit;

%utl_submit_R
 (%nrbquote
  (
   library(sas7bdat);
   class<-read.sas7bdat("c:\\temp\\class.sas7bdat");
   /* res is a list of two dataframes */
   res<-split(class, list(class$SEX), drop = TRUE);
   res$F;
   res$M;
   /* if you want just the names */
   res$F$NAME;
   res$F[1];
  )
 );

/*
RES$F

      NAME SEX AGE HEIGHT WEIGHT
2    Alice   F  13   56.5   84.0
3  Barbara   F  13   65.3   98.0
4    Carol   F  14   62.8  102.5
7     Jane   F  12   59.8   84.5
8    Janet   F  15   62.5  112.5
11   Joyce   F  11   51.3   50.5
12    Judy   F  14   64.3   90.0
13  Louise   F  12   56.3   77.0
14    Mary   F  15   66.5  112.0

RES$M
      NAME SEX AGE HEIGHT WEIGHT
1   Alfred   M  14   69.0  112.5
5    Henry   M  14   63.5  102.5
6    James   M  12   57.3   83.0
9  Jeffrey   M  13   62.5   84.0
10    John   M  12   59.0   99.5
15  Philip   M  16   72.0  150.0
16  Robert   M  12   64.8  128.0
17  Ronald   M  15   67.0  133.0
18  Thomas   M  11   57.5   85.0
19 William   M  15   66.5  112.0
*/

Ask a Question
Discussion stats
  • 2 replies
  • 150 views
  • 3 likes
  • 3 in conversation