DATA Step, Macro, Functions and more

Create Unique Data Sets

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Create Unique Data Sets

Hello, 

 

The code below creates a different data set for each zipcode in my primary data set.  However, it overwrites/deletes all of the datasets except for the one created during its last iteration.  I would like for SAS to generate a unique data set for each different zipcode and save them all.  Any thoughts???

 

data _null_;

    dcl hash hzipcode (ordered:'a');

    hzipcode.definekey ('zipcode', 'state', 'city', 'year',

                        'observedclosings','quarter',

                        'finishedvacant','finishedsupply',  

                        'housinginventory','housingsupply','vdlinventory','vdlsupply',

                        'underconstruction','Modelinventory','avgclosingprice','_n_');

     hzipcode.definedata ('zipcode', 'state', 'city', 'year', 'observedclosings',

                          'quarter'finishedvacant','finishedsupply',

                          'housinginventoryy','housingsupply','vdlinventory',

                          'vdlsupply','underconstruction','Modelinventory',

                          'avgclosingprice');

     hzipcode.definedone();

 

     do _n_ = 1 by 1 until (last.zipcode);

        set SASUSER.Ivntrysort;

        by zipcode;

        hzipcode.add();

     end;

     hzipcode.output (dataset: 'out' || put(zipcode, best.-L));

run;

 

Mike


Accepted Solutions
Solution
‎05-30-2017 10:05 AM
Occasional Contributor
Posts: 10

Re: Create Unique Data Sets

[ Edited ]

My format (Best.-L) was numeric and I needed character.  After that correction, worked perfectly.  Thanks for verifying the code was correct (almost).

View solution in original post


All Replies
PROC Star
Posts: 7,492

Re: Create Unique Data Sets

Posted in reply to mgarrison

I'm surprised it would run without aborting due to critical errors. There were two typos in your code. They are corrected in the following:

data _null_;
    dcl hash hzipcode (ordered:'a');
    hzipcode.definekey ('zipcode', 'state', 'city', 'year',
                        'observedclosings','quarter',
                        'finishedvacant','finishedsupply',  
                        'housinginventory','housingsupply','vdlinventory','vdlsupply',
                        'underconstruction','Modelinventory','avgclosingprice','_n_');
     hzipcode.definedata ('zipcode', 'state', 'city', 'year', 'observedclosings',
/*                          'quarter', finishedvacant','finishedsupply',*/
                          'quarter', 'finishedvacant','finishedsupply',
                          'housinginventory','housingsupply','vdlinventory',
/*                          'housinginventoryy','housingsupply','vdlinventory',*/
                          'vdlsupply','underconstruction','Modelinventory',
                          'avgclosingprice');
     hzipcode.definedone();
 
     do _n_ = 1 by 1 until (last.zipcode);
        set sasuser.Ivntrysort;
        by zipcode;
        hzipcode.add();
     end;
     hzipcode.output (dataset: 'out' || put(zipcode, best.-L));
run;

Art, CEO, AnalystFinder.com

Respected Advisor
Posts: 4,173

Re: Create Unique Data Sets

[ Edited ]

@art297

I believe even after your fixes the code wouldn't run because the hash gets defined before any variable definitions.

The data step would also iterate only once and though only output a table for the very first zipcode.

 

Below (untested) code should do what the OP is asking for:

data _null_;
  set sasuser.Ivntrysort;
  by zipcode;

  if _n_=1 then
    do;
      length n_obs 8;
      dcl hash hzipcode (ordered:'a');
      hzipcode.definekey ('zipcode','n_obs');
      hzipcode.definedata ('zipcode', 'state', 'city', 'year', 'observedclosings',
        /*                          'quarter', finishedvacant','finishedsupply',*/
        'quarter', 'finishedvacant','finishedsupply',
        'housinginventory','housingsupply','vdlinventory',
        /*                          'housinginventoryy','housingsupply','vdlinventory',*/
        'vdlsupply','underconstruction','Modelinventory',
        'avgclosingprice');
      hzipcode.definedone();
    end;

  n_obs+1;
  _rc=hzipcode.add();

  if last.zipcode then
    do;
      hzipcode.output (dataset: 'out' || put(zipcode, best.-L));
      n_obs=0;
    end;
run;

Like usual one is tempted to ask the question:

Why to split up the data into multiple tables when SAS offers by group processing?

 

PROC Star
Posts: 7,492

Re: Create Unique Data Sets

[ Edited ]

@Patrick: I tested my suggested changes to the code on a file that contained three sets of zip codes. The code created all three files. Here is the code I ran (including a test dataset):

data Ivntrysort;
  input zipcode (state city year observedclosings quarter
         finishedvacant finishedsupply housinginventory housingsupply
         vdlinventory vdlsupply underconstruction Modelinventory
         avgclosingprice) ($);
  cards;
11111 a a a a a a a a a a a a a a
11111 a a a a a a a a a a a a a a
11111 a a a a a a a a a a a a a a
22222 b b b b b b b b b b b b b b
22222 b b b b b b b b b b b b b b
22222 b b b b b b b b b b b b b b
33333 c c c c c c c c c c c c c c
33333 c c c c c c c c c c c c c c
33333 c c c c c c c c c c c c c c
;

data _null_;
    dcl hash hzipcode (ordered:'a');
    hzipcode.definekey ('zipcode', 'state', 'city', 'year',
                        'observedclosings','quarter',
                        'finishedvacant','finishedsupply',  
                        'housinginventory','housingsupply','vdlinventory','vdlsupply',
                        'underconstruction','Modelinventory','avgclosingprice','_n_');
     hzipcode.definedata ('zipcode', 'state', 'city', 'year', 'observedclosings',
/*                          'quarter', finishedvacant','finishedsupply',*/
                          'quarter', 'finishedvacant','finishedsupply',
                          'housinginventory','housingsupply','vdlinventory',
/*                          'housinginventoryy','housingsupply','vdlinventory',*/
                          'vdlsupply','underconstruction','Modelinventory',
                          'avgclosingprice');
     hzipcode.definedone();
 
     do _n_ = 1 by 1 until (last.zipcode);
        set Ivntrysort;
        by zipcode;
        hzipcode.add();
     end;
     hzipcode.output (dataset: 'out' || put(zipcode, best.-L));
run;

 

Art, CEO, AnalystFinder.com

 

Respected Advisor
Posts: 4,173

Re: Create Unique Data Sets

@art297

Oh, I see! I was actually astonished that you post something that's not working.

 

I understand now why the DS interates more than once. The syntax "confused" me and I had direct access using the point= option in mind.

 

And the hash definition only happens during execution time and though the variables already exist in the pdv even though the SET statement comes after the hash declaration.

 

Learned something which I actually knew already - but things got just "re-wired" in my brain. Thanks for that!

 

Super User
Posts: 10,044

Re: Create Unique Data Sets

Patrick,

You forgot  

hzipcode.clear() ;

after output a dataset. 

Occasional Contributor
Posts: 10

Re: Create Unique Data Sets

I added it.  What would be the consequence if I didn't clear?

Trusted Advisor
Posts: 1,022

Re: Create Unique Data Sets

Posted in reply to mgarrison

The "clear" method empties the hash object.  Ordinarilly this would be done to read the hash table for re-use (i.e. for the next5 zip code).  However, while memory would be saved by clearing the hash object, it is NOT reused.  A separate hash object is still generated and populated for each zip code.

 

This is superfluous, and can be fixed by putting the hash object declare statement and associated definekey and definedata methods in an   "if _n_=1 then do; " group.  Then using the clear method will be neccessary, because the hash object will be re-used.

Occasional Contributor
Posts: 10

Re: Create Unique Data Sets

Thanks!

Occasional Contributor
Posts: 10

Re: Create Unique Data Sets

Hi Art,

 

Thanks for the corrections.  My program's code did not have the errors or typos.  My copy/paste/reformat exercise for the SAS post had the errors.  My apologies.

 

Best,

 

Mike

PROC Star
Posts: 7,492

Re: Create Unique Data Sets

Posted in reply to mgarrison

I'm confused. The code I posted ran correctly. If it doesn't work with your actual data, you'd have to post an example of your actual data for which it doesn't run as desired.

 

Art, CEO, AnalystFinder.com

Solution
‎05-30-2017 10:05 AM
Occasional Contributor
Posts: 10

Re: Create Unique Data Sets

[ Edited ]

My format (Best.-L) was numeric and I needed character.  After that correction, worked perfectly.  Thanks for verifying the code was correct (almost).

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 222 views
  • 4 likes
  • 5 in conversation