BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mgarrison
Fluorite | Level 6

Hello, 

 

The code below creates a different data set for each zipcode in my primary data set.  However, it overwrites/deletes all of the datasets except for the one created during its last iteration.  I would like for SAS to generate a unique data set for each different zipcode and save them all.  Any thoughts???

 

data _null_;

    dcl hash hzipcode (ordered:'a');

    hzipcode.definekey ('zipcode', 'state', 'city', 'year',

                        'observedclosings','quarter',

                        'finishedvacant','finishedsupply',  

                        'housinginventory','housingsupply','vdlinventory','vdlsupply',

                        'underconstruction','Modelinventory','avgclosingprice','_n_');

     hzipcode.definedata ('zipcode', 'state', 'city', 'year', 'observedclosings',

                          'quarter'finishedvacant','finishedsupply',

                          'housinginventoryy','housingsupply','vdlinventory',

                          'vdlsupply','underconstruction','Modelinventory',

                          'avgclosingprice');

     hzipcode.definedone();

 

     do _n_ = 1 by 1 until (last.zipcode);

        set SASUSER.Ivntrysort;

        by zipcode;

        hzipcode.add();

     end;

     hzipcode.output (dataset: 'out' || put(zipcode, best.-L));

run;

 

Mike

1 ACCEPTED SOLUTION

Accepted Solutions
mgarrison
Fluorite | Level 6

My format (Best.-L) was numeric and I needed character.  After that correction, worked perfectly.  Thanks for verifying the code was correct (almost).

View solution in original post

11 REPLIES 11
art297
Opal | Level 21

I'm surprised it would run without aborting due to critical errors. There were two typos in your code. They are corrected in the following:

data _null_;
    dcl hash hzipcode (ordered:'a');
    hzipcode.definekey ('zipcode', 'state', 'city', 'year',
                        'observedclosings','quarter',
                        'finishedvacant','finishedsupply',  
                        'housinginventory','housingsupply','vdlinventory','vdlsupply',
                        'underconstruction','Modelinventory','avgclosingprice','_n_');
     hzipcode.definedata ('zipcode', 'state', 'city', 'year', 'observedclosings',
/*                          'quarter', finishedvacant','finishedsupply',*/
                          'quarter', 'finishedvacant','finishedsupply',
                          'housinginventory','housingsupply','vdlinventory',
/*                          'housinginventoryy','housingsupply','vdlinventory',*/
                          'vdlsupply','underconstruction','Modelinventory',
                          'avgclosingprice');
     hzipcode.definedone();
 
     do _n_ = 1 by 1 until (last.zipcode);
        set sasuser.Ivntrysort;
        by zipcode;
        hzipcode.add();
     end;
     hzipcode.output (dataset: 'out' || put(zipcode, best.-L));
run;

Art, CEO, AnalystFinder.com

Patrick
Opal | Level 21

@art297

I believe even after your fixes the code wouldn't run because the hash gets defined before any variable definitions.

The data step would also iterate only once and though only output a table for the very first zipcode.

 

Below (untested) code should do what the OP is asking for:

data _null_;
  set sasuser.Ivntrysort;
  by zipcode;

  if _n_=1 then
    do;
      length n_obs 8;
      dcl hash hzipcode (ordered:'a');
      hzipcode.definekey ('zipcode','n_obs');
      hzipcode.definedata ('zipcode', 'state', 'city', 'year', 'observedclosings',
        /*                          'quarter', finishedvacant','finishedsupply',*/
        'quarter', 'finishedvacant','finishedsupply',
        'housinginventory','housingsupply','vdlinventory',
        /*                          'housinginventoryy','housingsupply','vdlinventory',*/
        'vdlsupply','underconstruction','Modelinventory',
        'avgclosingprice');
      hzipcode.definedone();
    end;

  n_obs+1;
  _rc=hzipcode.add();

  if last.zipcode then
    do;
      hzipcode.output (dataset: 'out' || put(zipcode, best.-L));
      n_obs=0;
    end;
run;

Like usual one is tempted to ask the question:

Why to split up the data into multiple tables when SAS offers by group processing?

 

art297
Opal | Level 21

@Patrick: I tested my suggested changes to the code on a file that contained three sets of zip codes. The code created all three files. Here is the code I ran (including a test dataset):

data Ivntrysort;
  input zipcode (state city year observedclosings quarter
         finishedvacant finishedsupply housinginventory housingsupply
         vdlinventory vdlsupply underconstruction Modelinventory
         avgclosingprice) ($);
  cards;
11111 a a a a a a a a a a a a a a
11111 a a a a a a a a a a a a a a
11111 a a a a a a a a a a a a a a
22222 b b b b b b b b b b b b b b
22222 b b b b b b b b b b b b b b
22222 b b b b b b b b b b b b b b
33333 c c c c c c c c c c c c c c
33333 c c c c c c c c c c c c c c
33333 c c c c c c c c c c c c c c
;

data _null_;
    dcl hash hzipcode (ordered:'a');
    hzipcode.definekey ('zipcode', 'state', 'city', 'year',
                        'observedclosings','quarter',
                        'finishedvacant','finishedsupply',  
                        'housinginventory','housingsupply','vdlinventory','vdlsupply',
                        'underconstruction','Modelinventory','avgclosingprice','_n_');
     hzipcode.definedata ('zipcode', 'state', 'city', 'year', 'observedclosings',
/*                          'quarter', finishedvacant','finishedsupply',*/
                          'quarter', 'finishedvacant','finishedsupply',
                          'housinginventory','housingsupply','vdlinventory',
/*                          'housinginventoryy','housingsupply','vdlinventory',*/
                          'vdlsupply','underconstruction','Modelinventory',
                          'avgclosingprice');
     hzipcode.definedone();
 
     do _n_ = 1 by 1 until (last.zipcode);
        set Ivntrysort;
        by zipcode;
        hzipcode.add();
     end;
     hzipcode.output (dataset: 'out' || put(zipcode, best.-L));
run;

 

Art, CEO, AnalystFinder.com

 

Patrick
Opal | Level 21

@art297

Oh, I see! I was actually astonished that you post something that's not working.

 

I understand now why the DS interates more than once. The syntax "confused" me and I had direct access using the point= option in mind.

 

And the hash definition only happens during execution time and though the variables already exist in the pdv even though the SET statement comes after the hash declaration.

 

Learned something which I actually knew already - but things got just "re-wired" in my brain. Thanks for that!

 

Ksharp
Super User

Patrick,

You forgot  

hzipcode.clear() ;

after output a dataset. 

mgarrison
Fluorite | Level 6

I added it.  What would be the consequence if I didn't clear?

mkeintz
PROC Star

The "clear" method empties the hash object.  Ordinarilly this would be done to read the hash table for re-use (i.e. for the next5 zip code).  However, while memory would be saved by clearing the hash object, it is NOT reused.  A separate hash object is still generated and populated for each zip code.

 

This is superfluous, and can be fixed by putting the hash object declare statement and associated definekey and definedata methods in an   "if _n_=1 then do; " group.  Then using the clear method will be neccessary, because the hash object will be re-used.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mgarrison
Fluorite | Level 6

Thanks!

mgarrison
Fluorite | Level 6

Hi Art,

 

Thanks for the corrections.  My program's code did not have the errors or typos.  My copy/paste/reformat exercise for the SAS post had the errors.  My apologies.

 

Best,

 

Mike

art297
Opal | Level 21

I'm confused. The code I posted ran correctly. If it doesn't work with your actual data, you'd have to post an example of your actual data for which it doesn't run as desired.

 

Art, CEO, AnalystFinder.com

mgarrison
Fluorite | Level 6

My format (Best.-L) was numeric and I needed character.  After that correction, worked perfectly.  Thanks for verifying the code was correct (almost).

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1277 views
  • 4 likes
  • 5 in conversation