DATA Step, Macro, Functions and more

Finding Like Values and Assigning them to a DATASET

Accepted Solution Solved
Reply
Contributor
Posts: 59
Accepted Solution

Finding Like Values and Assigning them to a DATASET

What is the easiest way to parse a data set and create multiple data sets based on a given value?

 

Example: 

Let’s use telephone numbers and look for all values found in PHONENUMBER beginning with

631 and put them in data set A

516 and put them in data set B

212 and put them in data set C

 

Data:

631 555 5555

631 666 6666

516 999 9999

516 888 8888

212 444 4444

212 444 5555

 

Problem: 

You don’t know the first 3 values – If I did – I would write a substring/do loop to read the first 3 of the field “PHONENUMBER” and tell it to read through the table and when it finds a matching value “substr (phonenumber,1,3) write it out to the associated table.

 

What function would you use to search the field PHONENUMBER and look for all like values from a specific position and place those values in a separate data set?

 

This is without defining the values to look for since the list of area codes is quite large.

 

All help is greatly appreciated – I haven’t been in the SAS world for over 2 years so I’m a little rusty – Thanks in advance

 

 


Accepted Solutions
Solution
‎09-12-2017 11:47 AM
Regular Contributor
Posts: 164

Re: Finding Like Values and Assigning them to a DATASET

Posted in reply to Dsrountree

 

Hi,

 

I won't use hash for this, since the assignment is not known. This is not a classical look up situation.

Instead I would just program where to output the rows and use a simple call execute to perform the job:

 

data test;
infile datalines ;
input phonenumber $1-20 @;
datalines;
631 555 5555
631 666 6666
516 999 9999
516 888 8888
212 444 4444
212 444 5555
;
run;

data test1;
   set test;
   first3=substr(phonenumber,1,3);
run;
proc sort data=test1;
   by first3 phonenumber;
run;

data test2;
   set test1;
   length ds $10;
   by first3;
   retain count .;
   if first.first3 then count+1;
   if count>26 then ds=repeat('a',int(count/26));
   ds=strip(ds)||byte(96+count);  
run;

data _NULL_;
   set test2;
   by first3;
   if first.first3;
   call execute('data '||strip(ds)||'(drop=first3 count ds); set test2; where ds eq "'||strip(ds)||'"; run;');
run;


 

 

________________________

- Cheers -

View solution in original post


All Replies
Super User
Posts: 6,935

Re: Finding Like Values and Assigning them to a DATASET

Posted in reply to Dsrountree

The most elegant way would be to use hashing.  Since I'm out of my element there, here's a link to a similar question ... refer to the final suggested solution.

 

https://communities.sas.com/t5/Base-SAS-Programming/create-a-new-Dataset-for-every-unique-Country-an...

 

Your situation may be mildly more complex, since the area code itself would not be a valid name for a SAS data set.

 

It could be done without hashing, using macro language to extract all the area codes, place a set of names on the DATA statement, and construct a series of IF/THEN statements to control outputting. 

Contributor
Posts: 59

Re: Finding Like Values and Assigning them to a DATASET

Posted in reply to Astounding

If I was only looking at a distinct value like zipcode then I could use code like that.

The problem is I need to evalute the value based on a given position and separate accordingly.

 

Any suggestions...

Super User
Posts: 6,935

Re: Finding Like Values and Assigning them to a DATASET

Posted in reply to Dsrountree

You'll have to cross that bridge, no matter what the approach.  It might be easier to plan:

 

if the phone number begins with 631, put it into data set named _631

if the phone number begins with 516, put it into data set named _516

 

So if using hashing, you might have to construct the key to the hash table, concatenating an underscore and the area code (i think again out of my element here).  But if you were using macro language, you would need to do something similar to get the names of the data sets, for example:

 

proc sql noprint;

select distinct '_' || substr(phone, 1, 3) into : data_set_list separated by ' ' from have;

quit;

 

Then the macro language approach would use:

 

data &data_set_list;

 

 

Solution
‎09-12-2017 11:47 AM
Regular Contributor
Posts: 164

Re: Finding Like Values and Assigning them to a DATASET

Posted in reply to Dsrountree

 

Hi,

 

I won't use hash for this, since the assignment is not known. This is not a classical look up situation.

Instead I would just program where to output the rows and use a simple call execute to perform the job:

 

data test;
infile datalines ;
input phonenumber $1-20 @;
datalines;
631 555 5555
631 666 6666
516 999 9999
516 888 8888
212 444 4444
212 444 5555
;
run;

data test1;
   set test;
   first3=substr(phonenumber,1,3);
run;
proc sort data=test1;
   by first3 phonenumber;
run;

data test2;
   set test1;
   length ds $10;
   by first3;
   retain count .;
   if first.first3 then count+1;
   if count>26 then ds=repeat('a',int(count/26));
   ds=strip(ds)||byte(96+count);  
run;

data _NULL_;
   set test2;
   by first3;
   if first.first3;
   call execute('data '||strip(ds)||'(drop=first3 count ds); set test2; where ds eq "'||strip(ds)||'"; run;');
run;


 

 

________________________

- Cheers -

Contributor
Posts: 59

Re: Finding Like Values and Assigning them to a DATASET

Nicely done....

But quick question on data _NULL_;

 

If I have several other columns,

IE: First Name, Last Name, Address.

 

Will that information also carry over to table A, B, C?

Regular Contributor
Posts: 164

Re: Finding Like Values and Assigning them to a DATASET

Posted in reply to Dsrountree

sure, it's just droping the unwanted variables, the other variables remain where they are.

________________________

- Cheers -

Super User
Posts: 6,935

Re: Finding Like Values and Assigning them to a DATASET

Posted in reply to Dsrountree

Looks like nobody is stepping up with a hashing solution.  Here's how I would approach a CALL EXECUTE solution:

 

proc freq data=have;

   tables phone_number / noprint out=counts (keep=phone_number);

   format phone_number $3.;

run;

 

data _null_;

   length phone_number $ 3;

   call execute ('data ');

   do until (done1);

      set counts end=done1;

      call execute('_' || phone_number);

   end;

   call execute('; set have; select (substr(phone_number, 1, 3));');

   do until (done2);

      set counts end=done2;

      call execute('when ("' || phone_number || '") output _' || phone_number || ';');

   end;

   call execute('end; run;');

   stop;

run;

 

It's untested, but looks like a viable approach.

PROC Star
Posts: 1,472

Re: Finding Like Values and Assigning them to a DATASET

Posted in reply to Dsrountree

I like the hash approach for dynamic splitting, see e.g. Dorfman et al

http://www.lexjansen.com/nesug/nesug09/hw/HW04.pdf.

 

Below uses a single hash table and a DoW loop, so data must be sorted by the key. The paper also describes a Hash-of-Hashes approach if the data can't be sorted.

 

Process below is:

  1. Assign a key (first three digits of phone number or whatever you want to split by).
  2. Sort by the key.
  3. Use a DoW loop to read all data for a by-group, loading it into a hash table.
  4. Output the hash table to a dataset.
  5. Clear the hash table.
  6. Goto #3

 

data have;
  infile datalines ;
  input phonenumber $1-20 ;

  *define a key;
  mykey=substr(phonenumber,1,3);

datalines;
631 555 5555
631 666 6666
516 999 9999
516 888 8888
212 444 4444
212 444 5555
111 111 1111
999 999 9999
;
run;

proc sort data=have;
  by mykey;
run;

data _null_ ;
  if 0 then set have;
  if _n_=1 then do;
    declare hash h();
    h.definekey ("_n_") ; 
    h.definedata ("phonenumber" ) ;
    h.definedone () ;
  end;
  do _n_ = 1 by 1 until ( last.mykey ) ;
    set have ;
    by mykey ;
    h.add() ;
  end ;
  h.output (dataset: cats("OUT_",mykey) ) ;
  h.clear() ;
run ;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 231 views
  • 3 likes
  • 4 in conversation