DATA Step, Macro, Functions and more

help from an array expert

Reply
Contributor
Posts: 43

help from an array expert

Can the below code be accomplished using an array?  Or is what I have coded the only way to go about this:

data temp;
  set items;
  if substr(curr_cat,1,1) = 'A' then prod_cat =substr(curr_cat,1,2);
  else prod_cat=substr(curr_cat,1,1);
  retain pcat_aa pcat_ab pcat_ac pcat_ad pcat_ae pcat_af pcat_ag pcat_ah pcat_ai pcat_aj pcat_ak pcat_al
         pcat_am pcat_an pcat_ao pcat_ap pcat_aq pcat_ar pcat_as pcat_at pcat_au pcat_av pcat_aw pcat_ax
         pcat_ay pcat_az pcat_b  pcat_c  pcat_d  pcat_e  pcat_f  pcat_g  pcat_h  pcat_i  pcat_j  pcat_k
         pcat_l  pcat_m  pcat_n  pcat_o  pcat_p  pcat_q  pcat_r  pcat_s  pcat_t  pcat_u  pcat_v  pcat_w
         pcat_x  pcat_y  pcat_z;
  by cust_edp_id;
  if first.cust_edp_id then do;
    pcat_aa = 0;
    pcat_ab = 0;
    pcat_ac = 0;
    pcat_ad = 0;
    pcat_ae = 0;
    pcat_af = 0;
    pcat_ag = 0;
    pcat_ah = 0;
    pcat_ai = 0;
    pcat_aj = 0;
    pcat_ak = 0;
    pcat_al = 0;
    pcat_ao = 0;
    pcat_ap = 0;
    pcat_aq = 0;
    pcat_ar = 0;
    pcat_as = 0;
    pcat_at = 0;
    pcat_au = 0;
    pcat_av = 0;
    pcat_aw = 0;
    pcat_ax = 0;
    pcat_ay = 0;
    pcat_az = 0;
    pcat_b  = 0;
    pcat_c  = 0;
    pcat_d  = 0;
    pcat_e  = 0;
    pcat_f  = 0;
    pcat_g  = 0;
    pcat_h  = 0;
    pcat_i  = 0;
    pcat_j  = 0;
    pcat_k  = 0;
    pcat_l  = 0;
    pcat_m  = 0;
    pcat_n  = 0;
    pcat_o  = 0;
    pcat_p  = 0;
    pcat_q  = 0;
    pcat_r  = 0;
    pcat_s  = 0;
    pcat_t  = 0;
    pcat_u  = 0;
    pcat_v  = 0;
    pcat_w  = 0;
    pcat_x  = 0;
    pcat_y  = 0;
    pcat_z  = 0;
  end;
  select(prod_cat);
    when('AA') pcat_aa = 1;
    when('AB') pcat_ab = 1;
    when('AC') pcat_ac = 1;
    when('AD') pcat_ad = 1;
    when('AE') pcat_ae = 1;
    when('AF') pcat_af = 1;
    when('AG') pcat_ag = 1;
    when('AH') pcat_ah = 1;
    when('AI') pcat_ai = 1;
    when('AJ') pcat_aj = 1;
    when('AK') pcat_ak = 1;
    when('AL') pcat_al = 1;
    when('AM') pcat_am = 1;
    when('AN') pcat_an = 1;
    when('AO') pcat_ao = 1;
    when('AP') pcat_ap = 1;
    when('AQ') pcat_aq = 1;
    when('AR') pcat_ar = 1;
    when('AS') pcat_as = 1;
    when('AT') pcat_at = 1;
    when('AU') pcat_au = 1;
    when('AV') pcat_av = 1;
    when('AW') pcat_aw = 1;
    when('AX') pcat_ax = 1;
    when('AY') pcat_ay = 1;
    when('AZ') pcat_az = 1;
    when('B') pcat_b = 1;
    when('C') pcat_c = 1;
    when('D') pcat_d = 1;
    when('E') pcat_e = 1;
    when('F') pcat_f = 1;
    when('G') pcat_g = 1;
    when('H') pcat_h = 1;
    when('I') pcat_i = 1;
    when('J') pcat_j = 1;
    when('K') pcat_k = 1;
    when('L') pcat_l = 1;
    when('M') pcat_m = 1;
    when('N') pcat_n = 1;
    when('O') pcat_o = 1;
    when('P') pcat_p = 1;
    when('Q') pcat_q = 1;
    when('R') pcat_r = 1;
    when('S') pcat_s = 1;
    when('T') pcat_t = 1;
    when('U') pcat_u = 1;
    when('V') pcat_v = 1;
    when('W') pcat_w = 1;
    when('X') pcat_x = 1;
    when('Y') pcat_y = 1;
    when('Z') pcat_z = 1;
    otherwise;
  end;
  if last.cust_edp_id then output;
run;
  
Super User
Posts: 19,869

Re: help from an array expert

Are you trying to create an indicator variable for a categorical variable?

Have you tried any of the automated methods so far?

 

Here's a link that may be helpful:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Var...

Contributor
Posts: 43

Re: help from an array expert

I am trying to collapse multiple purchase records into a single record per customer reflecting all the various product categories purchased from -then I am going to attempt to cluster them to determine if we have distinct groups of customers that always purchase the same types of items.


Super User
Posts: 19,869

Re: help from an array expert

How can you tell if a customer purchases the same item if the dataset is reduced to one record per customer?

 

Is this related to Market Basket Analysis at all?

SAS Employee
Posts: 27

Re: help from an array expert

Are you wishing to know the number of times each of the categories were purchased.

Instead of just flagging the variable with 1, just sum up the occurences.

The WHEN expression would look a little like this ..

     when .... pcat_aa+1;

     when...  pcat_ab+1;

      ...

     when   pcat_z+1;

 

With regards to the suggested TRANPOSE code, I was assuming just a flag was needed, FOUND.   To deal with summarization, you need to  pre-process the table with PROC MEANS or SQL...  

    class id prod_cat ;  output nobs=FOUND;   

    or  

   select id, prod_cat, count(*) as FOUND from temp group by 1,2;  

and then transpose. 

 

The advantage of using PROC MEANS, you can use a CLASSDATA table to fully rank all class levels, AA-AZ, B-Z for each ID.  In my FORMAT example farther below, you could  modify the "LIST" table to be used as a CLASSDATA table.  Using a CLASSDATA table you would avoid having to reprocess to change missing values to zeros.

Contributor
Posts: 43

Re: help from an array expert

I don't care if they purchased the same item -the goal is to determine which product categories they are purchasing from, so for example customer A might only purchase from category AA and B, but customer B might purchase across all of the groups. Marketing would want to talk to customer A and anyone that looks like him, differently than Customer B.
SAS Employee
Posts: 27

Re: help from an array expert

[ Edited ]

You didn't post a CSV file for input, so I was tasked to writing and posting the code to generate the ITEM table.  So when you inserted either of my code approched, you didn't get the results you wanted, hmmm? 
Was there an error?  Or were the results not what you envisioned?

For those who may have not run the code, I've include the result below. 

 

From the supplied code that created the seeded-randomly generated table ITEM, the final table shown below is the summarization of the  ITEM table, where the cust_edp_id's prod_cat are concatenated into the column ALL_CATS.   You can compare the ALL_CATS list with the dummy variables on each row.  Could you please help me understand how my result differ from the result you desire?

 

 

 

 

cust_edp_id all_cats pcat_AA pcat_AB pcat_AC pcat_AD pcat_AE pcat_AF pcat_AG pcat_AH pcat_AI pcat_AJ pcat_AK pcat_AL pcat_AM pcat_AN pcat_AO pcat_AP pcat_AQ pcat_AR pcat_AS pcat_AT pcat_AU pcat_AV pcat_AW pcat_AX pcat_AY pcat_AZ pcat_B pcat_C pcat_D pcat_E pcat_F pcat_G pcat_H pcat_I pcat_J pcat_K pcat_L pcat_M pcat_N pcat_O pcat_P pcat_Q pcat_R pcat_S pcat_T pcat_U pcat_V pcat_W pcat_X pcat_Y pcat_Z
1 AV-V-AG-AH-AX-AD-Q-J 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0
2 AM-C-N-AG-P-J-AV-G 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
3 AP-AK-AV-AP-AZ-F-AL-AQ-AD-AD-S-C-O-AL-AG-AX-L-N-AE-AL-F 0 0 0 1 1 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0
4 X-AN-Y-AB-K-H-AR-AK-AW-AG-E-F-AD-AG-AH-AJ-AI-AB-O-AS 0 1 0 1 0 0 1 1 1 1 1 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0
5 F-I-Z-AG-J-AC-L-R 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1
6 T-AL-D-AZ-J-U-K-B-E-AP-AO-AC-AS-AU-S-AZ-T-J-AY 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0 0 0 1 1 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0
7 G-U-V-Z-AJ-I-W 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1
8 J-Q-I-G-G-V-T-AN-Y-C 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 1 0
9 T-AW-AU-S-X-AC-AK-J-Y-S 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 0
10 AG-AG-L-AG-AV-E-E-K-X-I-AH-R-AR-AS 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0
11 AH-AW-Y-AG-AV-AC-K-AS-AO-C-AA-AQ-F-B-Y-C-AA-Y-Z-AT-C-L-AM 1 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 1 0 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1
12 C-AT-M-H-AZ-T-AR-Y-Y-B-T-W-AJ-T-AX-AC-Q-Y 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0
13 V-J-AK-Z-V-AW-B-W 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1
14 Y-AV-AM-AC-L-AU-AB-AR-B-L-R-AX-N-X-Z-AV-X-K-AA-AN 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0 1 1 1
15 Z-AW-AF-AX-AT-AO-V-V-B-AZ-Q-T-AA-M-AP-AY-AR-AO-O-AO-B-Q-AQ 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1
16 AE-AS-Y-B-J-AK-AK-AQ-AT-Y 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
17 N-H-AR-S-O-AJ-K-S-AY-K-AW-Q-AJ-V-F-AE-AZ-AZ-V-AV-AU-AL 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 1 0 0 0 0
18 AB-AF-Z-E-AC-AP-W-I-O-J-V-AG-Q-S-L-R-I-U 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 1 1 1 0 1 1 1 0 0 1
19 AX-AB-R-X-AJ-N-AN-F-AJ 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0
20 H-R-AO-AU-AR-AT-Y-Y-W-X-AD-AF-Y-S-AR-Z 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 1 1
SAS Employee
Posts: 27

Re: help from an array expert

As you previously posted a PROC TRANSPOSE question, I thought I would give you two examples, using PROC TRANSPOSE and then using ARRAYs.    How many rows do you want for each ID?   The code that you show appears that you are intend to collapse the multiple ID entries into on row, tranposing the Prod_CAT value into column names.   Could you supply some data and how you want the result to appear?

Super User
Posts: 11,343

Re: help from an array expert

Here is an example with fewer variables because I don't see a need to type 30 to show how. The trick is the two array definitions.

data items;
   input cust_edp_id  prod_cat $;
datalines ;
1 AA
1 AB
1 AD
2 AC
2 AD
;
run;

data temp;
   set items;
   by cust_edp_id;
   array p                    pcat_aa pcat_ab pcat_ac pcat_ad ;
   array v {4} $ 2 _temporary_('AA'   'AB'    'AC'    'AD');  
   /* ITEM value in the temporary array for each variable. The order MUST match*/
   retain pcat_aa pcat_ab pcat_ac pcat_ad ;
   if first.cust_edp_id then do i = 1 to dim(p);
      p[i]=0;
   end;
   p[whichc(prod_cat,of v(*))]=1;
   if last.cust_edp_id;
   drop i;
RUN;
SAS Employee
Posts: 27

Re: help from an array expert

You could use PROC TRANSPOSE, but it's not as efficient as it does some re-read of the data. 

But it might be simple to understand...

 

data temp; 
  length prod_Cat $ 10; 
  set item ; 
  if substr(curr_cat,1,1) = 'A' then prod_cat =substr(curr_cat,1,2);
  else prod_cat=substr(curr_cat,1,1);
  prod_cat=cats('pcat_',prod_cat); 
  found=1; 
run; 


proc sort data=temp(keep=cust_edp_id prod_cat found) out=sortTemp nodupkey;
  by cust_edp_id Prod_cat ; 
run;

proc transpose data=sortTemp out=wideTemp ;
	by cust_edp_id;
	id prod_Cat;
	var found;
run;  


*Generate a row of zeros; 
data row0; 
  array cvar{51} &name_order (51*0); 
  output;
run; 

data wideTemp_filled;
   if 0 then set row0;  *sets the variable order; 
   set widetemp;
   array pc{*} $ pcat_: ; 
   do i=1 to dim(pc); 
      if pc(i)=. then pc{i}=0; 
   end; drop i;
run;
SAS Employee
Posts: 27

Re: help from an array expert

Posted in reply to RogerSpeas

I used the following to generate the ITEM data set...

 

*Generated Random data;   
data item; 
  do cust_edp_id=1 to 20; 
     _occur=ceil(20*ranuni(6)) + 5; 
	 
	 length curr_Cat $ 2; 
     do _i=1 to _occur; 
	    _letter= input( put (
                   ceil(26*ranuni(6))+ 64 ,  hex2.) ,
				    $hex2.) ;  *Generate a random letter... 65=A 66=B ;     
	    if ranuni(6)<.5 then           
		     curr_cat=cats('A', _letter);     
        else do;
		     _letter1= input( put (
                   ceil(25*ranuni(6))+ 65 ,  hex2.) , 
				    $hex2.) ;  *Generate a random letter... B-Z; 
             curr_cat=cats(_letter1,_letter);
             end; 
		output;
      end;
   end;
  drop _:; 
run; 
SAS Employee
Posts: 27

Re: help from an array expert

And you could use a format to do a fast position lookup in the array.

You can create kind of a vlookup table with the data step and load the letter to position table into a format.  AA=1 AB=2... Z=51.

You could build a temporary array full of zero to intialize the FIRST row, or you can do it the lazy way and build a single row table with zero values.

 

 


*Generate variable list pcat_AA--pcat_AZ pcat_B-pcat_Z
  and a DATA set with with positional lookup values AA=1 AB=2..... Z=51; 
data list ; 
   length name $ 500 start $ 2;
   retain fmtname '$findit'; 
   do i =65 to 90 ; 
     letter= input( put( i, hex2.), $hex2.); 
	 name=catx(' ', name, cats('pcat_A', letter));  
	 n+1;
	 start=cats('A',letter); 
	 label=put(n,2.);
	 output; 
   end; 
   do i =66 to 90 ; 
     letter= input(put( i, hex2.), $hex2.); 
	 name=catx(' ',name, cats('pcat_', letter));  
	 n+1;
	 start=cats(letter); 
     label=put(n,2.);
	 output; 
   end; 
   put name=; 
   call symputx('name_order',name);
   drop i name letter; 
run; 

proc format cntlin=list fmtlib;
  select $findit; 
run; 

*Generate a row of zeros; 
data row0; 
  array cvar{51} &name_order (51*0); 
  output;
run; 

data temp2;

   set item;
   by cust_edp_id;
   if substr(curr_cat,1,1) = 'A' then prod_cat =substr(curr_cat,1,2);
   else prod_cat=substr(curr_cat,1,1);
   
  
   length all_cats $ 300;  *may contain duplicates; 
   all_cats=catx('-',all_Cats,prod_Cat);  retain all_cats; 

   if 0 then set row0;  *sets the variable order; 
   array pc{*} &name_order; 

   
   if first.cust_edp_id=1 then do;
      _row=1; 
      set row0 point=_row;  *initialize to zero; 
   end; 

   _pos=input(   put(prod_cat,$findit.), 2.)  ;
   pc{_pos}=1; 

   if last.cust_edp_id=1 then do; 
      output; 
	  all_cats=' '; 
   end;
   *drop curr_cat prod_cat;
   drop _pos; 
run; 
Contributor
Posts: 43

Re: help from an array expert

Posted in reply to RogerSpeas

Sorry for the long period of silence -EOM processing was taking a front seat this morning.   -The code I inserted works fine, but since I will be doing this on a regular basis, on around 50 million obs, I was hoping for something that is more efficient. 

 

I am having trouble attaching a SAS dataset -how can I put a sample of the data here?

SAS Employee
Posts: 27

Re: help from an array expert

The DATA step array code would be the most efficient...as it requires only one pass of the table.   The PROC TRANPOSE code require that you dedup the id-curr_cat combination before tranposing (multiple categories create multiple rows for the id). 

 

PROC PRINT is probably the easiest way to create a CSV file...

 

ods csv file='c:\item_sample.txt' ;

proc print data=item(obs=100) noobs;

    var cust_edp_id curr_cat;

    *where ranuni(6)>.70 ;  

        *if necessary, you can add this to get a 30% sample of the first 100, adjust OBS= and percentage accordingly;

run;

ods csv close;

Contributor
Posts: 43

Re: help from an array expert

Posted in reply to RogerSpeas

attached is the items data

Ask a Question
Discussion stats
  • 15 replies
  • 243 views
  • 2 likes
  • 4 in conversation