Can the below code be accomplished using an array? Or is what I have coded the only way to go about this:
data temp; set items; if substr(curr_cat,1,1) = 'A' then prod_cat =substr(curr_cat,1,2); else prod_cat=substr(curr_cat,1,1); retain pcat_aa pcat_ab pcat_ac pcat_ad pcat_ae pcat_af pcat_ag pcat_ah pcat_ai pcat_aj pcat_ak pcat_al pcat_am pcat_an pcat_ao pcat_ap pcat_aq pcat_ar pcat_as pcat_at pcat_au pcat_av pcat_aw pcat_ax pcat_ay pcat_az pcat_b pcat_c pcat_d pcat_e pcat_f pcat_g pcat_h pcat_i pcat_j pcat_k pcat_l pcat_m pcat_n pcat_o pcat_p pcat_q pcat_r pcat_s pcat_t pcat_u pcat_v pcat_w pcat_x pcat_y pcat_z; by cust_edp_id; if first.cust_edp_id then do; pcat_aa = 0; pcat_ab = 0; pcat_ac = 0; pcat_ad = 0; pcat_ae = 0; pcat_af = 0; pcat_ag = 0; pcat_ah = 0; pcat_ai = 0; pcat_aj = 0; pcat_ak = 0; pcat_al = 0; pcat_ao = 0; pcat_ap = 0; pcat_aq = 0; pcat_ar = 0; pcat_as = 0; pcat_at = 0; pcat_au = 0; pcat_av = 0; pcat_aw = 0; pcat_ax = 0; pcat_ay = 0; pcat_az = 0; pcat_b = 0; pcat_c = 0; pcat_d = 0; pcat_e = 0; pcat_f = 0; pcat_g = 0; pcat_h = 0; pcat_i = 0; pcat_j = 0; pcat_k = 0; pcat_l = 0; pcat_m = 0; pcat_n = 0; pcat_o = 0; pcat_p = 0; pcat_q = 0; pcat_r = 0; pcat_s = 0; pcat_t = 0; pcat_u = 0; pcat_v = 0; pcat_w = 0; pcat_x = 0; pcat_y = 0; pcat_z = 0; end; select(prod_cat); when('AA') pcat_aa = 1; when('AB') pcat_ab = 1; when('AC') pcat_ac = 1; when('AD') pcat_ad = 1; when('AE') pcat_ae = 1; when('AF') pcat_af = 1; when('AG') pcat_ag = 1; when('AH') pcat_ah = 1; when('AI') pcat_ai = 1; when('AJ') pcat_aj = 1; when('AK') pcat_ak = 1; when('AL') pcat_al = 1; when('AM') pcat_am = 1; when('AN') pcat_an = 1; when('AO') pcat_ao = 1; when('AP') pcat_ap = 1; when('AQ') pcat_aq = 1; when('AR') pcat_ar = 1; when('AS') pcat_as = 1; when('AT') pcat_at = 1; when('AU') pcat_au = 1; when('AV') pcat_av = 1; when('AW') pcat_aw = 1; when('AX') pcat_ax = 1; when('AY') pcat_ay = 1; when('AZ') pcat_az = 1; when('B') pcat_b = 1; when('C') pcat_c = 1; when('D') pcat_d = 1; when('E') pcat_e = 1; when('F') pcat_f = 1; when('G') pcat_g = 1; when('H') pcat_h = 1; when('I') pcat_i = 1; when('J') pcat_j = 1; when('K') pcat_k = 1; when('L') pcat_l = 1; when('M') pcat_m = 1; when('N') pcat_n = 1; when('O') pcat_o = 1; when('P') pcat_p = 1; when('Q') pcat_q = 1; when('R') pcat_r = 1; when('S') pcat_s = 1; when('T') pcat_t = 1; when('U') pcat_u = 1; when('V') pcat_v = 1; when('W') pcat_w = 1; when('X') pcat_x = 1; when('Y') pcat_y = 1; when('Z') pcat_z = 1; otherwise; end; if last.cust_edp_id then output; run;
Are you trying to create an indicator variable for a categorical variable?
Have you tried any of the automated methods so far?
Here's a link that may be helpful:
How can you tell if a customer purchases the same item if the dataset is reduced to one record per customer?
Is this related to Market Basket Analysis at all?
Are you wishing to know the number of times each of the categories were purchased.
Instead of just flagging the variable with 1, just sum up the occurences.
The WHEN expression would look a little like this ..
when .... pcat_aa+1;
when... pcat_ab+1;
...
when pcat_z+1;
With regards to the suggested TRANPOSE code, I was assuming just a flag was needed, FOUND. To deal with summarization, you need to pre-process the table with PROC MEANS or SQL...
class id prod_cat ; output nobs=FOUND;
or
select id, prod_cat, count(*) as FOUND from temp group by 1,2;
and then transpose.
The advantage of using PROC MEANS, you can use a CLASSDATA table to fully rank all class levels, AA-AZ, B-Z for each ID. In my FORMAT example farther below, you could modify the "LIST" table to be used as a CLASSDATA table. Using a CLASSDATA table you would avoid having to reprocess to change missing values to zeros.
You didn't post a CSV file for input, so I was tasked to writing and posting the code to generate the ITEM table. So when you inserted either of my code approched, you didn't get the results you wanted, hmmm?
Was there an error? Or were the results not what you envisioned?
For those who may have not run the code, I've include the result below.
From the supplied code that created the seeded-randomly generated table ITEM, the final table shown below is the summarization of the ITEM table, where the cust_edp_id's prod_cat are concatenated into the column ALL_CATS. You can compare the ALL_CATS list with the dummy variables on each row. Could you please help me understand how my result differ from the result you desire?
cust_edp_id | all_cats | pcat_AA | pcat_AB | pcat_AC | pcat_AD | pcat_AE | pcat_AF | pcat_AG | pcat_AH | pcat_AI | pcat_AJ | pcat_AK | pcat_AL | pcat_AM | pcat_AN | pcat_AO | pcat_AP | pcat_AQ | pcat_AR | pcat_AS | pcat_AT | pcat_AU | pcat_AV | pcat_AW | pcat_AX | pcat_AY | pcat_AZ | pcat_B | pcat_C | pcat_D | pcat_E | pcat_F | pcat_G | pcat_H | pcat_I | pcat_J | pcat_K | pcat_L | pcat_M | pcat_N | pcat_O | pcat_P | pcat_Q | pcat_R | pcat_S | pcat_T | pcat_U | pcat_V | pcat_W | pcat_X | pcat_Y | pcat_Z |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | AV-V-AG-AH-AX-AD-Q-J | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | AM-C-N-AG-P-J-AV-G | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | AP-AK-AV-AP-AZ-F-AL-AQ-AD-AD-S-C-O-AL-AG-AX-L-N-AE-AL-F | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | X-AN-Y-AB-K-H-AR-AK-AW-AG-E-F-AD-AG-AH-AJ-AI-AB-O-AS | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
5 | F-I-Z-AG-J-AC-L-R | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
6 | T-AL-D-AZ-J-U-K-B-E-AP-AO-AC-AS-AU-S-AZ-T-J-AY | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
7 | G-U-V-Z-AJ-I-W | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
8 | J-Q-I-G-G-V-T-AN-Y-C | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
9 | T-AW-AU-S-X-AC-AK-J-Y-S | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
10 | AG-AG-L-AG-AV-E-E-K-X-I-AH-R-AR-AS | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
11 | AH-AW-Y-AG-AV-AC-K-AS-AO-C-AA-AQ-F-B-Y-C-AA-Y-Z-AT-C-L-AM | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
12 | C-AT-M-H-AZ-T-AR-Y-Y-B-T-W-AJ-T-AX-AC-Q-Y | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
13 | V-J-AK-Z-V-AW-B-W | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
14 | Y-AV-AM-AC-L-AU-AB-AR-B-L-R-AX-N-X-Z-AV-X-K-AA-AN | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
15 | Z-AW-AF-AX-AT-AO-V-V-B-AZ-Q-T-AA-M-AP-AY-AR-AO-O-AO-B-Q-AQ | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
16 | AE-AS-Y-B-J-AK-AK-AQ-AT-Y | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
17 | N-H-AR-S-O-AJ-K-S-AY-K-AW-Q-AJ-V-F-AE-AZ-AZ-V-AV-AU-AL | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
18 | AB-AF-Z-E-AC-AP-W-I-O-J-V-AG-Q-S-L-R-I-U | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
19 | AX-AB-R-X-AJ-N-AN-F-AJ | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
20 | H-R-AO-AU-AR-AT-Y-Y-W-X-AD-AF-Y-S-AR-Z | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
As you previously posted a PROC TRANSPOSE question, I thought I would give you two examples, using PROC TRANSPOSE and then using ARRAYs. How many rows do you want for each ID? The code that you show appears that you are intend to collapse the multiple ID entries into on row, tranposing the Prod_CAT value into column names. Could you supply some data and how you want the result to appear?
Here is an example with fewer variables because I don't see a need to type 30 to show how. The trick is the two array definitions.
data items; input cust_edp_id prod_cat $; datalines ; 1 AA 1 AB 1 AD 2 AC 2 AD ; run; data temp; set items; by cust_edp_id; array p pcat_aa pcat_ab pcat_ac pcat_ad ; array v {4} $ 2 _temporary_('AA' 'AB' 'AC' 'AD'); /* ITEM value in the temporary array for each variable. The order MUST match*/ retain pcat_aa pcat_ab pcat_ac pcat_ad ; if first.cust_edp_id then do i = 1 to dim(p); p[i]=0; end; p[whichc(prod_cat,of v(*))]=1; if last.cust_edp_id; drop i; RUN;
You could use PROC TRANSPOSE, but it's not as efficient as it does some re-read of the data.
But it might be simple to understand...
data temp;
length prod_Cat $ 10;
set item ;
if substr(curr_cat,1,1) = 'A' then prod_cat =substr(curr_cat,1,2);
else prod_cat=substr(curr_cat,1,1);
prod_cat=cats('pcat_',prod_cat);
found=1;
run;
proc sort data=temp(keep=cust_edp_id prod_cat found) out=sortTemp nodupkey;
by cust_edp_id Prod_cat ;
run;
proc transpose data=sortTemp out=wideTemp ;
by cust_edp_id;
id prod_Cat;
var found;
run;
*Generate a row of zeros;
data row0;
array cvar{51} &name_order (51*0);
output;
run;
data wideTemp_filled;
if 0 then set row0; *sets the variable order;
set widetemp;
array pc{*} $ pcat_: ;
do i=1 to dim(pc);
if pc(i)=. then pc{i}=0;
end; drop i;
run;
I used the following to generate the ITEM data set...
*Generated Random data;
data item;
do cust_edp_id=1 to 20;
_occur=ceil(20*ranuni(6)) + 5;
length curr_Cat $ 2;
do _i=1 to _occur;
_letter= input( put (
ceil(26*ranuni(6))+ 64 , hex2.) ,
$hex2.) ; *Generate a random letter... 65=A 66=B ;
if ranuni(6)<.5 then
curr_cat=cats('A', _letter);
else do;
_letter1= input( put (
ceil(25*ranuni(6))+ 65 , hex2.) ,
$hex2.) ; *Generate a random letter... B-Z;
curr_cat=cats(_letter1,_letter);
end;
output;
end;
end;
drop _:;
run;
And you could use a format to do a fast position lookup in the array.
You can create kind of a vlookup table with the data step and load the letter to position table into a format. AA=1 AB=2... Z=51.
You could build a temporary array full of zero to intialize the FIRST row, or you can do it the lazy way and build a single row table with zero values.
*Generate variable list pcat_AA--pcat_AZ pcat_B-pcat_Z
and a DATA set with with positional lookup values AA=1 AB=2..... Z=51;
data list ;
length name $ 500 start $ 2;
retain fmtname '$findit';
do i =65 to 90 ;
letter= input( put( i, hex2.), $hex2.);
name=catx(' ', name, cats('pcat_A', letter));
n+1;
start=cats('A',letter);
label=put(n,2.);
output;
end;
do i =66 to 90 ;
letter= input(put( i, hex2.), $hex2.);
name=catx(' ',name, cats('pcat_', letter));
n+1;
start=cats(letter);
label=put(n,2.);
output;
end;
put name=;
call symputx('name_order',name);
drop i name letter;
run;
proc format cntlin=list fmtlib;
select $findit;
run;
*Generate a row of zeros;
data row0;
array cvar{51} &name_order (51*0);
output;
run;
data temp2;
set item;
by cust_edp_id;
if substr(curr_cat,1,1) = 'A' then prod_cat =substr(curr_cat,1,2);
else prod_cat=substr(curr_cat,1,1);
length all_cats $ 300; *may contain duplicates;
all_cats=catx('-',all_Cats,prod_Cat); retain all_cats;
if 0 then set row0; *sets the variable order;
array pc{*} &name_order;
if first.cust_edp_id=1 then do;
_row=1;
set row0 point=_row; *initialize to zero;
end;
_pos=input( put(prod_cat,$findit.), 2.) ;
pc{_pos}=1;
if last.cust_edp_id=1 then do;
output;
all_cats=' ';
end;
*drop curr_cat prod_cat;
drop _pos;
run;
Sorry for the long period of silence -EOM processing was taking a front seat this morning. -The code I inserted works fine, but since I will be doing this on a regular basis, on around 50 million obs, I was hoping for something that is more efficient.
I am having trouble attaching a SAS dataset -how can I put a sample of the data here?
The DATA step array code would be the most efficient...as it requires only one pass of the table. The PROC TRANPOSE code require that you dedup the id-curr_cat combination before tranposing (multiple categories create multiple rows for the id).
PROC PRINT is probably the easiest way to create a CSV file...
ods csv file='c:\item_sample.txt' ;
proc print data=item(obs=100) noobs;
var cust_edp_id curr_cat;
*where ranuni(6)>.70 ;
*if necessary, you can add this to get a 30% sample of the first 100, adjust OBS= and percentage accordingly;
run;
ods csv close;
attached is the items data
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.