BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tapas_16880
Calcite | Level 5

Hi All, I am stuck with a SAS file and not sure how to progress.

The layout is -

Cust NoProdAcc flagBal
111a1qaz

10

111b1wsx11
222c1wsx233
333d1grw251
444e1dc25
444f1vefw1231
444g1wefr36
555h1bdfgh615
666i1qe767
777j1erg31
777k1eth472
777l1ret251
777m1dfwef3761
888p1ffef36

I want to introduce two columns in this sas file. One lets say prod_con and other can be Acc_con.

It will look for the cust_no values in this above file. If it finds distinct values then it will simply copy the values of the column Prod and Acc Flag to the new columns prod_con and Acc_con respectively. But whenever it will find duplicates, it should concatenate the values in column Prod and Acc Flag and then copy them into the new columns prod_con and Acc_con respectively.

E.g. for Cust_no = 777, it should put j1k1l1m1 to prod_con and ergethretdfwef to Acc_con and this value will be repeated off course for all 777.

1 ACCEPTED SOLUTION

Accepted Solutions
ErikT
Obsidian | Level 7

How about this:

Create an index on Cust_no on the file

Then create an extra file of unique customers numbers

Read the file with the unique customers numbers and for each one go to the original file using the KEY option in the SET statement to retrieve and concatenate the Prod and Acc_flag variables. The _iorc_=0 and _error_=0 at the end of the step are to prevent error messages in the log.

data original(index=(Cust_no));

input Cust_no Prod $ ACC_flag $ Bal $ ;

datalines;

111    a1    qaz    10

111    b1    wsx    11

222    c1    wsx    233

333    d1    grw    251

444    e1    dc    25

444    f1    vefw    1231

444    g1    wefr    36

555    h1    bdfgh    615

666    i1    qe    767

777    j1    erg    31

777    k1    eth    472

777    l1    ret    251

777    m1    dfwef    3761

888    p1    ffef    36

run;

proc sort data=original(keep=cust_no) out=customers nodupkey;

by Cust_no;

run;

data new;

set customers;

length Prod_con Acc_con $50;

do until (_iorc_ ne 0);

   set original key = Cust_no;

   if _iorc_ = 0 then do;

     Prod_con = CATX('|',Prod_Con,Prod);

     Acc_con = CATX('|',Acc_con,Acc_flag);

   end;

end;

output;

_iorc_ = 0; _error_=0;

run;

proc print;

run;

View solution in original post

11 REPLIES 11
kevin_123
Calcite | Level 5

Hi Tapas.  I recommend doing two passes if you do know how many duplicates you can encounter.  First to create your string and the second pass to add it back to the original records

data inds; 
input   custno prod $ acc_flag $ bal ;
datalines; 111 a1 qaz 10
111 b1 wsx 11
222 c1 wsx 233
333 d1 grw 251
444 e1 dc 25
444 f1 vefw 1231
444 g1 wefr 36
555 h1 bdfgh 615
666 i1 qe 767
777 j1 erg 31
777 k1 eth 472
777 l1 ret 251
777 m1 dfwef
3761
888 p1 ffef 36 ;
run;
proc sort data=inds; by custno prod; run;
data tmp_ds (keep=custno prod_con acc_con);
set inds;
by custno;
length prod_con acc_con $50.;
retain prod_con acc_con;
if first.custno then do;
prod_con =""; Acc_con = "";
end;
prod_con = trim(prod_con) || trim(prod);
acc_con = trim(acc_con) || trim(acc_flag);
if last.custno then output;
run;
data final_ds;
merge inds 
   tmp_ds;
by custno;
run;
Keith
Obsidian | Level 7

@kevin_123,

That's pretty much how I would do it.  One simplification you could make is to use the CALL CATS function in place of

     Varname=TRIM(....) || TRIM(...).

So the 2 lines would be :

call cats(prod_con,prod);

call cats(acc_con,acc_flag);

PaigeMiller
Diamond | Level 26

Here's how I would approach this problem

  1. Sort by CustNo
  2. Proc Transpose variable Prod using BY variable CustNo
  3. Proc Transpose variable AccFlag using BY variable CustNo
  4. merge outputs of above 2 steps BY variable CustNo
  5. concatenate as appropriate to produce the two new columns
--
Paige Miller
Haikuo
Onyx | Level 15

Hi, From what I can see it is a typical 2XDOW scenario, if the group variable 'custno' is clustered, then no need to sort:

data have;

input   (custno prod  acc_flag) (:$) bal ;

datalines;

111 a1 qaz 10

111 b1 wsx 11

222 c1 wsx 233

333 d1 grw 251

444 e1 dc 25

444 f1 vefw 1231

444 g1 wefr 36

555 h1 bdfgh 615

666 i1 qe 767

777 j1 erg 31

777 k1 eth 472

777 l1 ret 251

777 m1 dfwef  3761

888 p1 ffef 36

;

data want;

   do until (last.custno);

      set have;

      by custno notsorted;

      length prod_con acc_con $50.;

      prod_con=cats(prod_con,prod);

      acc_con=cats(acc_con,acc_flag);

    end;

   do until (last.custno);

      set have;

      by custno notsorted;

      output;

    end;

run;

proc print;run;

Haikuo

Haikuo
Onyx | Level 15

I am also wondering if there is way to use cats() +group by in proc sql in a summary function kind of way, seems neat to me.

For this question, there is another data step way, originally from Mike Zdeb (I think):

data have;

input   (custno prod  acc_flag) (:$) bal ;

datalines;

111 a1 qaz 10

111 b1 wsx 11

222 c1 wsx 233

333 d1 grw 251

444 e1 dc 25

444 f1 vefw 1231

444 g1 wefr 36

555 h1 bdfgh 615

666 i1 qe 767

777 j1 erg 31

777 k1 eth 472

777 l1 ret 251

777 m1 dfwef  3761

888 p1 ffef 36

;

data want;

      set have (in=up) have;

      by custno;

      length prod_con acc_con $50.;

      retain prod_con acc_con;

      if first.custno then call missing (prod_con,acc_con);

      if up then do;

              prod_con=cats(prod_con,prod);

              acc_con=cats(acc_con,acc_flag);

      end;

      else output;

run;

proc print;run;

Haikuo

PGStats
Opal | Level 21

Hi Haikuo Bian, combining your two proposals might be optimal :

data want;

do until (last.custno);

     set have (in=up) have;

     by custno;  /* NOTSORTED cannot be used here */

     length prod_con acc_con $50.;

     if up then do;

          prod_con = cats(prod_con, prod);

          acc_con = cats(acc_con, acc_flag);

          end;

     else output;

     end;

run;

PG

notsorted removed by PG.

PG
Haikuo
Onyx | Level 15

Yes, PG. That is a good combination, Thanks!

Haikuo

Astounding
PROC Star

Questions, questions, questions ...

For a single customer, could you ever have two observations with the same Product?  With the same Account flag?

The answers so far might be assuming uniqueness.  That might be correct, but the question should at least be asked.

If duplicates are possible, should the same Product appear twice in the new prod_con variable?

The programming won't change drastically ... it's just a matter of tweaking it properly.

Possibly related:  Could a product or an account flag ever contain an embedded blank?

ErikT
Obsidian | Level 7

How about this:

Create an index on Cust_no on the file

Then create an extra file of unique customers numbers

Read the file with the unique customers numbers and for each one go to the original file using the KEY option in the SET statement to retrieve and concatenate the Prod and Acc_flag variables. The _iorc_=0 and _error_=0 at the end of the step are to prevent error messages in the log.

data original(index=(Cust_no));

input Cust_no Prod $ ACC_flag $ Bal $ ;

datalines;

111    a1    qaz    10

111    b1    wsx    11

222    c1    wsx    233

333    d1    grw    251

444    e1    dc    25

444    f1    vefw    1231

444    g1    wefr    36

555    h1    bdfgh    615

666    i1    qe    767

777    j1    erg    31

777    k1    eth    472

777    l1    ret    251

777    m1    dfwef    3761

888    p1    ffef    36

run;

proc sort data=original(keep=cust_no) out=customers nodupkey;

by Cust_no;

run;

data new;

set customers;

length Prod_con Acc_con $50;

do until (_iorc_ ne 0);

   set original key = Cust_no;

   if _iorc_ = 0 then do;

     Prod_con = CATX('|',Prod_Con,Prod);

     Acc_con = CATX('|',Acc_con,Acc_flag);

   end;

end;

output;

_iorc_ = 0; _error_=0;

run;

proc print;

run;

flyerrhappy2015
Calcite | Level 5

Is there any way to do this in proc sql?

I need to calcualte 40+ column for sum and mean. SQL is easier to calculate sum and mean, but how can I concatenate row values across each ID: E.g. for Cust_no = 777, it should put j1k1l1m1 to prod_con and get the total balance for each SSN in proc sql

Ksharp
Super User

Yes. There is one way. But you need to list all of these unique value as columns , You wouldn't want to that . That is horrible .

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 17519 views
  • 1 like
  • 10 in conversation