BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tediest
Calcite | Level 5

Sorry for the obscure title.

I have a data like this:

person timebrand
1100A
1200A
1300B
1400C
2100A
2200B
2300A
2400B

I would like to create a new variable for each observation, representing the number of times the corresponding "brand" has occurred before for the given person. So the values of the new variable should be: 0, 1, 0, 0, 0, 0, 1, 1. As the data are pretty big, it is inefficient to do this via loops.

Would someone please help me with this? Thanks a lot in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

Hi,

The code below works for the sample data you provided, but I am not sure it will work for all your data. :smileysilly:

data x;

input person brand $char20.;

datalines;

1 KELLOGG

1 POST

1 KASHI GO

1 KASHI GO

2 KELLOGG

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 KELLOGG

2 KELLOGG

2 KELLOGG

;

run;

data y (drop=all);

length all $100;

do until(last.person);

   set x;

   by person;

   howmany = count(all,compress(brand));

   all = catx(', ',all,compress(brand));

   output;

end;

run;

proc print;run;

                   obs    person    brand       howmany

                         1       1      KELLOGG        0

                         2       1      POST           0

                         3       1      KASHI GO       0

                         4       1      KASHI GO       1

                         5       2      KELLOGG        0

                         6       2      G MILLS        0

                         7       2      G MILLS        1

                         8       2      G MILLS        2

                         9       2      G MILLS        3

                        10       2      G MILLS        4

                        11       2      G MILLS        5

                        12       2      KELLOGG        1

                        13       2      KELLOGG        2

                        14       2      KELLOGG        3

View solution in original post

11 REPLIES 11
MikeZdeb
Rhodochrosite | Level 12

hi ... one way ... don't worry about the LOOP since all it does is read the data set one person at a time ...

data x;

input person time brand :$1. @@;

datalines;

1 100 A 1 200 A 1 300 B 1 400 C

2 100 A 2 200 B 2 300 A 2 400 B

;

data y (drop=all);

length all $100;

do until(last.person);

   set x;

   by person;

   howmany = countc(all,brand);

   all = catt(all,brand);

   output;

end;

run;

person    time    brand    howmany

   1       100      A         0

   1       200      A         1

   1       300      B         0

   1       400      C         0

   2       100      A         0

   2       200      B         0

   2       300      A         1

   2       400      B         1

tediest
Calcite | Level 5

Thanks a lot Mike. It is pretty efficient Smiley Happy

However there is one problem if there are multiple words contained in "brand". I changed "countc" to "count". But sometimes the "catt" function would connect two words as one, and "count" would not be able to identify the correct number of times the same brand has occurred before. Are there ways to avoid this problem? Thanks!

Linlin
Lapis Lazuli | Level 10

try this:

data x;

input person time brand :$2. @@;

datalines;

1 100 AB 1 200 AB 1 300 B 1 400 C

2 100 A 2 200 BB 2 300 AB 2 400 BB

;

data y (drop=all) ;

length all $100;

do until(last.person);

   set x;

   by person;

   howmany = count(all,brand);

   all = catx(', ',all,brand,'s');

   output;

end;

run;

proc print;run;

tediest
Calcite | Level 5

Thanks Linlin.

I tried this but there still appear to be problems with some observations. For instance I have one line with: all="KELLOGG, s, G MILLS, s, G MILLS, s, G MILLS, s, G MILLS, s"  and brand="G MILLS". However, count(all,brand) returns 0. It is really odd. Can this be because of the specific string? 

Linlin
Lapis Lazuli | Level 10

post a sample of your real data.

tediest
Calcite | Level 5

I extracted a small sample with useful variables (person, brand) only.

data x;

input person brand $char20.;

datalines;

1 KELLOGG

1 POST

1 KASHI GO

1 KASHI GO

2 KELLOGG

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 KELLOGG

2 KELLOGG

2 KELLOGG

;

run;

Thanks a lot for your help.

Linlin
Lapis Lazuli | Level 10

Hi,

The code below works for the sample data you provided, but I am not sure it will work for all your data. :smileysilly:

data x;

input person brand $char20.;

datalines;

1 KELLOGG

1 POST

1 KASHI GO

1 KASHI GO

2 KELLOGG

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 KELLOGG

2 KELLOGG

2 KELLOGG

;

run;

data y (drop=all);

length all $100;

do until(last.person);

   set x;

   by person;

   howmany = count(all,compress(brand));

   all = catx(', ',all,compress(brand));

   output;

end;

run;

proc print;run;

                   obs    person    brand       howmany

                         1       1      KELLOGG        0

                         2       1      POST           0

                         3       1      KASHI GO       0

                         4       1      KASHI GO       1

                         5       2      KELLOGG        0

                         6       2      G MILLS        0

                         7       2      G MILLS        1

                         8       2      G MILLS        2

                         9       2      G MILLS        3

                        10       2      G MILLS        4

                        11       2      G MILLS        5

                        12       2      KELLOGG        1

                        13       2      KELLOGG        2

                        14       2      KELLOGG        3

tediest
Calcite | Level 5

It works great!  "compress" did the magical work.

Thanks Linlin Smiley Happy

Haikuo
Onyx | Level 15

I really like the LinLin and Mike's classic data step approach. I guess you could also use array() basing on the similar logic. Here is to show Hash() can also have the job done:

data x;

input person brand $char20.;

datalines;

1 KELLOGG

1 POST

1 KASHI GO

1 KASHI GO

2 KELLOGG

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 G MILLS

2 KELLOGG

2 KELLOGG

2 KELLOGG

;

run;

data want;

  dcl hash h(multidata:'y');

  h.definekey('brand');

  h.definedone();

  do until (last.person);

     set x;

       by person;

       _f=h.find();

       howmany=0;

       if _f=0 then

       do _f=0 by 0 while (_f=0);

          howmany+1;

          _f=h.find_next();

        end;

        _a=h.add();

        output;

    end;

    drop _:;

    run;

    proc print;run;

  

Haikuo

Ksharp
Super User

Bian Haikuo,

You can do it better.

data x;
input person brand $char20.;
datalines;
1 KELLOGG
1 POST
1 KASHI GO
1 KASHI GO
2 KELLOGG
2 G MILLS
2 G MILLS
2 G MILLS
2 G MILLS
2 G MILLS
2 G MILLS
2 KELLOGG
2 KELLOGG
2 KELLOGG
;
run;
data want;
if _n_ eq 1 then do;
 if 0 then set x;
 declare hash ha();
  ha.definekey('brand');
  ha.definedata('count');
  ha.definedone();
end;
 set x;
 by person;
 if first.person then ha.clear();
 if ha.find(key:brand)=0 then do;count+1;ha.replace();end;
  else do;count=0;ha.replace();end;
run;


Ksharp

Haikuo
Onyx | Level 15

It is better Smiley Happy. Thanks for sharing, Keshan.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2428 views
  • 9 likes
  • 5 in conversation