BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vrushankshah
Fluorite | Level 6

I have 4 columns: Column1-It has all the clicks made on the website 2) Unique ObjectId for each user,3rd column-time of the click and 4th column-group which indicates the pattern of clicks followed by the user. If the pattern starts with the click a,c,e then it comes under different group. I want to find if all the words of a group are part of the pattern followed by a different user or no? If it is a part of string then return the objectid , group and flag it as repeated. For example:

Notetext  ObjectId  group  CreatedOn
a            123      1       3:22pm
b            123      1       3:23pm
c            124      2       4:00pm
d            124      2       4:02pm
a            124      3       4:03pm
b            124      3       4:04pm
e            125      4       10:00am
f            125      4       10:02am
g            125      4       10:07am
c            126      5       11:06am
t            126      5       11:07am
l            126      5       11:09am
c            127      6       11:13am
d            127      6       11:15am
a            127      7       12:09pm
b            127      7       12:09pm
s            127      7       12:11pm

OUTPUT EXPECTED:
NoteText ObjectId group Part_of_string a 123 1 N b 123 1 N c 124 2 N d 124 2 N a 124 3 Y b 124 3 Y e 125 4 N f 125 4 N g 125 4 N c 126 5 N t 126 5 N l 126 5 N c 127 6 Y d 127 6 Y a 127 7 N b 127 7 N s 127 7 N

In the above example a,b was the part of the pattern followed by user 124 hence it should be flagged Y but 'a,b,s' is not repeated hence it is flagged N.

 

EDIT:

I want this final thing it will be great if someone can help me . The main thing is I don't want that if the pattern is the only thing present in the objectId then don't count it only count it if it is the PART of another big string under a objectId.

 

I want this final thing it will be great if you can help me .

NoteText    ObjectId   Group       

Inquiry      1           1            

Benefits     2           2            

Verify       2           2            

Claims       2           3            

Type         2           3            

Inquiry      2           4             

Benefits     3           5            

Verify       3           5           

Benefits     4           6            

Expected Output:
Notetext Count

Benefits,verify 1
Inquiry 1
Claims,Type 1
Benefits 0

 

 

For example in the above example in the above case Benefits,verify should be counted only once since it was the PART of objectID '2'.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

It works fine for me. What are you doing wrong?

data HAVE(index=(GROUP));
  input (NOTETEXT  OBJECTID  GROUP  ) ($);
  ROW=_N_;
cards;
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
run;
data PATTERNS;
  retain PATTERN ;
  length PATTERN $16;
  set HAVE;
  by GROUP;
  if first.GROUP then PATTERN=' ';
  PATTERN=cats(PATTERN,NOTETEXT);
  if last.GROUP then output;
  keep GROUP PATTERN;
run;
proc sql;
  create table WANT as
  select trans.*
       , case when COUNT>1 then 'Y' else 'N' end as REPEATED
       , pattern.PATTERN
       , PATTERN_NO
      , COUNT
  from HAVE                                      trans
      ,PATTERNS                                  pattern
      ,(select PATTERN
             , count(*)    as COUNT
             , monotonic() as PATTERN_NO
        from PATTERNS 
        group by PATTERN)                        count
  where trans.GROUP     = pattern.GROUP
    and pattern.PATTERN = count.PATTERN
  order by ROW;
quit;
NOTETEXT OBJECTID GROUP ROW REPEATED PATTERN PATTERN_NO COUNT
a 1 1 1 Y ab 1 2
b 1 1 2 Y ab 1 2
d 1 2 3 Y d 5 3
f 1 3 4 Y f 15 4
g 1 4 5 Y g 7 2
t 2 5 6 N ty 11 1
y 2 5 7 N ty 11 1
f 3 6 8 Y f 15 4
a 4 7 9 Y ab 1 2
b 4 7 10 Y ab 1 2
d 4 8 11 Y d 5 3
f 5 9 12 Y f 15 4
r 6 10 13 N rst 2 1
s 6 10 14 N rst 2 1
t 6 10 15 N rst 2 1
a 6 11 16 N abs 3 1
b 6 11 17 N abs 3 1
s 6 11 18 N abs 3 1
q 7 12 19 N qwer 4 1
w 7 12 20 N qwer 4 1
e 7 12 21 N qwer 4 1
r 7 12 22 N qwer 4 1
d 7 13 23 Y d 5 3
f 7 14 24 Y f 15 4
g 7 15 25 Y g 7 2

 

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

Like this?

 

data HAVE;
  input (NOTETEXT  OBJECTID  GROUP  CREATEDON) ($);
  ROW=_N_;
cards;
a            123      1       3:22pm
b            123      1       3:23pm
c            124      2       4:00pm
d            124      2       4:02pm
a            124      3       4:03pm
b            124      3       4:04pm
e            125      4       10:00am
f            125      4       10:02am
g            125      4       10:07am
c            126      5       11:06am
t            126      5       11:07am
l            126      5       11:09am
c            127      6       11:13am
d            127      6       11:15am
a            127      7       12:09pm
b            127      7       12:09pm
s            127      7       12:11pm
run;
data PATTERNS;
  retain PATTERN ;
  length PATTERN $16;
  set HAVE;
  by GROUP;
  if first.GROUP then PATTERN=' ';
  PATTERN=cats(PATTERN,NOTETEXT);
  if last.GROUP then output;
  keep GROUP PATTERN;
run;
proc sql;
  create table WANT as
  select trans.*
       , case when COUNT>1 then 'Y' else 'N' end as REPEATED
       , COUNT
  from HAVE                                                               trans
      ,PATTERNS                                                           pattern
      ,(select PATTERN, count(*) as COUNT from PATTERNS group by PATTERN) count
  where trans.GROUP     = pattern.group
    and pattern.PATTERN = count.PATTERN
  order by ROW;
quit;

 

NOTETEXT OBJECTID GROUP CREATEDON ROW REPEATED COUNT
a 123 1 3:22pm 1 Y 2
b 123 1 3:23pm 2 Y 2
c 124 2 4:00pm 3 Y 2
d 124 2 4:02pm 4 Y 2
a 124 3 4:03pm 5 Y 2
b 124 3 4:04pm 6 Y 2
e 125 4 10:00am 7 N 1
f 125 4 10:02am 8 N 1
g 125 4 10:07am 9 N 1
c 126 5 11:06am 10 N 1
t 126 5 11:07am 11 N 1
l 126 5 11:09am 12 N 1
c 127 6 11:13am 13 Y 2
d 127 6 11:15am 14 Y 2
a 127 7 12:09pm 15 N 1
b 127 7 12:09pm 16 N 1
s 127 7 12:11pm 17 N 1
vrushankshah
Fluorite | Level 6
ya I wanted something like this but when I am trying on other data your code is not working properly.
Data used to test:
Column1 Objectid Group
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
ChrisNZ
Tourmaline | Level 20
"Not working" is very vague. What didn't work?
vrushankshah
Fluorite | Level 6
Hey sorry for not explaining properly! I think the logic is going a bit wrong I am not getting correct count and it flags Y to most of the pattern which it shouldn’t
ChrisNZ
Tourmaline | Level 20

I don't see how.

All the groups with a Y have repeated patterns.

proc sql;
  create table WANT as
  select trans.*
       , case when COUNT>1 then 'Y' else 'N' end as REPEATED
       , COUNT
       , PATTERN_NO
  from HAVE                                      trans
      ,PATTERNS                                  pattern
      ,(select PATTERN
             , count(*)    as COUNT
             , monotonic() as PATTERN_NO
        from PATTERNS 
        group by PATTERN)                        count
  where trans.GROUP     = pattern.group
    and pattern.PATTERN = count.PATTERN
  order by ROW;
quit;

 

NOTETEXT OBJECTID GROUP CREATEDON ROW REPEATED COUNT PATTERN_NO
a 123 1 3:22pm 1 Y 2 1
b 123 1 3:23pm 2 Y 2 1
c 124 2 4:00pm 3 Y 2 6
d 124 2 4:02pm 4 Y 2 6
a 124 3 4:03pm 5 Y 2 1
b 124 3 4:04pm 6 Y 2 1
e 125 4 10:00am 7 N 1 4
f 125 4 10:02am 8 N 1 4
g 125 4 10:07am 9 N 1 4
c 126 5 11:06am 10 N 1 5
t 126 5 11:07am 11 N 1 5
l 126 5 11:09am 12 N 1 5
c 127 6 11:13am 13 Y 2 6
d 127 6 11:15am 14 Y 2 6
a 127 7 12:09pm 15 N 1 7
b 127 7 12:09pm 16 N 1 7
s 127 7 12:11pm 17 N 1 7

 

 

vrushankshah
Fluorite | Level 6
I was just testing the code on a new data and it didnt worked for that :
New test data :
Column1 Objectid Group
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
ChrisNZ
Tourmaline | Level 20

It works fine for me. What are you doing wrong?

data HAVE(index=(GROUP));
  input (NOTETEXT  OBJECTID  GROUP  ) ($);
  ROW=_N_;
cards;
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
run;
data PATTERNS;
  retain PATTERN ;
  length PATTERN $16;
  set HAVE;
  by GROUP;
  if first.GROUP then PATTERN=' ';
  PATTERN=cats(PATTERN,NOTETEXT);
  if last.GROUP then output;
  keep GROUP PATTERN;
run;
proc sql;
  create table WANT as
  select trans.*
       , case when COUNT>1 then 'Y' else 'N' end as REPEATED
       , pattern.PATTERN
       , PATTERN_NO
      , COUNT
  from HAVE                                      trans
      ,PATTERNS                                  pattern
      ,(select PATTERN
             , count(*)    as COUNT
             , monotonic() as PATTERN_NO
        from PATTERNS 
        group by PATTERN)                        count
  where trans.GROUP     = pattern.GROUP
    and pattern.PATTERN = count.PATTERN
  order by ROW;
quit;
NOTETEXT OBJECTID GROUP ROW REPEATED PATTERN PATTERN_NO COUNT
a 1 1 1 Y ab 1 2
b 1 1 2 Y ab 1 2
d 1 2 3 Y d 5 3
f 1 3 4 Y f 15 4
g 1 4 5 Y g 7 2
t 2 5 6 N ty 11 1
y 2 5 7 N ty 11 1
f 3 6 8 Y f 15 4
a 4 7 9 Y ab 1 2
b 4 7 10 Y ab 1 2
d 4 8 11 Y d 5 3
f 5 9 12 Y f 15 4
r 6 10 13 N rst 2 1
s 6 10 14 N rst 2 1
t 6 10 15 N rst 2 1
a 6 11 16 N abs 3 1
b 6 11 17 N abs 3 1
s 6 11 18 N abs 3 1
q 7 12 19 N qwer 4 1
w 7 12 20 N qwer 4 1
e 7 12 21 N qwer 4 1
r 7 12 22 N qwer 4 1
d 7 13 23 Y d 5 3
f 7 14 24 Y f 15 4
g 7 15 25 Y g 7 2

 

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1160 views
  • 1 like
  • 2 in conversation