BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
vrushankshah
Fluorite | Level 6

I have 4 columns: Column1-It has all the clicks made on the website 2) Unique ObjectId for each user,3rd column-time of the click and 4th column-group which indicates the pattern of clicks followed by the user. If the pattern starts with the click a,c,e then it comes under different group. I want to find if all the words of a group are part of the pattern followed by a different user or no? If it is a part of string then return the objectid , group and flag it as repeated. For example:

Notetext  ObjectId  group  CreatedOn
a            123      1       3:22pm
b            123      1       3:23pm
c            124      2       4:00pm
d            124      2       4:02pm
a            124      3       4:03pm
b            124      3       4:04pm
e            125      4       10:00am
f            125      4       10:02am
g            125      4       10:07am
c            126      5       11:06am
t            126      5       11:07am
l            126      5       11:09am
c            127      6       11:13am
d            127      6       11:15am
a            127      7       12:09pm
b            127      7       12:09pm
s            127      7       12:11pm

OUTPUT EXPECTED:
NoteText ObjectId group Part_of_string a 123 1 N b 123 1 N c 124 2 N d 124 2 N a 124 3 Y b 124 3 Y e 125 4 N f 125 4 N g 125 4 N c 126 5 N t 126 5 N l 126 5 N c 127 6 Y d 127 6 Y a 127 7 N b 127 7 N s 127 7 N

In the above example a,b was the part of the pattern followed by user 124 hence it should be flagged Y but 'a,b,s' is not repeated hence it is flagged N.

 

EDIT:

I want this final thing it will be great if someone can help me . The main thing is I don't want that if the pattern is the only thing present in the objectId then don't count it only count it if it is the PART of another big string under a objectId.

 

I want this final thing it will be great if you can help me .

NoteText    ObjectId   Group       

Inquiry      1           1            

Benefits     2           2            

Verify       2           2            

Claims       2           3            

Type         2           3            

Inquiry      2           4             

Benefits     3           5            

Verify       3           5           

Benefits     4           6            

Expected Output:
Notetext Count

Benefits,verify 1
Inquiry 1
Claims,Type 1
Benefits 0

 

 

For example in the above example in the above case Benefits,verify should be counted only once since it was the PART of objectID '2'.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

It works fine for me. What are you doing wrong?

data HAVE(index=(GROUP));
  input (NOTETEXT  OBJECTID  GROUP  ) ($);
  ROW=_N_;
cards;
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
run;
data PATTERNS;
  retain PATTERN ;
  length PATTERN $16;
  set HAVE;
  by GROUP;
  if first.GROUP then PATTERN=' ';
  PATTERN=cats(PATTERN,NOTETEXT);
  if last.GROUP then output;
  keep GROUP PATTERN;
run;
proc sql;
  create table WANT as
  select trans.*
       , case when COUNT>1 then 'Y' else 'N' end as REPEATED
       , pattern.PATTERN
       , PATTERN_NO
      , COUNT
  from HAVE                                      trans
      ,PATTERNS                                  pattern
      ,(select PATTERN
             , count(*)    as COUNT
             , monotonic() as PATTERN_NO
        from PATTERNS 
        group by PATTERN)                        count
  where trans.GROUP     = pattern.GROUP
    and pattern.PATTERN = count.PATTERN
  order by ROW;
quit;
NOTETEXT OBJECTID GROUP ROW REPEATED PATTERN PATTERN_NO COUNT
a 1 1 1 Y ab 1 2
b 1 1 2 Y ab 1 2
d 1 2 3 Y d 5 3
f 1 3 4 Y f 15 4
g 1 4 5 Y g 7 2
t 2 5 6 N ty 11 1
y 2 5 7 N ty 11 1
f 3 6 8 Y f 15 4
a 4 7 9 Y ab 1 2
b 4 7 10 Y ab 1 2
d 4 8 11 Y d 5 3
f 5 9 12 Y f 15 4
r 6 10 13 N rst 2 1
s 6 10 14 N rst 2 1
t 6 10 15 N rst 2 1
a 6 11 16 N abs 3 1
b 6 11 17 N abs 3 1
s 6 11 18 N abs 3 1
q 7 12 19 N qwer 4 1
w 7 12 20 N qwer 4 1
e 7 12 21 N qwer 4 1
r 7 12 22 N qwer 4 1
d 7 13 23 Y d 5 3
f 7 14 24 Y f 15 4
g 7 15 25 Y g 7 2

 

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

Like this?

 

data HAVE;
  input (NOTETEXT  OBJECTID  GROUP  CREATEDON) ($);
  ROW=_N_;
cards;
a            123      1       3:22pm
b            123      1       3:23pm
c            124      2       4:00pm
d            124      2       4:02pm
a            124      3       4:03pm
b            124      3       4:04pm
e            125      4       10:00am
f            125      4       10:02am
g            125      4       10:07am
c            126      5       11:06am
t            126      5       11:07am
l            126      5       11:09am
c            127      6       11:13am
d            127      6       11:15am
a            127      7       12:09pm
b            127      7       12:09pm
s            127      7       12:11pm
run;
data PATTERNS;
  retain PATTERN ;
  length PATTERN $16;
  set HAVE;
  by GROUP;
  if first.GROUP then PATTERN=' ';
  PATTERN=cats(PATTERN,NOTETEXT);
  if last.GROUP then output;
  keep GROUP PATTERN;
run;
proc sql;
  create table WANT as
  select trans.*
       , case when COUNT>1 then 'Y' else 'N' end as REPEATED
       , COUNT
  from HAVE                                                               trans
      ,PATTERNS                                                           pattern
      ,(select PATTERN, count(*) as COUNT from PATTERNS group by PATTERN) count
  where trans.GROUP     = pattern.group
    and pattern.PATTERN = count.PATTERN
  order by ROW;
quit;

 

NOTETEXT OBJECTID GROUP CREATEDON ROW REPEATED COUNT
a 123 1 3:22pm 1 Y 2
b 123 1 3:23pm 2 Y 2
c 124 2 4:00pm 3 Y 2
d 124 2 4:02pm 4 Y 2
a 124 3 4:03pm 5 Y 2
b 124 3 4:04pm 6 Y 2
e 125 4 10:00am 7 N 1
f 125 4 10:02am 8 N 1
g 125 4 10:07am 9 N 1
c 126 5 11:06am 10 N 1
t 126 5 11:07am 11 N 1
l 126 5 11:09am 12 N 1
c 127 6 11:13am 13 Y 2
d 127 6 11:15am 14 Y 2
a 127 7 12:09pm 15 N 1
b 127 7 12:09pm 16 N 1
s 127 7 12:11pm 17 N 1
vrushankshah
Fluorite | Level 6
ya I wanted something like this but when I am trying on other data your code is not working properly.
Data used to test:
Column1 Objectid Group
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
ChrisNZ
Tourmaline | Level 20
"Not working" is very vague. What didn't work?
vrushankshah
Fluorite | Level 6
Hey sorry for not explaining properly! I think the logic is going a bit wrong I am not getting correct count and it flags Y to most of the pattern which it shouldn’t
ChrisNZ
Tourmaline | Level 20

I don't see how.

All the groups with a Y have repeated patterns.

proc sql;
  create table WANT as
  select trans.*
       , case when COUNT>1 then 'Y' else 'N' end as REPEATED
       , COUNT
       , PATTERN_NO
  from HAVE                                      trans
      ,PATTERNS                                  pattern
      ,(select PATTERN
             , count(*)    as COUNT
             , monotonic() as PATTERN_NO
        from PATTERNS 
        group by PATTERN)                        count
  where trans.GROUP     = pattern.group
    and pattern.PATTERN = count.PATTERN
  order by ROW;
quit;

 

NOTETEXT OBJECTID GROUP CREATEDON ROW REPEATED COUNT PATTERN_NO
a 123 1 3:22pm 1 Y 2 1
b 123 1 3:23pm 2 Y 2 1
c 124 2 4:00pm 3 Y 2 6
d 124 2 4:02pm 4 Y 2 6
a 124 3 4:03pm 5 Y 2 1
b 124 3 4:04pm 6 Y 2 1
e 125 4 10:00am 7 N 1 4
f 125 4 10:02am 8 N 1 4
g 125 4 10:07am 9 N 1 4
c 126 5 11:06am 10 N 1 5
t 126 5 11:07am 11 N 1 5
l 126 5 11:09am 12 N 1 5
c 127 6 11:13am 13 Y 2 6
d 127 6 11:15am 14 Y 2 6
a 127 7 12:09pm 15 N 1 7
b 127 7 12:09pm 16 N 1 7
s 127 7 12:11pm 17 N 1 7

 

 

vrushankshah
Fluorite | Level 6
I was just testing the code on a new data and it didnt worked for that :
New test data :
Column1 Objectid Group
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
ChrisNZ
Tourmaline | Level 20

It works fine for me. What are you doing wrong?

data HAVE(index=(GROUP));
  input (NOTETEXT  OBJECTID  GROUP  ) ($);
  ROW=_N_;
cards;
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
run;
data PATTERNS;
  retain PATTERN ;
  length PATTERN $16;
  set HAVE;
  by GROUP;
  if first.GROUP then PATTERN=' ';
  PATTERN=cats(PATTERN,NOTETEXT);
  if last.GROUP then output;
  keep GROUP PATTERN;
run;
proc sql;
  create table WANT as
  select trans.*
       , case when COUNT>1 then 'Y' else 'N' end as REPEATED
       , pattern.PATTERN
       , PATTERN_NO
      , COUNT
  from HAVE                                      trans
      ,PATTERNS                                  pattern
      ,(select PATTERN
             , count(*)    as COUNT
             , monotonic() as PATTERN_NO
        from PATTERNS 
        group by PATTERN)                        count
  where trans.GROUP     = pattern.GROUP
    and pattern.PATTERN = count.PATTERN
  order by ROW;
quit;
NOTETEXT OBJECTID GROUP ROW REPEATED PATTERN PATTERN_NO COUNT
a 1 1 1 Y ab 1 2
b 1 1 2 Y ab 1 2
d 1 2 3 Y d 5 3
f 1 3 4 Y f 15 4
g 1 4 5 Y g 7 2
t 2 5 6 N ty 11 1
y 2 5 7 N ty 11 1
f 3 6 8 Y f 15 4
a 4 7 9 Y ab 1 2
b 4 7 10 Y ab 1 2
d 4 8 11 Y d 5 3
f 5 9 12 Y f 15 4
r 6 10 13 N rst 2 1
s 6 10 14 N rst 2 1
t 6 10 15 N rst 2 1
a 6 11 16 N abs 3 1
b 6 11 17 N abs 3 1
s 6 11 18 N abs 3 1
q 7 12 19 N qwer 4 1
w 7 12 20 N qwer 4 1
e 7 12 21 N qwer 4 1
r 7 12 22 N qwer 4 1
d 7 13 23 Y d 5 3
f 7 14 24 Y f 15 4
g 7 15 25 Y g 7 2

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 685 views
  • 1 like
  • 2 in conversation