- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have 4 columns: Column1-It has all the clicks made on the website 2) Unique ObjectId for each user,3rd column-time of the click and 4th column-group which indicates the pattern of clicks followed by the user. If the pattern starts with the click a,c,e then it comes under different group. I want to find if all the words of a group are part of the pattern followed by a different user or no? If it is a part of string then return the objectid , group and flag it as repeated. For example:
Notetext ObjectId group CreatedOn
a 123 1 3:22pm
b 123 1 3:23pm
c 124 2 4:00pm
d 124 2 4:02pm
a 124 3 4:03pm
b 124 3 4:04pm
e 125 4 10:00am
f 125 4 10:02am
g 125 4 10:07am
c 126 5 11:06am
t 126 5 11:07am
l 126 5 11:09am
c 127 6 11:13am
d 127 6 11:15am
a 127 7 12:09pm
b 127 7 12:09pm
s 127 7 12:11pmOUTPUT EXPECTED:
NoteText ObjectId group Part_of_string
a 123 1 N
b 123 1 N
c 124 2 N
d 124 2 N
a 124 3 Y
b 124 3 Y
e 125 4 N
f 125 4 N
g 125 4 N
c 126 5 N
t 126 5 N
l 126 5 N
c 127 6 Y
d 127 6 Y
a 127 7 N
b 127 7 N
s 127 7 N
In the above example a,b was the part of the pattern followed by user 124 hence it should be flagged Y but 'a,b,s' is not repeated hence it is flagged N.
EDIT:
I want this final thing it will be great if someone can help me . The main thing is I don't want that if the pattern is the only thing present in the objectId then don't count it only count it if it is the PART of another big string under a objectId.
I want this final thing it will be great if you can help me . NoteText ObjectId Group Inquiry 1 1 Benefits 2 2 Verify 2 2 Claims 2 3 Type 2 3 Inquiry 2 4 Benefits 3 5 Verify 3 5 Benefits 4 6
Expected Output:
Notetext Count
Benefits,verify 1
Inquiry 1
Claims,Type 1
Benefits 0
For example in the above example in the above case Benefits,verify should be counted only once since it was the PART of objectID '2'.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It works fine for me. What are you doing wrong?
data HAVE(index=(GROUP));
input (NOTETEXT OBJECTID GROUP ) ($);
ROW=_N_;
cards;
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
run;
data PATTERNS;
retain PATTERN ;
length PATTERN $16;
set HAVE;
by GROUP;
if first.GROUP then PATTERN=' ';
PATTERN=cats(PATTERN,NOTETEXT);
if last.GROUP then output;
keep GROUP PATTERN;
run;
proc sql;
create table WANT as
select trans.*
, case when COUNT>1 then 'Y' else 'N' end as REPEATED
, pattern.PATTERN
, PATTERN_NO
, COUNT
from HAVE trans
,PATTERNS pattern
,(select PATTERN
, count(*) as COUNT
, monotonic() as PATTERN_NO
from PATTERNS
group by PATTERN) count
where trans.GROUP = pattern.GROUP
and pattern.PATTERN = count.PATTERN
order by ROW;
quit;
NOTETEXT | OBJECTID | GROUP | ROW | REPEATED | PATTERN | PATTERN_NO | COUNT |
a | 1 | 1 | 1 | Y | ab | 1 | 2 |
b | 1 | 1 | 2 | Y | ab | 1 | 2 |
d | 1 | 2 | 3 | Y | d | 5 | 3 |
f | 1 | 3 | 4 | Y | f | 15 | 4 |
g | 1 | 4 | 5 | Y | g | 7 | 2 |
t | 2 | 5 | 6 | N | ty | 11 | 1 |
y | 2 | 5 | 7 | N | ty | 11 | 1 |
f | 3 | 6 | 8 | Y | f | 15 | 4 |
a | 4 | 7 | 9 | Y | ab | 1 | 2 |
b | 4 | 7 | 10 | Y | ab | 1 | 2 |
d | 4 | 8 | 11 | Y | d | 5 | 3 |
f | 5 | 9 | 12 | Y | f | 15 | 4 |
r | 6 | 10 | 13 | N | rst | 2 | 1 |
s | 6 | 10 | 14 | N | rst | 2 | 1 |
t | 6 | 10 | 15 | N | rst | 2 | 1 |
a | 6 | 11 | 16 | N | abs | 3 | 1 |
b | 6 | 11 | 17 | N | abs | 3 | 1 |
s | 6 | 11 | 18 | N | abs | 3 | 1 |
q | 7 | 12 | 19 | N | qwer | 4 | 1 |
w | 7 | 12 | 20 | N | qwer | 4 | 1 |
e | 7 | 12 | 21 | N | qwer | 4 | 1 |
r | 7 | 12 | 22 | N | qwer | 4 | 1 |
d | 7 | 13 | 23 | Y | d | 5 | 3 |
f | 7 | 14 | 24 | Y | f | 15 | 4 |
g | 7 | 15 | 25 | Y | g | 7 | 2 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data HAVE;
input (NOTETEXT OBJECTID GROUP CREATEDON) ($);
ROW=_N_;
cards;
a 123 1 3:22pm
b 123 1 3:23pm
c 124 2 4:00pm
d 124 2 4:02pm
a 124 3 4:03pm
b 124 3 4:04pm
e 125 4 10:00am
f 125 4 10:02am
g 125 4 10:07am
c 126 5 11:06am
t 126 5 11:07am
l 126 5 11:09am
c 127 6 11:13am
d 127 6 11:15am
a 127 7 12:09pm
b 127 7 12:09pm
s 127 7 12:11pm
run;
data PATTERNS;
retain PATTERN ;
length PATTERN $16;
set HAVE;
by GROUP;
if first.GROUP then PATTERN=' ';
PATTERN=cats(PATTERN,NOTETEXT);
if last.GROUP then output;
keep GROUP PATTERN;
run;
proc sql;
create table WANT as
select trans.*
, case when COUNT>1 then 'Y' else 'N' end as REPEATED
, COUNT
from HAVE trans
,PATTERNS pattern
,(select PATTERN, count(*) as COUNT from PATTERNS group by PATTERN) count
where trans.GROUP = pattern.group
and pattern.PATTERN = count.PATTERN
order by ROW;
quit;
NOTETEXT | OBJECTID | GROUP | CREATEDON | ROW | REPEATED | COUNT |
a | 123 | 1 | 3:22pm | 1 | Y | 2 |
b | 123 | 1 | 3:23pm | 2 | Y | 2 |
c | 124 | 2 | 4:00pm | 3 | Y | 2 |
d | 124 | 2 | 4:02pm | 4 | Y | 2 |
a | 124 | 3 | 4:03pm | 5 | Y | 2 |
b | 124 | 3 | 4:04pm | 6 | Y | 2 |
e | 125 | 4 | 10:00am | 7 | N | 1 |
f | 125 | 4 | 10:02am | 8 | N | 1 |
g | 125 | 4 | 10:07am | 9 | N | 1 |
c | 126 | 5 | 11:06am | 10 | N | 1 |
t | 126 | 5 | 11:07am | 11 | N | 1 |
l | 126 | 5 | 11:09am | 12 | N | 1 |
c | 127 | 6 | 11:13am | 13 | Y | 2 |
d | 127 | 6 | 11:15am | 14 | Y | 2 |
a | 127 | 7 | 12:09pm | 15 | N | 1 |
b | 127 | 7 | 12:09pm | 16 | N | 1 |
s | 127 | 7 | 12:11pm | 17 | N | 1 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Data used to test:
Column1 Objectid Group
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't see how.
All the groups with a Y have repeated patterns.
proc sql;
create table WANT as
select trans.*
, case when COUNT>1 then 'Y' else 'N' end as REPEATED
, COUNT
, PATTERN_NO
from HAVE trans
,PATTERNS pattern
,(select PATTERN
, count(*) as COUNT
, monotonic() as PATTERN_NO
from PATTERNS
group by PATTERN) count
where trans.GROUP = pattern.group
and pattern.PATTERN = count.PATTERN
order by ROW;
quit;
NOTETEXT | OBJECTID | GROUP | CREATEDON | ROW | REPEATED | COUNT | PATTERN_NO |
a | 123 | 1 | 3:22pm | 1 | Y | 2 | 1 |
b | 123 | 1 | 3:23pm | 2 | Y | 2 | 1 |
c | 124 | 2 | 4:00pm | 3 | Y | 2 | 6 |
d | 124 | 2 | 4:02pm | 4 | Y | 2 | 6 |
a | 124 | 3 | 4:03pm | 5 | Y | 2 | 1 |
b | 124 | 3 | 4:04pm | 6 | Y | 2 | 1 |
e | 125 | 4 | 10:00am | 7 | N | 1 | 4 |
f | 125 | 4 | 10:02am | 8 | N | 1 | 4 |
g | 125 | 4 | 10:07am | 9 | N | 1 | 4 |
c | 126 | 5 | 11:06am | 10 | N | 1 | 5 |
t | 126 | 5 | 11:07am | 11 | N | 1 | 5 |
l | 126 | 5 | 11:09am | 12 | N | 1 | 5 |
c | 127 | 6 | 11:13am | 13 | Y | 2 | 6 |
d | 127 | 6 | 11:15am | 14 | Y | 2 | 6 |
a | 127 | 7 | 12:09pm | 15 | N | 1 | 7 |
b | 127 | 7 | 12:09pm | 16 | N | 1 | 7 |
s | 127 | 7 | 12:11pm | 17 | N | 1 | 7 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
New test data :
Column1 Objectid Group
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It works fine for me. What are you doing wrong?
data HAVE(index=(GROUP));
input (NOTETEXT OBJECTID GROUP ) ($);
ROW=_N_;
cards;
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
run;
data PATTERNS;
retain PATTERN ;
length PATTERN $16;
set HAVE;
by GROUP;
if first.GROUP then PATTERN=' ';
PATTERN=cats(PATTERN,NOTETEXT);
if last.GROUP then output;
keep GROUP PATTERN;
run;
proc sql;
create table WANT as
select trans.*
, case when COUNT>1 then 'Y' else 'N' end as REPEATED
, pattern.PATTERN
, PATTERN_NO
, COUNT
from HAVE trans
,PATTERNS pattern
,(select PATTERN
, count(*) as COUNT
, monotonic() as PATTERN_NO
from PATTERNS
group by PATTERN) count
where trans.GROUP = pattern.GROUP
and pattern.PATTERN = count.PATTERN
order by ROW;
quit;
NOTETEXT | OBJECTID | GROUP | ROW | REPEATED | PATTERN | PATTERN_NO | COUNT |
a | 1 | 1 | 1 | Y | ab | 1 | 2 |
b | 1 | 1 | 2 | Y | ab | 1 | 2 |
d | 1 | 2 | 3 | Y | d | 5 | 3 |
f | 1 | 3 | 4 | Y | f | 15 | 4 |
g | 1 | 4 | 5 | Y | g | 7 | 2 |
t | 2 | 5 | 6 | N | ty | 11 | 1 |
y | 2 | 5 | 7 | N | ty | 11 | 1 |
f | 3 | 6 | 8 | Y | f | 15 | 4 |
a | 4 | 7 | 9 | Y | ab | 1 | 2 |
b | 4 | 7 | 10 | Y | ab | 1 | 2 |
d | 4 | 8 | 11 | Y | d | 5 | 3 |
f | 5 | 9 | 12 | Y | f | 15 | 4 |
r | 6 | 10 | 13 | N | rst | 2 | 1 |
s | 6 | 10 | 14 | N | rst | 2 | 1 |
t | 6 | 10 | 15 | N | rst | 2 | 1 |
a | 6 | 11 | 16 | N | abs | 3 | 1 |
b | 6 | 11 | 17 | N | abs | 3 | 1 |
s | 6 | 11 | 18 | N | abs | 3 | 1 |
q | 7 | 12 | 19 | N | qwer | 4 | 1 |
w | 7 | 12 | 20 | N | qwer | 4 | 1 |
e | 7 | 12 | 21 | N | qwer | 4 | 1 |
r | 7 | 12 | 22 | N | qwer | 4 | 1 |
d | 7 | 13 | 23 | Y | d | 5 | 3 |
f | 7 | 14 | 24 | Y | f | 15 | 4 |
g | 7 | 15 | 25 | Y | g | 7 | 2 |