I have 4 columns: Column1-It has all the clicks made on the website 2) Unique ObjectId for each user,3rd column-time of the click and 4th column-group which indicates the pattern of clicks followed by the user. If the pattern starts with the click a,c,e then it comes under different group. I want to find if all the words of a group are part of the pattern followed by a different user or no? If it is a part of string then return the objectid , group and flag it as repeated. For example:
Notetext ObjectId group CreatedOn
a 123 1 3:22pm
b 123 1 3:23pm
c 124 2 4:00pm
d 124 2 4:02pm
a 124 3 4:03pm
b 124 3 4:04pm
e 125 4 10:00am
f 125 4 10:02am
g 125 4 10:07am
c 126 5 11:06am
t 126 5 11:07am
l 126 5 11:09am
c 127 6 11:13am
d 127 6 11:15am
a 127 7 12:09pm
b 127 7 12:09pm
s 127 7 12:11pm
OUTPUT EXPECTED:
NoteText ObjectId group Part_of_string
a 123 1 N
b 123 1 N
c 124 2 N
d 124 2 N
a 124 3 Y
b 124 3 Y
e 125 4 N
f 125 4 N
g 125 4 N
c 126 5 N
t 126 5 N
l 126 5 N
c 127 6 Y
d 127 6 Y
a 127 7 N
b 127 7 N
s 127 7 N
In the above example a,b was the part of the pattern followed by user 124 hence it should be flagged Y but 'a,b,s' is not repeated hence it is flagged N.
EDIT:
I want this final thing it will be great if someone can help me . The main thing is I don't want that if the pattern is the only thing present in the objectId then don't count it only count it if it is the PART of another big string under a objectId.
I want this final thing it will be great if you can help me . NoteText ObjectId Group Inquiry 1 1 Benefits 2 2 Verify 2 2 Claims 2 3 Type 2 3 Inquiry 2 4 Benefits 3 5 Verify 3 5 Benefits 4 6
Expected Output:
Notetext Count
Benefits,verify 1
Inquiry 1
Claims,Type 1
Benefits 0
For example in the above example in the above case Benefits,verify should be counted only once since it was the PART of objectID '2'.
It works fine for me. What are you doing wrong?
data HAVE(index=(GROUP));
input (NOTETEXT OBJECTID GROUP ) ($);
ROW=_N_;
cards;
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
run;
data PATTERNS;
retain PATTERN ;
length PATTERN $16;
set HAVE;
by GROUP;
if first.GROUP then PATTERN=' ';
PATTERN=cats(PATTERN,NOTETEXT);
if last.GROUP then output;
keep GROUP PATTERN;
run;
proc sql;
create table WANT as
select trans.*
, case when COUNT>1 then 'Y' else 'N' end as REPEATED
, pattern.PATTERN
, PATTERN_NO
, COUNT
from HAVE trans
,PATTERNS pattern
,(select PATTERN
, count(*) as COUNT
, monotonic() as PATTERN_NO
from PATTERNS
group by PATTERN) count
where trans.GROUP = pattern.GROUP
and pattern.PATTERN = count.PATTERN
order by ROW;
quit;
NOTETEXT | OBJECTID | GROUP | ROW | REPEATED | PATTERN | PATTERN_NO | COUNT |
a | 1 | 1 | 1 | Y | ab | 1 | 2 |
b | 1 | 1 | 2 | Y | ab | 1 | 2 |
d | 1 | 2 | 3 | Y | d | 5 | 3 |
f | 1 | 3 | 4 | Y | f | 15 | 4 |
g | 1 | 4 | 5 | Y | g | 7 | 2 |
t | 2 | 5 | 6 | N | ty | 11 | 1 |
y | 2 | 5 | 7 | N | ty | 11 | 1 |
f | 3 | 6 | 8 | Y | f | 15 | 4 |
a | 4 | 7 | 9 | Y | ab | 1 | 2 |
b | 4 | 7 | 10 | Y | ab | 1 | 2 |
d | 4 | 8 | 11 | Y | d | 5 | 3 |
f | 5 | 9 | 12 | Y | f | 15 | 4 |
r | 6 | 10 | 13 | N | rst | 2 | 1 |
s | 6 | 10 | 14 | N | rst | 2 | 1 |
t | 6 | 10 | 15 | N | rst | 2 | 1 |
a | 6 | 11 | 16 | N | abs | 3 | 1 |
b | 6 | 11 | 17 | N | abs | 3 | 1 |
s | 6 | 11 | 18 | N | abs | 3 | 1 |
q | 7 | 12 | 19 | N | qwer | 4 | 1 |
w | 7 | 12 | 20 | N | qwer | 4 | 1 |
e | 7 | 12 | 21 | N | qwer | 4 | 1 |
r | 7 | 12 | 22 | N | qwer | 4 | 1 |
d | 7 | 13 | 23 | Y | d | 5 | 3 |
f | 7 | 14 | 24 | Y | f | 15 | 4 |
g | 7 | 15 | 25 | Y | g | 7 | 2 |
Like this?
data HAVE;
input (NOTETEXT OBJECTID GROUP CREATEDON) ($);
ROW=_N_;
cards;
a 123 1 3:22pm
b 123 1 3:23pm
c 124 2 4:00pm
d 124 2 4:02pm
a 124 3 4:03pm
b 124 3 4:04pm
e 125 4 10:00am
f 125 4 10:02am
g 125 4 10:07am
c 126 5 11:06am
t 126 5 11:07am
l 126 5 11:09am
c 127 6 11:13am
d 127 6 11:15am
a 127 7 12:09pm
b 127 7 12:09pm
s 127 7 12:11pm
run;
data PATTERNS;
retain PATTERN ;
length PATTERN $16;
set HAVE;
by GROUP;
if first.GROUP then PATTERN=' ';
PATTERN=cats(PATTERN,NOTETEXT);
if last.GROUP then output;
keep GROUP PATTERN;
run;
proc sql;
create table WANT as
select trans.*
, case when COUNT>1 then 'Y' else 'N' end as REPEATED
, COUNT
from HAVE trans
,PATTERNS pattern
,(select PATTERN, count(*) as COUNT from PATTERNS group by PATTERN) count
where trans.GROUP = pattern.group
and pattern.PATTERN = count.PATTERN
order by ROW;
quit;
NOTETEXT | OBJECTID | GROUP | CREATEDON | ROW | REPEATED | COUNT |
a | 123 | 1 | 3:22pm | 1 | Y | 2 |
b | 123 | 1 | 3:23pm | 2 | Y | 2 |
c | 124 | 2 | 4:00pm | 3 | Y | 2 |
d | 124 | 2 | 4:02pm | 4 | Y | 2 |
a | 124 | 3 | 4:03pm | 5 | Y | 2 |
b | 124 | 3 | 4:04pm | 6 | Y | 2 |
e | 125 | 4 | 10:00am | 7 | N | 1 |
f | 125 | 4 | 10:02am | 8 | N | 1 |
g | 125 | 4 | 10:07am | 9 | N | 1 |
c | 126 | 5 | 11:06am | 10 | N | 1 |
t | 126 | 5 | 11:07am | 11 | N | 1 |
l | 126 | 5 | 11:09am | 12 | N | 1 |
c | 127 | 6 | 11:13am | 13 | Y | 2 |
d | 127 | 6 | 11:15am | 14 | Y | 2 |
a | 127 | 7 | 12:09pm | 15 | N | 1 |
b | 127 | 7 | 12:09pm | 16 | N | 1 |
s | 127 | 7 | 12:11pm | 17 | N | 1 |
I don't see how.
All the groups with a Y have repeated patterns.
proc sql;
create table WANT as
select trans.*
, case when COUNT>1 then 'Y' else 'N' end as REPEATED
, COUNT
, PATTERN_NO
from HAVE trans
,PATTERNS pattern
,(select PATTERN
, count(*) as COUNT
, monotonic() as PATTERN_NO
from PATTERNS
group by PATTERN) count
where trans.GROUP = pattern.group
and pattern.PATTERN = count.PATTERN
order by ROW;
quit;
NOTETEXT | OBJECTID | GROUP | CREATEDON | ROW | REPEATED | COUNT | PATTERN_NO |
a | 123 | 1 | 3:22pm | 1 | Y | 2 | 1 |
b | 123 | 1 | 3:23pm | 2 | Y | 2 | 1 |
c | 124 | 2 | 4:00pm | 3 | Y | 2 | 6 |
d | 124 | 2 | 4:02pm | 4 | Y | 2 | 6 |
a | 124 | 3 | 4:03pm | 5 | Y | 2 | 1 |
b | 124 | 3 | 4:04pm | 6 | Y | 2 | 1 |
e | 125 | 4 | 10:00am | 7 | N | 1 | 4 |
f | 125 | 4 | 10:02am | 8 | N | 1 | 4 |
g | 125 | 4 | 10:07am | 9 | N | 1 | 4 |
c | 126 | 5 | 11:06am | 10 | N | 1 | 5 |
t | 126 | 5 | 11:07am | 11 | N | 1 | 5 |
l | 126 | 5 | 11:09am | 12 | N | 1 | 5 |
c | 127 | 6 | 11:13am | 13 | Y | 2 | 6 |
d | 127 | 6 | 11:15am | 14 | Y | 2 | 6 |
a | 127 | 7 | 12:09pm | 15 | N | 1 | 7 |
b | 127 | 7 | 12:09pm | 16 | N | 1 | 7 |
s | 127 | 7 | 12:11pm | 17 | N | 1 | 7 |
It works fine for me. What are you doing wrong?
data HAVE(index=(GROUP));
input (NOTETEXT OBJECTID GROUP ) ($);
ROW=_N_;
cards;
a 1 1
b 1 1
d 1 2
f 1 3
g 1 4
t 2 5
y 2 5
f 3 6
a 4 7
b 4 7
d 4 8
f 5 9
r 6 10
s 6 10
t 6 10
a 6 11
b 6 11
s 6 11
q 7 12
w 7 12
e 7 12
r 7 12
d 7 13
f 7 14
g 7 15
run;
data PATTERNS;
retain PATTERN ;
length PATTERN $16;
set HAVE;
by GROUP;
if first.GROUP then PATTERN=' ';
PATTERN=cats(PATTERN,NOTETEXT);
if last.GROUP then output;
keep GROUP PATTERN;
run;
proc sql;
create table WANT as
select trans.*
, case when COUNT>1 then 'Y' else 'N' end as REPEATED
, pattern.PATTERN
, PATTERN_NO
, COUNT
from HAVE trans
,PATTERNS pattern
,(select PATTERN
, count(*) as COUNT
, monotonic() as PATTERN_NO
from PATTERNS
group by PATTERN) count
where trans.GROUP = pattern.GROUP
and pattern.PATTERN = count.PATTERN
order by ROW;
quit;
NOTETEXT | OBJECTID | GROUP | ROW | REPEATED | PATTERN | PATTERN_NO | COUNT |
a | 1 | 1 | 1 | Y | ab | 1 | 2 |
b | 1 | 1 | 2 | Y | ab | 1 | 2 |
d | 1 | 2 | 3 | Y | d | 5 | 3 |
f | 1 | 3 | 4 | Y | f | 15 | 4 |
g | 1 | 4 | 5 | Y | g | 7 | 2 |
t | 2 | 5 | 6 | N | ty | 11 | 1 |
y | 2 | 5 | 7 | N | ty | 11 | 1 |
f | 3 | 6 | 8 | Y | f | 15 | 4 |
a | 4 | 7 | 9 | Y | ab | 1 | 2 |
b | 4 | 7 | 10 | Y | ab | 1 | 2 |
d | 4 | 8 | 11 | Y | d | 5 | 3 |
f | 5 | 9 | 12 | Y | f | 15 | 4 |
r | 6 | 10 | 13 | N | rst | 2 | 1 |
s | 6 | 10 | 14 | N | rst | 2 | 1 |
t | 6 | 10 | 15 | N | rst | 2 | 1 |
a | 6 | 11 | 16 | N | abs | 3 | 1 |
b | 6 | 11 | 17 | N | abs | 3 | 1 |
s | 6 | 11 | 18 | N | abs | 3 | 1 |
q | 7 | 12 | 19 | N | qwer | 4 | 1 |
w | 7 | 12 | 20 | N | qwer | 4 | 1 |
e | 7 | 12 | 21 | N | qwer | 4 | 1 |
r | 7 | 12 | 22 | N | qwer | 4 | 1 |
d | 7 | 13 | 23 | Y | d | 5 | 3 |
f | 7 | 14 | 24 | Y | f | 15 | 4 |
g | 7 | 15 | 25 | Y | g | 7 | 2 |
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.