I am working on a dataset, which includes data both from the current year and from the previous year. The following is an example:
I currently have a dataset like this:
# CY_1 CY_2 CY_3 PY_1 PY_2 PY_3
1 c e g a b e
2 b c f a d g
3 b d e c d e
4 a c d a b e
I want to find out how many variables in CY_1, CY_2, and CY_3 equals valuables in PY_1, PY_2, and PY_3.
I hope to get the following output.
# CY_1 CY_2 CY_3 PY_1 PY_2 PY_3 output
1 c e g a b e 1
2 b c f a d g 0
3 b d e c d e 2
4 a c d a b e 1
Thank you!
HI @TerryC
data have;
input N (CY_1 CY_2 CY_3 PY_1 PY_2 PY_3) ($);
cards;
1 c e g a b e
2 b c f a d g
3 b d e c d e
4 a c d a b e
;
data want;
set have;
want=countc(cats(of py_:),cats(of cy_:));
run;
The above assumes your values are letters just like your example
Try this (under the assumption that none of the CY and PY contains a low-value "00"x as part of it):
data have ;
input (CY_1-CY_3 PY_1-PY_3) ($) ;
cards ;
c e g a b e
b c f a d g
b d e c d e
a c d a b e
;
run ;
data want ;
set have ;
array cy cy: ;
output = 0 ;
do over cy ;
output + ^^ find (catx ("00"x, of py:), trim (cy)) ;
end ;
run ;
Kind regards
Paul D.
Guru @hashman Plagiarizing your idea, forgive me. This is something similar posted by you in another thread about a year and half ago :). Basically, it seems the 'K' modifier in COUNTW makes everything else a delimiter except the excerpt in 2nd argument. So copying your loop construct here, the following may work for "Words"
data have;
input N (CY_1 CY_2 CY_3 PY_1 PY_2 PY_3) ($);
cards;
1 c e g a b e
2 b c f a d g
3 b d e c d e
4 a c d a b e
;
data want;
set have;
v1=cats(of py_:);
array t cy_:;
do over t;
want=sum(countw(v1,strip(t),'k'),want);
end;
drop v1;
run;
k or K | causes all characters that are not in the list of characters to be treated as delimiters. If K is not specified, all characters that are in the list of characters are treated as delimiters. |
EDIT: Please ignore. Sorry that I didn't realize it's all single chars
The above COUNTW is replace with COUNT
data want;
set have;
v1=cats(of py_:);
array t cy_:;
do over t;
want=sum(count(v1,strip(t)),want);
end;
drop v1;
run;
Thanks! It works.
Can I ask an additional question, which is similar to this one?
For example, I have the following strings. I want to get how many numbers these two strings have in common.
# String_1 String_2
1 |5|12|37|44| |8|12|44|
2 |45|67| |23|44|78|82|
3 |12|22|34|67|69|72|78| |34|37|72|78|
4 |2|14|33| |14|79|
I hope to get the following results.
# String_1 String_2 Output
1 |5|12|37|44| |8|12|44| 2
2 |45|67| |23|44|78|82| 0
3 |12|22|34|67|69|72|78| |34|37|72|78| 3
4 |2|14|33| |14|79| 1
Thank you!
Another idea:
data have;
input (cy1-cy3) ($) (py1-py3) ($);
datalines;
c e g a b e
b c f a d g
b d e c d e
a c d a b e
;
run;
data want;
set have;
array cy[*] cy:;
array py[*] py:;
do _i=1 to dim(py);
do _j=1 to dim(cy);
flag=sum(flag, (py[_i]=cy[_j]));
end;
end;
drop _:;
run;
Can a letter appear more than once in the CY's, or in the PY's?
If so, then if "a" appears twice in the CY's and once in the PY's, does that count as 1 match, or 2?
data have;
input (cy1-cy3) ($) (py1-py3) ($);
datalines;
c e g a b e
b c f a d g
b d e c d e
a c d a b e
;
run;
data want;
set have;
array cy{*} cy:;
array py{*} py:;
n=0;
do i=1 to dim(py);
if cy{i} in py then n+1;
end;
run;
proc print;run;
@Ksharp: But indeed. I keep forgetting the convenience of IN against an array's name reference for searching the array!
Kind regards
Paul D.
So am I .Human brain offen fade away something as time goes by .
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: