BookmarkSubscribeRSS Feed
TerryC
Calcite | Level 5

I am working on a dataset, which includes data both from the current year and from the previous year. The following is an example:

 

I currently have a dataset like this:

#    CY_1 CY_2 CY_3         PY_1 PY_2 PY_3

1       c        e          g               a        b        e

2       b        c          f                a        d        g

3       b        d          e               c        d        e

4       a        c          d               a        b        e

 

I want to find out how many variables in CY_1, CY_2, and CY_3 equals valuables in PY_1, PY_2, and PY_3.

 

I hope to get the following output.

#    CY_1 CY_2 CY_3            PY_1 PY_2 PY_3       output

1       c        e          g                  a        b        e             1

2       b        c          f                   a        d        g             0

3       b        d          e                  c        d        e             2

4       a        c          d                  a        b        e             1

 

Thank you!

 

9 REPLIES 9
novinosrin
Tourmaline | Level 20

HI @TerryC 

 


data have;
input N    (CY_1 CY_2 CY_3         PY_1 PY_2 PY_3) ($);
cards;
1       c        e          g               a        b        e
2       b        c          f                a        d        g
3       b        d          e               c        d        e
4       a        c          d               a        b        e
;

data want;
 set have;
 want=countc(cats(of py_:),cats(of cy_:));
run;

The above assumes your values are letters just like your example

hashman
Ammonite | Level 13

@TerryC:

Try this (under the assumption that none of the CY and PY contains a low-value "00"x as part of it):

data have ;                                                                                                                             
  input (CY_1-CY_3 PY_1-PY_3) ($) ;                                                                                                     
  cards ;                                                                                                                               
c  e  g    a  b  e                                                                                                                      
b  c  f    a  d  g                                                                                                                      
b  d  e    c  d  e                                                                                                                      
a  c  d    a  b  e                                                                                                                      
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  set have ;                                                                                                                            
  array cy cy: ;                                                                                                                        
  output = 0 ;                                                                                                                          
  do over cy ;                                                                                                                          
    output + ^^ find (catx ("00"x, of py:), trim (cy)) ;                                                                                
  end ;                                                                                                                                 
run ;            

Kind regards

Paul D. 

novinosrin
Tourmaline | Level 20

Guru @hashman  Plagiarizing your idea, forgive me. This is something similar posted by you in another thread about a year and half ago :). Basically, it seems the 'K' modifier in COUNTW makes everything else a delimiter except the excerpt in 2nd argument. So copying your loop construct here, the following may work for "Words"

 

data have;
input N    (CY_1 CY_2 CY_3         PY_1 PY_2 PY_3) ($);
cards;
1       c        e          g               a        b        e
2       b        c          f                a        d        g
3       b        d          e               c        d        e
4       a        c          d               a        b        e
;

data want;
 set have;
 v1=cats(of py_:);
 array t cy_:;
  do over t;
    want=sum(countw(v1,strip(t),'k'),want);
  end;
 drop v1;
run;
k or K causes all characters that are not in the list of characters to be treated as delimiters. If K is not specified, all characters that are in the list of characters are treated as delimiters.

 

EDIT: Please ignore. Sorry that I didn't realize it's all single chars

 

The above  COUNTW is replace with COUNT

 


data want;
 set have;
 v1=cats(of py_:);
 array t cy_:;
  do over t;
    want=sum(count(v1,strip(t)),want);
  end;
 drop v1;
run;

 

TerryC
Calcite | Level 5

Thanks! It works.

 

Can I ask an additional question, which is similar to this one?

 

For example, I have the following strings. I want to get how many numbers these two strings have in common.

 

#             String_1                                   String_2

1            |5|12|37|44|                        |8|12|44|

2            |45|67|                                |23|44|78|82|

3            |12|22|34|67|69|72|78|       |34|37|72|78|

4            |2|14|33|                             |14|79|

 

I hope to get the following results.

 

#             String_1                                   String_2                    Output

1            |5|12|37|44|                        |8|12|44|                              2           

2            |45|67|                                |23|44|78|82|                       0

3            |12|22|34|67|69|72|78|       |34|37|72|78|                       3                            

4            |2|14|33|                             |14|79|                                 1

 

Thank you!

unison
Lapis Lazuli | Level 10

Another idea:

data have;
	input (cy1-cy3) ($) (py1-py3) ($);
	datalines;
c e g a b e
b c f a d g
b d e c d e
a c d a b e
;
run;

data want;
	set have;
	array cy[*] cy:;
	array py[*] py:;
	do _i=1 to dim(py);

		do _j=1 to dim(cy);
			flag=sum(flag, (py[_i]=cy[_j]));
		end;
	end;
	drop _:;
run;
-unison
mkeintz
PROC Star

Can a letter appear more than once in the CY's,  or in the PY's?

 

If so, then if "a" appears twice in the CY's  and once in the PY's, does that count as 1 match, or 2?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
	input (cy1-cy3) ($) (py1-py3) ($);
	datalines;
c e g a b e
b c f a d g
b d e c d e
a c d a b e
;
run;

data want;
	set have;
	array cy{*} cy:;
	array py{*} py:;
	n=0;
	do i=1 to dim(py);
	 if cy{i} in py then n+1;
	end;
run;

proc print;run;
hashman
Ammonite | Level 13

@Ksharp: But indeed. I keep forgetting the convenience of IN against an array's name reference for searching the array!

 

Kind regards

Paul D. 

Ksharp
Super User

So am I .Human brain offen fade away something as time goes by .

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2156 views
  • 4 likes
  • 6 in conversation