## extracting letters from observations and matching them together with other observations

Solved
Occasional Contributor
Posts: 12

# extracting letters from observations and matching them together with other observations

I have data like this

 Code Sx1 Sx2 Sx3 SX4 P1 XX AA ZZ HH P2 AA XX AA BB RR1 XA XX ZA HB RR2 AA XA ZA BH RR3 AX AX ZZ HH RR4 AA AA AA BB RR5 XX XX ZA HB RR6 AX XA ZZ HH RR7 AA AX AA BB RR8 XX AA AZ HH

I want to extract one letter from P1 (X) and another letter from P2 (A) and then I want to use those extracted letters (AX or XA) to match with observations from RR1 to RR8, if they match then 1, else 0. I want to use arrays because my data contains over 8,000 observations.

I want like this

 SX1 SX2 SX3 SX4 1 0 1 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 1 0 0 0 0 1 0

I am trying to solve this by using Substr function

Data x;

Set y;

var1 = SUBSTR (SX1,1,1);

var2= SUBSTR (SX2,1,1);

var3= SUBSTR (SX3,1,1);

var4= SUBSTR (SX4,1,1);

PACKAGE_SUBSTR = LETTER_1|| LETTER_2|| LETTER_3;

RUN;

I will use arrays to store my variables from SX1-SX4 and then

Here after I want to match this concatenated variable with the other observations in the array to reach my goal. I want to know I am proceeding in the right way or not? help neede to solve this problem

Thanking you very much

Accepted Solutions
Solution
‎05-23-2013 07:27 AM
Regular Contributor
Posts: 195

## Re: extracting letters from observations and matching them together with other observations

Hi,

Here is the modifies SAS Code...

data want(drop = i);

set have;

array _temp (4) sx1-sx4;

if code = "P1" or code = "P2" then do;

do i = 1 to 4;

_temp(i) = substr(_temp(i),1,1);

end;

end;

run;

proc transpose data = want

out  = want;

id code;

var sx1-sx4;

run;

data want;

set want;

first_letter = compress(P1||P2);

array _temp (8000) rr1-rr8000;

do i = 1 to 8000;

if first_letter = _temp(i) or reverse(first_letter) = _temp(i) then _temp(i) = 1;

else _temp(i) = 0;

end;

run;

proc transpose data = want

out  = want(rename = (_NAME_ = CODE));

var rr1-rr8000;

run;

Based on your observation, you need to change the value of 8000 in ARRAY statement...

-Urvish

All Replies
Regular Contributor
Posts: 195

## Re: extracting letters from observations and matching them together with other observations

Hi,

If you are saying that you extract one letter from P1(X) and other from P2(A) and you combine them to produce XA or AX and then you will compare with other observations...then my question is like if the value of CODE variable is P1 or P2 then there will be other extracted words like ZA or HB so in that case don't you need to compare ZA or HB values with other values...???

Please clear that whether you want to extract only XA or AX that is SX1 and SX2 values or you want first letter of all the SX1-SX4 variables...and then compare them...??? Because in your explanation, the only value want to compare is XA or AX, what about other values like ZA,HB...???

Here is the possible solution...

data full_data(drop = i)    /* Contains all other values except P1 or P2 */

first_letter(drop = i); /* Contains only first letter of P1 or P2    */

set have;

array _temp (4) sx1-sx4;

if code = "P1" or code = "P2" then do;

do i = 1 to 4;

_temp(i) = substr(_temp(i),1,1);

end;

end;

if code = "P1" or code = "P2" then output first_letter;

else do;

output full_data;

end;

run;

/* Transpose the first_letter data to concatenate the values of P1 and P2 */

proc transpose data = first_letter

out  = first_letter

id code;

var sx1-sx4;

run;

data first_letter(drop = P1 P2);

set first_letter;

first_letter = compress(P1||P2);

run;

proc transpose data = first_letter

out  = first_letter(drop = _NAME_);

var first_letter;

run;

Now at this step you have first letter of P1 and P2 for all SX1 to SX4 variables...These values you can eassily compare with RR1 to RR8 values...

But please make sure that you want AX/XA or also you want other values...

You can directly compare first_letter(containing P1 and P2 first letter values) dataset values to full_data(RR1 to RR8 values) dataset values by joining them and assign conditions of 0 and 1...

Hope it works and giving you idea of what i want to say...

-Urvish

Posts: 4,741

## Re: extracting letters from observations and matching them together with other observations

To give you an answer you need to explain your data even better.

From what I understand so far you want to select the first letter of column "SX1" for rows where the code is either "P1" or "P2", then combine these selected letters and compare it to all columns in rows where the code is "RR1" to "RR8".

But how does your data continue? Is it then code "P3" and "P4" rows to be compared to code rows "RR9" to "RR16"? Or are these 2 "P1" and "P2" rows the only ones in your data and then we're having "RR1" to "RR8000"?

Please explain the logic to be built as a narrative.

Occasional Contributor
Posts: 12

## Re: extracting letters from observations and matching them together with other observations

Dear Patrick,

Thank you very much for your help. My data will continue as your expected i mean after some rows my P1 and P2 will change, if they changed ( for example p3 and p4) i have to repeat the same procedure like extracting 1st letter from both P3 and P4 and matching them as explained. I think Do until loop will for me in that case.

Thanking you very much,

Regards

Solution
‎05-23-2013 07:27 AM
Regular Contributor
Posts: 195

## Re: extracting letters from observations and matching them together with other observations

Hi,

Here is the modifies SAS Code...

data want(drop = i);

set have;

array _temp (4) sx1-sx4;

if code = "P1" or code = "P2" then do;

do i = 1 to 4;

_temp(i) = substr(_temp(i),1,1);

end;

end;

run;

proc transpose data = want

out  = want;

id code;

var sx1-sx4;

run;

data want;

set want;

first_letter = compress(P1||P2);

array _temp (8000) rr1-rr8000;

do i = 1 to 8000;

if first_letter = _temp(i) or reverse(first_letter) = _temp(i) then _temp(i) = 1;

else _temp(i) = 0;

end;

run;

proc transpose data = want

out  = want(rename = (_NAME_ = CODE));

var rr1-rr8000;

run;

Based on your observation, you need to change the value of 8000 in ARRAY statement...

-Urvish

Occasional Contributor
Posts: 12

## Re: extracting letters from observations and matching them together with other observations

Posted in reply to UrvishShah

Dear urvish,

Thank you very much for your help and sorry not explaining my problem very well.

i just want to extract one letter from both P1 and P2 for one variables for example SX1 (XA or AX) and if i get extracted letters like XA or AX and then i want to match  them with remaining observation (RR1 to RR8) of SX1 vars, if they match then 1 else 0. I want to do like this for each variable like SX2 (XA or AX), SX3(ZA or AZ) and SX4 (HB).

Thanking you very much,

Regards,

Regular Contributor
Posts: 195

## Re: extracting letters from observations and matching them together with other observations

Ya i understood your requirement...Just do one thing...Copy the above code which i have posted it second time...and in ARRAY statement of final data step just change the value of 8000 to as per your number of observations...

The above SAS Programme will excatly meeting your requirement as i have tested it...

-Urvish

Occasional Contributor
Posts: 12

## Re: extracting letters from observations and matching them together with other observations

Hi genetist,

The function VERIFY will do. You may try this (it's just an example and you need to adjust your dataset):

data a;

input id1 \$ id2 \$;

cards;

AB CD

;

data b;

input a \$;

cards;

AC

CD

BD

FE

AA

;

data c;

set b;

if _n_=1 then merge a;

c=verify(a,id1);

d=verify(a,id2);

if c+d<3 then a='0';

else a='1';

run;

proc print;run;

Qinly

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 7 replies
• 276 views
• 0 likes
• 4 in conversation