Hi,
I have a SAS puzzle I need help with. I have a table structured like below. In the fake data below, the Desc field has the bits of a phrase that belong to each different Qtype. The final phrases are in the second table below that shows ideally what I'd like to have.
The rules for parsing the phase makes this a difficult challenge for me. The Qtype determines what Desc are grouped together. The N_ID determines what order to put the data for each Qtype in. The N_Type determines the sub-order.
Here is where it gets even harder to explain. N_Type = 15 acts as a conjunction between sets of N_Type 8, 11 and 9 (in that order) below it until it finds another 15.
Also, a space will need to be added between each Desc once put together.
Looking at the examples below will probably answer most questions but let me know if it doesn’t.
What I have:
Qtype | N_ID | N_Type | Desc |
1000 | 1 | 15 | and |
1000 | 2 | 11 | likes |
1000 | 3 | 8 | He |
1000 | 4 | 9 | cats |
1000 | 5 | 11 | likes |
1000 | 6 | 8 | She |
1000 | 7 | 9 | dogs |
1000 | 8 | 11 | like |
1000 | 9 | 8 | They |
1000 | 10 | 9 | animals |
1000 | 11 | 15 | Or |
1000 | 12 | 11 | eat |
1000 | 13 | 8 | They |
1000 | 14 | 9 | meat |
1000 | 15 | 11 | eat |
1000 | 16 | 8 | They |
1000 | 17 | 9 | vegetables |
111 | 2 | 11 | likes |
111 | 3 | 8 | Bill |
111 | 4 | 9 | hamsters |
What I need:
Qtype | ParsedString |
1000 | He likes cats and She likes dogs and They like animals Or They eat meat Or They eat vegetables |
111 | Bill likes hamsters |
Message was edited by: Mishka1 One Added: Also, a space will need to be added between each Desc once put together.
Or more compact:
data clauses(keep=qtype str);
length s c x v j $16 str $200;
array a{8:15} s c x v x x x j;
do until (last.qtype);
set input;
by qtype notsorted;
a{nod_Type} = desc;
if cmiss(s, v, c) = 0 then do;
str = catx(cat(" ", trim(j), " "), str, catx(" ", s, v, c));
call missing (s, v, c);
end;
end;
output;
run;
PG
Should have added this to get you started:
Data Input;
input Qtype Nod_ID Nod_Type Desc $;
cards;
1000 1 15 and
1000 2 11 likes
1000 3 8 He
1000 4 9 cats
1000 5 11 likes
1000 6 8 She
1000 7 9 dogs
1000 8 11 like
1000 9 8 They
1000 10 9 animals
1000 11 15 Or
1000 12 11 eat
1000 13 8 They
1000 14 9 meat
1000 15 11 eat
1000 16 8 They
1000 17 9 vegetables
111 2 11 likes
111 3 8 Bill
111 4 9 hamsters
;
run;
But I can't get the right order of these words.
111 | 2 | 11 | likes<-- should be Second |
111 | 3 | 8 | Bill <-- should be First |
111 | 4 | 9 | hamsters<-- should be Third |
11 8 9 is not the order I need.
Hi,
Try this..not an optimal solution but it may suffice your requirement.
Data Input;
input Qtype Nod_ID Nod_Type Desc $;
if nod_type=8 then t=1;
if nod_type=11 then t=2;
if nod_type=9 then t=3;
if nod_type=15 then t=4;
cards;
1000 1 15 and
1000 2 11 likes
1000 3 8 He
1000 4 9 cats
1000 5 11 likes
1000 6 8 She
1000 7 9 dogs
1000 8 11 like
1000 9 8 They
1000 10 9 animals
1000 11 15 Or
1000 12 11 eat
1000 13 8 They
1000 14 9 meat
1000 15 11 eat
1000 16 8 They
1000 17 9 vegetables
111 2 11 likes
111 3 8 Bill
111 4 9 hamsters
;
run;
proc sort data=input;by Qtype t ;run;
proc transpose data=input out=tt let;
by qtype t;
var desc;
run;
proc transpose data=tt out=want(where=(_1 ne '')) let;
id t ;
var col1 col2 col3 col4 col5;
by qtype;
run;
data test;
set want;
xx=CATx('',OF _1 -_4);
drop _:;
run;
Data Need ;
Length ParsedString $ 400 ;
Set test ;
by qtype;
Retain ParsedString ;
ParsedString = CatX( '' , ParsedString , xx ) ;
if last.qtype then output;
drop xx;
Run ;
Thanks,
Shiva
This is close. The value in nod_type=15 needs to be repeatedly inserted between each of the combined (8,11,9) strings. So it's 8,9,11,15,8,9,11,15. I think by this method we lose the order of the Desc values from Nod_ID which determines which nod_type=15 Desc value will be inserted between the groups. This is a great start and I'll try working with it. Thank you!!
This will do it:
Data Input;
length desc $16;
input Qtype Nod_ID Nod_Type Desc $;
datalines;
1000 1 15 and
1000 2 11 likes
1000 3 8 He
1000 4 9 cats
1000 5 11 likes
1000 6 8 She
1000 7 9 dogs
1000 8 11 like
1000 9 8 They
1000 10 9 animals
1000 11 15 Or
1000 12 11 eat
1000 13 8 They
1000 14 9 meat
1000 15 11 eat
1000 16 8 They
1000 17 9 vegetables
111 2 11 likes
111 3 8 Bill
111 4 9 hamsters
;
data clauses(keep=qtype str);
retain j s v c str;
length j s v c $16 str $200;
set input;
by qtype notsorted;
if first.qtype then call missing(str, j);
select (nod_Type);
when (8) s = desc;
when (9) c = desc;
when (11) v = desc;
when (15) j = desc;
end;
if cmiss(s, v, c) = 0 then do;
str = catx(cat(" ", trim(j), " "), str, catx(" ", s, v, c));
call missing (s, v, c);
end;
if last.qtype then output;
run;
PG
Or more compact:
data clauses(keep=qtype str);
length s c x v j $16 str $200;
array a{8:15} s c x v x x x j;
do until (last.qtype);
set input;
by qtype notsorted;
a{nod_Type} = desc;
if cmiss(s, v, c) = 0 then do;
str = catx(cat(" ", trim(j), " "), str, catx(" ", s, v, c));
call missing (s, v, c);
end;
end;
output;
run;
PG
Awesome manipulation! Can you describe what the 'x' is doing in the first definition and array?
Yes, x is a scratch variable that gets the value from Desc if nod_type is not one of the expected values. A useful feature of that solution is that it accepts duplicates and any ordering of the words, as long as the nod_type=15 word doesn't arrive last.Incomplete clauses are also tolerated and ignored.
PG
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.