BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mishka1
Fluorite | Level 6

Hi,

I have a SAS puzzle I need help with. I have a table structured like below. In the fake data below, the Desc field has the bits of a phrase that belong to each different Qtype. The final phrases are in the second table below that shows ideally what I'd like to have.

The rules for parsing the phase makes this a difficult challenge for me. The Qtype determines what Desc are grouped together. The N_ID determines what order to put the data for each Qtype in. The N_Type determines the sub-order.

Here is where it gets even harder to explain. N_Type = 15 acts as a conjunction between sets of N_Type 8, 11 and 9 (in that order) below it until it finds another 15.

Also, a space will need to be added between each Desc once put together.

Looking at the examples below will probably answer most questions but let me know if it doesn’t.

What I have:

Qtype

N_ID

N_Type

Desc

1000

1

15

and

1000

2

11

likes

1000

3

8

He

1000

4

9

cats

1000

5

11

likes

1000

6

8

She

1000

7

9

dogs

1000

8

11

like

1000

9

8

They

1000

10

9

animals

1000

11

15

Or

1000

12

11

eat

1000

13

8

They

1000

14

9

meat

1000

15

11

eat

1000

16

8

They

1000

17

9

vegetables

111

2

11

likes

111

3

8

Bill

111

4

9

hamsters

What I need:

Qtype

ParsedString

1000

He likes cats and She likes dogs and They like animals Or They eat meat Or They eat vegetables

111

Bill likes hamsters

Message was edited by: Mishka1 One Added: Also, a space will need to be added between each Desc once put together.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Or more compact:

data clauses(keep=qtype str);

length s c x v j $16 str $200;

array a{8:15} s c x v x x x j;

do until (last.qtype);

set input;

by qtype notsorted;

a{nod_Type} = desc;

if cmiss(s, v, c) = 0 then do;

  str = catx(cat(" ", trim(j), " "), str, catx(" ", s, v, c));

  call missing (s, v, c);

end;

end;

output;

run;

PG

PG

View solution in original post

8 REPLIES 8
Mishka1
Fluorite | Level 6

Should have added this to get you started:

Data Input;

  input Qtype    Nod_ID     Nod_Type   Desc $;

  cards;

1000  1     15    and

1000  2     11    likes

1000  3     8     He

1000  4     9     cats

1000  5     11    likes

1000  6     8     She

1000  7     9     dogs

1000  8     11    like

1000  9     8     They

1000  10    9     animals

1000  11    15    Or

1000  12    11    eat

1000  13    8     They

1000  14    9     meat

1000  15    11    eat

1000  16    8     They

1000  17    9     vegetables

111         2     11    likes

111         3     8     Bill

111         4     9     hamsters

;

run;

Ksharp
Super User

But I can't get the right order of these words.

111

2

11

likes<-- should be Second

111

3

8

Bill <-- should be First

111

4

9

hamsters<-- should be Third

11 8 9 is not the order I need.

shivas
Pyrite | Level 9

Hi,

Try this..not an optimal solution but it may suffice your requirement.

Data Input;

  input Qtype    Nod_ID     Nod_Type   Desc $;

  if nod_type=8 then t=1;

if nod_type=11 then t=2;

if nod_type=9 then t=3;

if nod_type=15 then t=4;

  cards;

1000  1     15    and

1000  2     11    likes

1000  3     8     He

1000  4     9     cats

1000  5     11    likes

1000  6     8     She

1000  7     9     dogs

1000  8     11    like

1000  9     8     They

1000  10    9     animals

1000  11    15    Or

1000  12    11    eat

1000  13    8     They

1000  14    9     meat

1000  15    11    eat

1000  16    8     They

1000  17    9     vegetables

111   2     11    likes

111   3     8     Bill

111   4     9     hamsters

;

run;

proc sort data=input;by  Qtype t ;run;

proc transpose data=input out=tt let;

by qtype t;

var desc;

run;

proc transpose data=tt out=want(where=(_1 ne '')) let;

id t ;

var col1 col2 col3 col4 col5;

by qtype;

run;

data test;

set want;

xx=CATx('',OF _1 -_4);

drop  _:;

run;

Data Need ;

Length ParsedString $ 400 ;

Set test ;

by qtype;

Retain ParsedString ;

ParsedString = CatX( '' , ParsedString , xx ) ;

if last.qtype then output;

drop xx;

Run ;

Thanks,

Shiva

Mishka1
Fluorite | Level 6

This is close. The value in nod_type=15 needs to be repeatedly inserted between each of the combined (8,11,9) strings. So it's 8,9,11,15,8,9,11,15. I think by this method we lose the order of the Desc values from Nod_ID which determines which nod_type=15 Desc value will be inserted between the groups. This is a great start and I'll try working with it. Thank you!!

PGStats
Opal | Level 21

This will do it:

Data Input;
  length desc $16;
  input Qtype    Nod_ID     Nod_Type   Desc $;
  datalines;
1000  1     15    and
1000  2     11    likes
1000  3     8     He
1000  4     9     cats
1000  5     11    likes
1000  6     8     She
1000  7     9     dogs
1000  8     11    like
1000  9     8     They
1000  10    9     animals
1000  11    15    Or
1000  12    11    eat
1000  13    8     They
1000  14    9     meat
1000  15    11    eat
1000  16    8     They
1000  17    9     vegetables
111   2     11    likes
111   3     8     Bill
111   4     9     hamsters
;

data clauses(keep=qtype str);
retain j s v c str;
length j s v c $16 str $200;
set input;
by qtype notsorted;
if first.qtype then call missing(str, j);
select (nod_Type);
when (8) s = desc;
when (9) c = desc;
when (11) v = desc;
when (15) j = desc;
end;
if cmiss(s, v, c) = 0 then do;
str = catx(cat(" ", trim(j), " "), str, catx(" ", s, v, c));
call missing (s, v, c);
end;
if last.qtype then output;
run;

PG

PG
PGStats
Opal | Level 21

Or more compact:

data clauses(keep=qtype str);

length s c x v j $16 str $200;

array a{8:15} s c x v x x x j;

do until (last.qtype);

set input;

by qtype notsorted;

a{nod_Type} = desc;

if cmiss(s, v, c) = 0 then do;

  str = catx(cat(" ", trim(j), " "), str, catx(" ", s, v, c));

  call missing (s, v, c);

end;

end;

output;

run;

PG

PG
Mishka1
Fluorite | Level 6

Awesome manipulation! Can you describe what the 'x' is doing in the first definition and array?

PGStats
Opal | Level 21

Yes, x is a scratch variable that gets the value from Desc if nod_type is not one of the expected values. A useful feature of that solution is that it accepts duplicates and any ordering of the words, as long as the nod_type=15 word doesn't arrive last.Incomplete clauses are also tolerated and ignored.

PG

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1199 views
  • 7 likes
  • 4 in conversation