concatenating variable number of text lines with identical keys in table

Reply
Occasional Contributor
Posts: 15

concatenating variable number of text lines with identical keys in table

[ Edited ]

Hello

I have inheritted a table layout that saves continuous notes as parsed lines of the text.  I need to be able to put them back together but due to the variation in length parameters and returns used in the text entry, the number of lines is not uniform from note to note.

 

A slight twist on the original question.  After re-examining the table: 

 

Notice that for a given Survey and ID there can be multiple notes ranging lines 1 to n with different time stamps.  I would like my final output to have variables survey,ID,Text-where text has concatenated consecutive "line" records on survey,id,timestamp beginning with the associated "<timestamp>" as the first string.  Onto this, I would lke to concatenate notes with identical survey and Id but differing timestamps.  As a deliniator between each note (indicated by a differing timestamp and the restart of line numbering) I would like "||" before the next "<timestamp> continuous note text" See excel spreadsheet sample for a clear picture of what I'm trying to say.    

 

data notesconcat;
set notes;
by ID Timestamp;
length new_text $32767;
retain new_text;
new_text=catx(' ',new_text,text);
if last.id then do;
output;
call missing(new_text);
drop line Text;
end;
Run;

 

I'm new to sas and proc sql so if anyone has code I can copy and paste in, I would greatly appreciate it!

Super User
Posts: 17,750

Re: concatenating variable number of text lines with identical keys in table

Simple way - transpose and then use a CAT type function.

 

Can you post some data, not as text, but that we can copy into the editor to code with? Smiley Happy

Occasional Contributor
Posts: 15

Re: concatenating variable number of text lines with identical keys in table

[ Edited ]

I have attached sample data in an excel spreadsheet showing how it is in the table vs. how i need the output.

Super User
Posts: 17,750

Re: concatenating variable number of text lines with identical keys in table

You'll have to play around to find the correct CAT(T/S/X) function that works and adds the spaces in for you between rows so that you have readable sentences. 

proc transpose data=have out=flipped prefix=words;
by id;
var sentence;
run;

data want;
length out $500.;
set flipped;

array words(*) words1-words99;

out=catx(" ", of words(*));

run;

 

 

 

Respected Advisor
Posts: 3,887

Re: concatenating variable number of text lines with identical keys in table

Something like below should do:

 

data have;
  infile datalines truncover;
  input Survey $ ID $ Text $100.;
  datalines;
A 1 I need this
A 1 on one
A 1 line
A 2 false 
A 2 alarm
B 4 concatenate
B 4 these two
C 5 first name
C 5 middle 
C 5 last
;
run;

data want;
  set have;
  by Survey id;
  length new_text $1000;
  retain new_text;
  new_text=catx(' ',new_text,text);
  if last.id then
    do;
      output;
      call missing(new_text);
    end;
run;
Occasional Contributor
Posts: 15

Re: concatenating variable number of text lines with identical keys in table

[ Edited ]

Your code worked but I noticed that the system allows for multiple id survey notes denoted only by a differing timestamp.  As is, the code overwrites anything with a differing timestamp but same id,survey. I added in timestamp to the "by" line and this makes it continuous but hard to tell when the next note began.  I'd line to put a deliniator of some sort in like <timestamp> at the beggining of each new timestamp and then seperate different notes for the same Id, survey combo with a "||".  See my updated excel spreadsheet for a visual of the actual table setup and the desired output.

 

data notesconcat;
set notes;
by ID Timestamp;
length new_text $32767;
retain new_text;
new_text=catx(' ',new_text,text);
if last.id then do;
output;
call missing(new_text);
drop line Text;
end;
Run;
Super User
Posts: 17,750

Re: concatenating variable number of text lines with identical keys in table

Since we don't know which solution you're using it's hard to comment.

 

It's relatively straightforward to add the variable into the CAT function, please post you code and someone can help modify if you're not getting it correct.

 

Also please mark this question as answered.

Occasional Contributor
Posts: 15

Re: concatenating variable number of text lines with identical keys in table

Adding this bit of code accomplished what I wanted.

 

data notesconcat1;
set notes;
If line=1 then do text=cat('||',timestamp,'||',text);
end;
run;

Occasional Contributor
Posts: 15

Re: concatenating variable number of text lines with identical keys in table

Hello All.  I'm back again.  The solution worked for a while but I've since been tasked with concatenating all notes for each ID deliminated by the formatted date_time.  My code below works, but it cuts off much earlier than the max length I set it for.  

Whenever the date_time stamp is concatenated with it, it appears not to be able to take the full string for seq 1 (line one) of each note.

I came across another post saying that if there is a where clause in the statement then cats and cat() can only handle $200 length.  This is a huge problem for me as I can't have the middle of messages just drop off.

 

Can anyone think of a way where I can:

 

a)  conditionally insert a record for each (ID Time_STAMP) combination that I could then run the following code on where it would not be subject to this constraint?

 

b) concat the timestamp to the begining of the note without using my if seq=1 line (that I think is causing the limit of characters

 

c) redistribute the the text that would be cut off, to the next line, on and on down to the last line

 

d)re-arrange my code in a way that would still make this work and would not be subject to the the text cutting off before 32767 character limit.

 

 

Here's an example of what's happening (and see pic below and excel document attached containing what's in the picture):

 

Survey     ID    date_time       Seq      time_key           Textx

123          1      sasdatetime1   1         sastime1          Hello

123          1      sasdatetime1   2        sastime1           World

123           1     sasdatetime2   1        sastime2           Good

123           1     sasdatetime2   2        sastime2           Bye

....

so what the code below produces is:

123     1         sasdatetime 1        2     sastime1       (sastime1) Hel World

 

 (See actual) case in attached excel document.  Although I've set textx and new_text fields for 

length new_text $32767;

 

.Sample for SasForum.jpg

 

 

 

data notes;
set other.SUPPLY_NOTES;
length Textx $32767;
keep ID Survey SEQ Textx Item_Steps time_key date_time;
If (SEQ=1 and item_steps not in ("","."," ")) 
then do textx=cat('(',put(date_time,datetime18.),') ','||',"ITEM: ",ITEM_Steps,' || ',textx);
end; data notes; set other.SUPPLY_NOTES; keep ID Survey SEQ Textx Item_Steps time_key date_time; where survey="115"; run;
proc sort data=notes; by id time_key SEQ; Run;
proc sort data=itemnotes; by id time_key SEQ; Run;
data notesconcat; set notes; by ID time_key Seq; length new_text $32767; retain new_text; new_text=catx(' ',new_text,textx); if last.id then do; output; call missing(new_text); drop line Textx; end; run;

 

Thank you all for your assistance in advance!

Super User
Posts: 5,074

Re: concatenating variable number of text lines with identical keys in table

A DATA step lets you program this in a straightforward manner.  Assuming that your data set is sorted already by SURVEY ID:

 

data want;

set have;

by survey id;

length combined_text $ 500;

if first.id then combined_text = text;

else combined_text = trim(combined_text) || ' ' || text;

if last.id;

run;

 

You have to pick a sensible length for COMBINED_TEXT ... I'm just guessing here at $500.

 

Good luck.

Ask a Question
Discussion stats
  • 9 replies
  • 1411 views
  • 1 like
  • 4 in conversation