DATA Step, Macro, Functions and more

Use retain to aggregate strings across observations

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Use retain to aggregate strings across observations

Hi all,

 

I am learning how to use the retain statement. I got confused when I tried to aggregate strings across observations by marking <span class="usertitle"> = 1 and </span> =0. Here is the record and the intended results (User_Name).

 

Record                                                 User_Name

</ul>                                                               

</div>. 

<span class="usertitle">

ABC                                                       ABCDEF

DEF

</span>            

<span class="usertitle">

<span style="...></span>

</span>

 

Please help. Thank you.


Accepted Solutions
Solution
‎01-04-2017 10:36 PM
Super User
Posts: 17,868

Re: Use retain to aggregate strings across observations

_n_=1 is the wrong check - that checks only the first record. 

 

This is a bit hokey, but it gets the job done. I'm not sure what your original source is, but there's usually ways to parse some of this out ahead of time...sometimes.

 

data have;
	x='<span class="usertitle">';
	output;
	x="Junior";
	output;
	x="Member";
	output;
	x="</span>";
	output;
run;

data want;
	set have;
	retain flag value;
	length value $100.;

	if find(x, 'usertitle')>0 then
		flag=1;

	if flag=1 and x="</span>" then
		do;
			flag=0;
			call missing(value);
		end;

	if flag=1 and substr(x, 1, 1) ne "<" then
		value=catx(" ", value, x);
run;

proc print data=want;
run;

View solution in original post


All Replies
Super User
Posts: 17,868

Re: Use retain to aggregate strings across observations

Post what you've tried. 

Respected Advisor
Posts: 3,896

Re: Use retain to aggregate strings across observations

@may0423

Sounds like a very similar problem to the discussion here:

https://communities.sas.com/t5/Base-SAS-Programming/Extract-rows-from-website/m-p/321317/highlight/t...

 

See if you can amend the code for your data.

 

Contributor
Posts: 24

Re: Use retain to aggregate strings across observations

@Patrick yes. Inspired by your code, I kept practicing and changing, and now codes are getting weirder......

 

8 Data test;
9 Set work. Posts;
10 Retain user_title 0;
11 If find(record,'usertitle">') then usertitle=1;
12  else if find(record,'</span>') then usertitle=0;

 

Here are errors that I could not get rid of:

ERROR 386-185: Expecting an arithmetic expression.

ERROR 200-322: The symbol is not recognized and will be ignored.

ERROR 76-322: Syntax error, statement will be ignored."

 

Actually why sometimes there is a 0 in the retain statement and sometimes not? Those errors pointed to the ' in find statement.....

Super User
Posts: 17,868

Re: Use retain to aggregate strings across observations

@may0423 It's helpful if you post the exact log. 

Super User
Posts: 10,516

Re: Use retain to aggregate strings across observations

Two things that will help diagnose issues like this: First post the entire code and messages from the log. Second paste the copied code into a code box opened with the {i} icon on the menu above.

 

The error messages you show will almost certainly appear with an underscore character highlighting the location the compiler has an issue with. Posting into the code box will preserve formatting to show where that character is.

 

You may also want to see if this spelling is actually as desired:

 

10 Retain user_title 0;
11 If find(record,'usertitle">') then usertitle=1;
12  else if find(record,'</span>') then usertitle=0;

Contributor
Posts: 24

Re: Use retain to aggregate strings across observations

Thank you everybody.

 

@ballardw. The Errors were gone. The questions are:

 

(1) the <span class="usertitle"> is not wanted, how could I get rid of it when the information in tmptitle is transfered to user_title?

(2) what if the information wanted across two observations, e.g. retain junior (next row) member?

(3) how to flag NA in photo 2 situation? the information between  <span class="usertitle"> and </span> was not even flagged as 1.

 

@Patrick mentioned Perl Regular Expressions that is very worthwhile learning, but it seems too advanced for me at this point. @Reeza I pasted the unfinished code after the photos. 

 

Thank you all again!

 

1.jpg

 

2.jpg

 

 

Data test;
Set work. test0;
Retain user_title tmp;
if find (record, 'usertitle">') then tmp=1;
if find (record, '</span>') then tmp=0;
if tmp=1 then do;
tmptitle=record;
end;
if _n_=1 then user_title=tmptitle;
else user_title=user_title||tmptitle;
Run;

Contributor
Posts: 24

Re: Use retain to aggregate strings across observations

Question 1 & 3 were solved (see code below). The code works well for information in only one observation. If information is across two observations, I still don't know how to retain them into one observation with basic statements, and the records afterwards could not be correctly matched in the output anymore.....(I'm thinking about using || to connect them, but no idea yet.....)

 

e.g. Junior Member is wanted in one cell under usertitle column and next to Member like the photo shown:

 

1.jpg

 

 

Code of question 1&3:

Data test;
Set work. test0;

retain titlecheck tmpusertitle;
if find(record, 'usertitle">') then titlecheck=1;
if find(record, '</span>') then titlecheck=0;
if titlecheck=1 then user_title=record;

 

if _n_=1 then do;

...... (omit some steps of other variables here);

tmpusertitle=user_title;

end;

...... (omit some steps of other variables here);

if user_title ne "" then tmpusertitle=user_title;

if post_id ne "" then do;

user_title=tmpusertitle;

end;

 

Solution
‎01-04-2017 10:36 PM
Super User
Posts: 17,868

Re: Use retain to aggregate strings across observations

_n_=1 is the wrong check - that checks only the first record. 

 

This is a bit hokey, but it gets the job done. I'm not sure what your original source is, but there's usually ways to parse some of this out ahead of time...sometimes.

 

data have;
	x='<span class="usertitle">';
	output;
	x="Junior";
	output;
	x="Member";
	output;
	x="</span>";
	output;
run;

data want;
	set have;
	retain flag value;
	length value $100.;

	if find(x, 'usertitle')>0 then
		flag=1;

	if flag=1 and x="</span>" then
		do;
			flag=0;
			call missing(value);
		end;

	if flag=1 and substr(x, 1, 1) ne "<" then
		value=catx(" ", value, x);
run;

proc print data=want;
run;
Contributor
Posts: 24

Re: Use retain to aggregate strings across observations

Hi @Reeza,

 

Thank you for the code. My original source is from a website (actually it could be from any website as it is just for practice). I saved the records in a dataset which have 8k records, then started statements. As for the Junior Member thing, it could sometimes be Senior Member or something else member. I am not sure if the x= in your code would work as I could not go through 8k records to specified the type of members. Please advise. 

Super User
Posts: 17,868

Re: Use retain to aggregate strings across observations


may0423 wrote:

Hi @Reeza,

 

Thank you for the code. My original source is from a website (actually it could be from any website as it is just for practice). I saved the records in a dataset which have 8k records, then started statements. As for the Junior Member thing, it could sometimes be Senior Member or something else member. I am not sure if the x= in your code would work as I could not go through 8k records to specified the type of members. Please advise. 


I don't understand. 

Contributor
Posts: 24

Re: Use retain to aggregate strings across observations

@Reeza I amended your code a little bit, but it still didn't add up at the last record.

 

Actually, what doesn't " " mean in your  code value=catx(" ", value, x)? I might do something wrong to this part.....

 

Thank you.

Contributor
Posts: 24

Re: Use retain to aggregate strings across observations

@Reeza your code worked! I did something wrong when amending.

 

thank you a lot!!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 235 views
  • 1 like
  • 4 in conversation