BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
may0423
Obsidian | Level 7

Hi all,

 

I am learning how to use the retain statement. I got confused when I tried to aggregate strings across observations by marking <span class="usertitle"> = 1 and </span> =0. Here is the record and the intended results (User_Name).

 

Record                                                 User_Name

</ul>                                                               

</div>. 

<span class="usertitle">

ABC                                                       ABCDEF

DEF

</span>            

<span class="usertitle">

<span style="...></span>

</span>

 

Please help. Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

_n_=1 is the wrong check - that checks only the first record. 

 

This is a bit hokey, but it gets the job done. I'm not sure what your original source is, but there's usually ways to parse some of this out ahead of time...sometimes.

 

data have;
	x='<span class="usertitle">';
	output;
	x="Junior";
	output;
	x="Member";
	output;
	x="</span>";
	output;
run;

data want;
	set have;
	retain flag value;
	length value $100.;

	if find(x, 'usertitle')>0 then
		flag=1;

	if flag=1 and x="</span>" then
		do;
			flag=0;
			call missing(value);
		end;

	if flag=1 and substr(x, 1, 1) ne "<" then
		value=catx(" ", value, x);
run;

proc print data=want;
run;

View solution in original post

12 REPLIES 12
Reeza
Super User

Post what you've tried. 

Patrick
Opal | Level 21

@may0423

Sounds like a very similar problem to the discussion here:

https://communities.sas.com/t5/Base-SAS-Programming/Extract-rows-from-website/m-p/321317/highlight/t...

 

See if you can amend the code for your data.

 

may0423
Obsidian | Level 7

@Patrick yes. Inspired by your code, I kept practicing and changing, and now codes are getting weirder......

 

8 Data test;
9 Set work. Posts;
10 Retain user_title 0;
11 If find(record,'usertitle">') then usertitle=1;
12  else if find(record,'</span>') then usertitle=0;

 

Here are errors that I could not get rid of:

ERROR 386-185: Expecting an arithmetic expression.

ERROR 200-322: The symbol is not recognized and will be ignored.

ERROR 76-322: Syntax error, statement will be ignored."

 

Actually why sometimes there is a 0 in the retain statement and sometimes not? Those errors pointed to the ' in find statement.....

Reeza
Super User

@may0423 It's helpful if you post the exact log. 

ballardw
Super User

Two things that will help diagnose issues like this: First post the entire code and messages from the log. Second paste the copied code into a code box opened with the {i} icon on the menu above.

 

The error messages you show will almost certainly appear with an underscore character highlighting the location the compiler has an issue with. Posting into the code box will preserve formatting to show where that character is.

 

You may also want to see if this spelling is actually as desired:

 

10 Retain user_title 0;
11 If find(record,'usertitle">') then usertitle=1;
12  else if find(record,'</span>') then usertitle=0;

may0423
Obsidian | Level 7

Thank you everybody.

 

@ballardw. The Errors were gone. The questions are:

 

(1) the <span class="usertitle"> is not wanted, how could I get rid of it when the information in tmptitle is transfered to user_title?

(2) what if the information wanted across two observations, e.g. retain junior (next row) member?

(3) how to flag NA in photo 2 situation? the information between  <span class="usertitle"> and </span> was not even flagged as 1.

 

@Patrick mentioned Perl Regular Expressions that is very worthwhile learning, but it seems too advanced for me at this point. @Reeza I pasted the unfinished code after the photos. 

 

Thank you all again!

 

1.jpg

 

2.jpg

 

 

Data test;
Set work. test0;
Retain user_title tmp;
if find (record, 'usertitle">') then tmp=1;
if find (record, '</span>') then tmp=0;
if tmp=1 then do;
tmptitle=record;
end;
if _n_=1 then user_title=tmptitle;
else user_title=user_title||tmptitle;
Run;

may0423
Obsidian | Level 7

Question 1 & 3 were solved (see code below). The code works well for information in only one observation. If information is across two observations, I still don't know how to retain them into one observation with basic statements, and the records afterwards could not be correctly matched in the output anymore.....(I'm thinking about using || to connect them, but no idea yet.....)

 

e.g. Junior Member is wanted in one cell under usertitle column and next to Member like the photo shown:

 

1.jpg

 

 

Code of question 1&3:

Data test;
Set work. test0;

retain titlecheck tmpusertitle;
if find(record, 'usertitle">') then titlecheck=1;
if find(record, '</span>') then titlecheck=0;
if titlecheck=1 then user_title=record;

 

if _n_=1 then do;

...... (omit some steps of other variables here);

tmpusertitle=user_title;

end;

...... (omit some steps of other variables here);

if user_title ne "" then tmpusertitle=user_title;

if post_id ne "" then do;

user_title=tmpusertitle;

end;

 

Reeza
Super User

_n_=1 is the wrong check - that checks only the first record. 

 

This is a bit hokey, but it gets the job done. I'm not sure what your original source is, but there's usually ways to parse some of this out ahead of time...sometimes.

 

data have;
	x='<span class="usertitle">';
	output;
	x="Junior";
	output;
	x="Member";
	output;
	x="</span>";
	output;
run;

data want;
	set have;
	retain flag value;
	length value $100.;

	if find(x, 'usertitle')>0 then
		flag=1;

	if flag=1 and x="</span>" then
		do;
			flag=0;
			call missing(value);
		end;

	if flag=1 and substr(x, 1, 1) ne "<" then
		value=catx(" ", value, x);
run;

proc print data=want;
run;
may0423
Obsidian | Level 7

Hi @Reeza,

 

Thank you for the code. My original source is from a website (actually it could be from any website as it is just for practice). I saved the records in a dataset which have 8k records, then started statements. As for the Junior Member thing, it could sometimes be Senior Member or something else member. I am not sure if the x= in your code would work as I could not go through 8k records to specified the type of members. Please advise. 

Reeza
Super User

@may0423 wrote:

Hi @Reeza,

 

Thank you for the code. My original source is from a website (actually it could be from any website as it is just for practice). I saved the records in a dataset which have 8k records, then started statements. As for the Junior Member thing, it could sometimes be Senior Member or something else member. I am not sure if the x= in your code would work as I could not go through 8k records to specified the type of members. Please advise. 


I don't understand. 

may0423
Obsidian | Level 7

@Reeza I amended your code a little bit, but it still didn't add up at the last record.

 

Actually, what doesn't " " mean in your  code value=catx(" ", value, x)? I might do something wrong to this part.....

 

Thank you.

may0423
Obsidian | Level 7

@Reeza your code worked! I did something wrong when amending.

 

thank you a lot!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1467 views
  • 1 like
  • 4 in conversation