Zachary
Obsidian | Level 7

I am very close to a final solution for dichotomizing my words and phrases based on a listing. I am also supplying screenshots to help show what is happening.

Below are some assorted comments and their claim numbers:

Comments.JPG

Below are the words and phrases that I am searching for:

Text4Matches.JPG

I made a numeric column associated with each of these words and phrases. The goal is to eventually get a set of new dichotomous variables (Boolean indicators) showing whether each word or phrase appears in a particular comment. Originally I wanted the actual word or phrase appended to the name of each new "flag" variable, but that was plagued by problems. Below is roughly how the final data should look:

FinalOutputShouldLookLike.JPG

The last column is from my NumberForLabels column.

And the following are some of the errors I get:

Errors.JPG

My code appears below. Perhaps my proc sql routine is not making the data unique in some way? I am sure there is a simple solution to this. Any and all help is highly valued and appreciated.

proc sql;
  create table scoresL as
  select CLAIMNO, COMMENTTEXT, Text4Matching, NumberForLabels
    from
    SRS_Comments500 inner join
    TextFromExcel on indexw(COMMENTTEXT, Text4Matching)>0
  order by CLAIMNO, COMMENTTEXT, Text4Matching;
quit;
proc print data=scoresL noobs; run;

proc transpose data=scoresL out=FlaggedCommentsT(drop=_:) prefix=flag_;
  by CLAIMNO;
  id NumberForLabels;
run;
proc print data=FlaggedCommentsT noobs; run;

Accepted Solution

ballardw
Super User

It looks like you need to go back to the logic involved in assigning NumberForLabels if you are getting multiple values within CLAIMNO. Possibly the source data had the CLAIMNO split somehow, so that your approach processes the pieces separately, resulting in duplicate values for NumberForLabels with different meanings.

A crude investigative technique would be to take the values that produce errors, subset the data on them, sort by NumberForLabels and CLAIMNO, and see whether multiple text values are associated with them.

I suspect that a fix might involve an earlier SORT by CLAIMNO.
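
A minimal sketch of that check, assuming the scoresL table and the variable names from the question. The output table dup_ids and the count columns are hypothetical names chosen for illustration. It lists every CLAIMNO / NumberForLabels pair that occurs more than once, which is the condition PROC TRANSPOSE complains about when an ID value repeats within a BY group:

```sas
/* Hypothetical diagnostic: find ID values that repeat within a CLAIMNO. */
/* dup_ids, n_rows, n_texts and n_comments are illustrative names.       */
proc sql;
  create table dup_ids as
  select CLAIMNO, NumberForLabels,
         count(*)                      as n_rows,     /* rows sharing this pair  */
         count(distinct Text4Matching) as n_texts,    /* distinct search strings */
         count(distinct COMMENTTEXT)   as n_comments  /* distinct comments       */
    from scoresL
    group by CLAIMNO, NumberForLabels
    having count(*) > 1
    order by CLAIMNO, NumberForLabels;
quit;
proc print data=dup_ids noobs; run;
```

If n_texts is greater than 1, the same number has been assigned to different text values. If only n_comments is greater than 1, the same word or phrase simply matched several comments on one claim.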

Zachary
Obsidian | Level 7

Thank you very much. The issue was that I had text for multiple comments per single claim, which produced multiple ID values within a BY group.

Once I stripped the comment text from the file and removed the redundant codes, everything worked. Thank you again.
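
For anyone landing here later, a rough sketch of what that fix might look like, assuming the scoresL table from the original post. The names scoresU, FlaggedCommentsU and hit are hypothetical, and this is one possible way to do it rather than the exact code that was run:

```sas
/* Drop the comment text and keep one row per claim and matched code. */
/* scoresU and hit are illustrative names, not from the original code. */
proc sql;
  create table scoresU as
  select distinct CLAIMNO, NumberForLabels, 1 as hit
    from scoresL
    order by CLAIMNO, NumberForLabels;
quit;

/* Each NumberForLabels value is now unique within a CLAIMNO, */
/* so it can safely serve as the ID variable.                 */
proc transpose data=scoresU out=FlaggedCommentsU(drop=_name_) prefix=flag_;
  by CLAIMNO;
  id NumberForLabels;
  var hit;
run;

/* Codes that never matched a claim come out missing; recode them to 0. */
data FlaggedCommentsU;
  set FlaggedCommentsU;
  array f {*} flag_:;
  do i = 1 to dim(f);
    if missing(f{i}) then f{i} = 0;
  end;
  drop i;
run;
```

Because scoresL comes from an inner join, claims with no matching word or phrase at all will not appear in the result. They would need to be merged back in from SRS_Comments500 if a row of all zeros is wanted for them.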
