I am very close to a final solution in trying to dichotomize my words and phrases based on a listing. I am also supplying screen shots to help identify what is happening.
Below are some assorted comments and their claim numbers:
Below are the words and phrases that I am searching for:
I made a numeric column associated with each of these words and phrases. The goal is to end up with a set of new dichotomous variables (Boolean indicators), one per word or phrase, showing whether it appears in a particular comment. Originally I wanted the actual word or phrase appended to the name of each new "flag" variable, but that approach was plagued by problems. Below is a rough picture of how the final data should look:
The last column is from my NumberForLabels column.
And the following are some of the errors I get:
My code appears below. Perhaps my proc sql routine is not making the data unique in some way? I am sure there is a simple solution to this. Any and all help is highly valued and appreciated.
proc sql;
  create table scoresL as
  select CLAIMNO, COMMENTTEXT, Text4Matching, NumberForLabels
  from SRS_Comments500 inner join TextFromExcel
    on indexw(COMMENTTEXT, Text4Matching) > 0
  order by CLAIMNO, COMMENTTEXT, Text4Matching;
quit;

proc print data=scoresL noobs; run;

proc transpose data=scoresL out=FlaggedCommentsT(drop=_:) prefix=flag_;
  by CLAIMNO;
  id NumberForLabels;
run;

proc print data=FlaggedCommentsT noobs; run;
It looks like you need to go back to the logic involved in assigning NumberForLabels if you are getting multiple values within CLAIMNO. Possibly the source data had a CLAIMNO split somehow, so that your approach processes the pieces separately, resulting in duplicate values of NumberForLabels with different meanings.
A crude investigative technique would be to take the values named in the errors, subset the data on them, sort by NumberForLabels and CLAIMNO, and see whether you have multiple text values associated with each.
I suspect that a fix might involve an earlier SORT by CLAIMNO.
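The check described above could be sketched roughly as follows. It assumes the scoresL table from the question already exists; the names dup_ids and n_rows are made up for illustration. Any row it returns is a CLAIMNO / NumberForLabels pair that occurs more than once, which is the situation PROC TRANSPOSE rejects when NumberForLabels is used as the ID variable.

```sas
/* Hypothetical diagnostic: list ID values that repeat within a BY group. */
/* dup_ids and n_rows are illustrative names, not part of the original.  */
proc sql;
  create table dup_ids as
  select CLAIMNO, NumberForLabels, count(*) as n_rows
  from scoresL
  group by CLAIMNO, NumberForLabels
  having count(*) > 1
  order by CLAIMNO, NumberForLabels;
quit;

proc print data=dup_ids noobs; run;
```

If dup_ids comes back empty, the duplicates are not the cause and the assignment of NumberForLabels is the next place to look.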
Thank you very much. The issue was that I had text for multiple comments per single claim, which created multiple ID values per BY group.
Once I stripped the comment text from the file and removed the redundant codes, everything worked. Thank you again.
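For anyone landing here with the same error, one way to do what is described above might look like the sketch below. It is not necessarily the exact code used; it assumes the same SRS_Comments500 and TextFromExcel tables as in the question, and the names scoresDistinct and flag are invented for the example. Dropping COMMENTTEXT and using SELECT DISTINCT leaves at most one row per CLAIMNO / NumberForLabels pair, so each ID value occurs only once within a BY group.

```sas
/* Sketch: keep one row per claim and label number, then transpose. */
/* scoresDistinct and flag are illustrative names.                  */
proc sql;
  create table scoresDistinct as
  select distinct CLAIMNO, NumberForLabels, 1 as flag
  from SRS_Comments500 inner join TextFromExcel
    on indexw(COMMENTTEXT, Text4Matching) > 0
  order by CLAIMNO, NumberForLabels;
quit;

proc transpose data=scoresDistinct out=FlaggedCommentsT(drop=_:) prefix=flag_;
  by CLAIMNO;
  id NumberForLabels;
  var flag;
run;

/* Optional: turn missing flags into 0 so every indicator is 0/1. */
data FlaggedCommentsT;
  set FlaggedCommentsT;
  array f{*} flag_:;
  do i = 1 to dim(f);
    if missing(f{i}) then f{i} = 0;
  end;
  drop i;
run;
```

Note that because of the inner join, claims whose comments match none of the words or phrases will not appear in FlaggedCommentsT at all.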