We have a very old system that tracks notes entered by several different systems. These notes are in one way or another surrounded by **'s. All of the notes are in a single column and string.
EXAMPLE:
| ID_NUMBER | NOTE
| 12345678 | **system message** **system update** **mm/dd/yy - blablablablabla**
| 12345679 | **system message** **mm/dd/yy - blablabla ** **mm/dd/yy - wahwahwahwah**
Each of these NOTES have a UNIQUE 8 digit number.
I am trying to find a DATA STEP that will create a new table with EACH of the notes seperated into their own row.
something like:
| ID_NUMBER | NOTE
| 12345678 | system message
| 12345678 | system update
| 12345678 | mm/dd/yyyy - blablablablabla
| 12345679 | system message
| 12345679 | mm/dd/yyyy - blablablabla
| 12345679 | mm/dd/yyyy - wahwahwahwah
Thanks,
Without seeing an example of the actual file we can only guess at possible solutions.
If your starting file looks like the one below, then the code I suggest below might work:
libname library '/folders/myfolders'; data library.sg_notes; informat notes $100.; input case_no notes &; notes=dequote(notes); cards; 12345678 "**system message** **system update** **mm/dd/yy - blablablablabla**" 12345679 "**system message** **mm/dd/yy - blablabla ** **mm/dd/yy - wahwahwahwah**" ; data LIBRARY.SG_NOTES_TEST (KEEP=CASE_NO NOTE); SET LIBRARY.SG_NOTES; NOTES=transtrn(NOTES,'**','"'); do i=0 to 7; note=scan(notes,(1+(i*2)),'"'); if not missing(note) then output; end; run;
Art, CEO, AnalystFinder.com
Here is one way:
data want (keep=id_number note); infile cards firstobs=2; informat note note1-note3 $40.; array notes(*) $ note1-note3; input @; _infile_=compress(transtrn(_infile_,'**','"'),'|'); input id_number (note1-note3) (&); do i=1 to 3; note=dequote(notes(i)); output; end; cards; | ID_NUMBER | NOTE | 12345678 | **system message** **system update** **mm/dd/yy - blablablablabla** | 12345679 | **system message** **mm/dd/yy - blablabla ** **mm/dd/yy - wahwahwahwah** ;
Art, CEO, AnalystFinder.com
Here is a next way
data have;
ID_NUMBER= 12345678;
NOTE ='**system message** **system update** **mm/dd/yy - blablablablabla**';
output;
ID_NUMBER= 12345679;
NOTE ='**system message** **mm/dd/yy - blablabla ** **mm/dd/yy - wahwahwahwah';
output;
run;
data want;
set have;
ExpressionID = prxparse('/[A-Z0-9._%+\/]+(\s+|\s+\-\s+)[A-Z0-9._%+\/]+/i');
start = 1;
stop = length(NOTE);
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the start parameter so that searching */
/* begins again after the last match. */
call prxnext(ExpressionID, start, stop, NOTE, position, length);
do while (position > 0);
found = substr(NOTE, position, length);
put found= position= length=;
call prxnext(ExpressionID, start, stop, NOTE, position, length);
output;
end;
drop ExpressionID start stop position length;
run;
If we can assume that this version of the HAVE data set exists, here's another way to break out the NOTE values:
data want;
set have;
all_notes = note;
drop all_notes;
do _n_=1 to 99 by 2 until (note=' ');
note = scan(all_notes, _n_, '*');
if note > ' ' then output;
end;
run;
It does run into trouble if one of the notes actually contains an asterisk.
@art297 Sorry, I'm still learning SAS data steps.
This solution appears to do exactly what I need. Instead of using the "cards;" as my input, how would I go about calling my DATA that I have in a SAS Library to this solution? I attempted to do a SET library.table; and replace the relevant field names, but I get an error in regards to INFILE.
You have to declare the location where the SAS dataset resides. You can do that with a libname statement.
Can't answer your other questions until you let us know what that file looks like. i.e., how many variables does the file have and what are the variable names?
Art, CEO, AnalystFinder.com
Sure, it contains a column with a unique Case Number, and each call number has a notes column which contains system notes surrounded by two asterisks on each side. Each call number is unique, and each Note string can contain as many as 8 different notes within it.
There are around 500000 rows, but only about 10,000 new records will need to be processed at a time after the initial run.
So 1 table, two columns (case_number, note).
Case_number is a unique number, and a Note is several hundred characters with two astericks on either side of a string containing a individual note. There could be as many as 8 notes in a row, so I expanded your note1-note3 logic to 8. Here's what I have right now, but it is obviously throwing errors.
data LIBRARY.SG_NOTES_TEST (KEEP=CASE_NO NOTE); SET LIBRARY.SG_NOTES; informat NOTE note1-note8 $225.; array notes(*) $ note1-note8; NOTE=compress(transtrn(NOTE,'**','"')); input CASE_NO (note1-note8) (&); do i=1 to 8; NOTE=dequote(notes(i)); output; end;
Thanks again!
Without seeing an example of the actual file we can only guess at possible solutions.
If your starting file looks like the one below, then the code I suggest below might work:
libname library '/folders/myfolders'; data library.sg_notes; informat notes $100.; input case_no notes &; notes=dequote(notes); cards; 12345678 "**system message** **system update** **mm/dd/yy - blablablablabla**" 12345679 "**system message** **mm/dd/yy - blablabla ** **mm/dd/yy - wahwahwahwah**" ; data LIBRARY.SG_NOTES_TEST (KEEP=CASE_NO NOTE); SET LIBRARY.SG_NOTES; NOTES=transtrn(NOTES,'**','"'); do i=0 to 7; note=scan(notes,(1+(i*2)),'"'); if not missing(note) then output; end; run;
Art, CEO, AnalystFinder.com
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.