BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SGrisham
Fluorite | Level 6

We have a very old system that tracks notes entered by several different systems. These notes are in one way or another surrounded by **'s. All of the notes are in a single column and string.

 

EXAMPLE:

|  ID_NUMBER  |  NOTE 

|   12345678      |  **system message**    **system update**     **mm/dd/yy - blablablablabla**

|   12345679      |  **system message**     **mm/dd/yy - blablabla **   **mm/dd/yy - wahwahwahwah**

 

Each of these NOTES have a UNIQUE 8 digit number.

 

I am trying to find a DATA STEP that will create a new table with EACH of the notes seperated into their own row.

 

something like:

 

|  ID_NUMBER   |   NOTE

|   12345678       |  system message

|   12345678       |  system update

|   12345678       |  mm/dd/yyyy - blablablablabla

|   12345679       |  system message

|   12345679       | mm/dd/yyyy - blablablabla

|   12345679       |  mm/dd/yyyy -  wahwahwahwah

 

 

Thanks,

 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Without seeing an example of the actual file we can only guess at possible solutions.

 

If your starting file looks like the one below, then the code I suggest below might work:

libname library '/folders/myfolders';
data library.sg_notes;
  informat notes $100.;
  input case_no notes &;
  notes=dequote(notes);
  cards;
12345678    "**system message** **system update** **mm/dd/yy - blablablablabla**"
12345679    "**system message** **mm/dd/yy - blablabla ** **mm/dd/yy - wahwahwahwah**"
;

data LIBRARY.SG_NOTES_TEST (KEEP=CASE_NO NOTE);
  SET LIBRARY.SG_NOTES;
  NOTES=transtrn(NOTES,'**','"');
  do i=0 to 7;
    note=scan(notes,(1+(i*2)),'"');
    if not missing(note) then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post

8 REPLIES 8
art297
Opal | Level 21

Here is one way:

data want (keep=id_number note);
  infile cards firstobs=2;
  informat note note1-note3 $40.;
  array notes(*) $ note1-note3;
  input @;
  _infile_=compress(transtrn(_infile_,'**','"'),'|');
  input id_number (note1-note3) (&);
  do i=1 to 3;
    note=dequote(notes(i));
    output;
  end;
  cards;
|  ID_NUMBER  |  NOTE 
|   12345678      |  **system message**    **system update**     **mm/dd/yy - blablablablabla**
|   12345679      |  **system message**     **mm/dd/yy - blablabla **   **mm/dd/yy - wahwahwahwah**
;

Art, CEO, AnalystFinder.com

 

WoolieKyat
SAS Employee

Here is a next way

data have;

ID_NUMBER= 12345678;

NOTE ='**system message** **system update** **mm/dd/yy - blablablablabla**';

output;

ID_NUMBER= 12345679;

NOTE ='**system message** **mm/dd/yy - blablabla ** **mm/dd/yy - wahwahwahwah';

output;

run;

 

data want;

set have;

 

ExpressionID = prxparse('/[A-Z0-9._%+\/]+(\s+|\s+\-\s+)[A-Z0-9._%+\/]+/i');

start = 1;

stop = length(NOTE);

/* Use PRXNEXT to find the first instance of the pattern, */

/* then use DO WHILE to find all further instances. */

/* PRXNEXT changes the start parameter so that searching */

/* begins again after the last match. */

call prxnext(ExpressionID, start, stop, NOTE, position, length);

do while (position > 0);

found = substr(NOTE, position, length);

put found= position= length=;

call prxnext(ExpressionID, start, stop, NOTE, position, length);

output;

end;

drop ExpressionID start stop position length;

run;

Astounding
PROC Star

If we can assume that this version of the HAVE data set exists, here's another way to break out the NOTE values:

 

data want;

set have;

all_notes = note;

drop all_notes;

do _n_=1 to 99 by 2 until (note=' ');

   note = scan(all_notes, _n_, '*');

   if note > ' ' then output;

end;

run;

 

It does run into trouble if one of the notes actually contains an asterisk.

SGrisham
Fluorite | Level 6

@art297 Sorry, I'm still learning SAS data steps.

This solution appears to do exactly what I need. Instead of using the "cards;" as my input, how would I go about calling my DATA that I have in a SAS Library to this solution? I attempted to do a SET library.table; and replace the relevant field names, but I get an error in regards to INFILE.

art297
Opal | Level 21

You have to declare the location where the SAS dataset resides. You can do that with a libname statement.

 

Can't answer your other questions until you let us know what that file looks like. i.e., how many variables does the file have and what are the variable names?

 

Art, CEO, AnalystFinder.com

SGrisham
Fluorite | Level 6

Sure, it contains a column with a unique Case Number, and each call number has a notes column which contains system notes surrounded by two asterisks on each side. Each call number is unique, and each Note string can contain as many as 8 different notes within it.

 

There are around 500000 rows, but only about 10,000 new records will need to be processed at a time after the initial run.

 

So 1 table, two columns (case_number, note).

Case_number is a unique number, and a Note is several hundred characters with two astericks on either side of a string containing a individual note. There could be as many as 8 notes in a row, so I expanded your note1-note3 logic to 8. Here's what I have right now, but it is obviously throwing errors.

 

data LIBRARY.SG_NOTES_TEST (KEEP=CASE_NO NOTE);
SET LIBRARY.SG_NOTES;
  informat NOTE note1-note8 $225.;
  array notes(*) $ note1-note8;
  NOTE=compress(transtrn(NOTE,'**','"'));
  input CASE_NO (note1-note8) (&);
  do i=1 to 8;
    NOTE=dequote(notes(i));
    output;
  end;

Thanks again!

 

art297
Opal | Level 21

Without seeing an example of the actual file we can only guess at possible solutions.

 

If your starting file looks like the one below, then the code I suggest below might work:

libname library '/folders/myfolders';
data library.sg_notes;
  informat notes $100.;
  input case_no notes &;
  notes=dequote(notes);
  cards;
12345678    "**system message** **system update** **mm/dd/yy - blablablablabla**"
12345679    "**system message** **mm/dd/yy - blablabla ** **mm/dd/yy - wahwahwahwah**"
;

data LIBRARY.SG_NOTES_TEST (KEEP=CASE_NO NOTE);
  SET LIBRARY.SG_NOTES;
  NOTES=transtrn(NOTES,'**','"');
  do i=0 to 7;
    note=scan(notes,(1+(i*2)),'"');
    if not missing(note) then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

SGrisham
Fluorite | Level 6
Great, this appears to do exactly what I want, and VERY fast! thank you!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1220 views
  • 8 likes
  • 4 in conversation