BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
DocMartin
Quartz | Level 8

A friend of mine asked me a question that I could not answer. She's collecting hospital data in a Notepad file (argh!!!!) and wants to read it into a SAS data set.

 

I've attached records from two hospitals (data were made up just to show what's needed): Cons.txt. Below is how her data set of 13 observations should look like.

 

Site       ID   Code                fts   Heal pc
BTP04 1     xys                    H9   322   11
BTP04 2     v3bwwwr15   m2   95      5
BTP04 3     qd23               H4   228    3
BTP04 4    FRt5                m3    142    3
BTP04 5    p002Ft           H4    361    9
BTP04 6    NNNz13          d1      48    1
RRR02 1    qus                    H7  204   11
RRR02 2   w752                m3    95     5
RRR02 3   rsss                   PT  228     3
RRR02 4  4562                 m3   142     3
RRR02 5  4862                 H4   361     9
RRR02 6  bVf6RW          d1    122      1
RRR02 7  YyPpk8            M3  100   12

 

Note that the variable "Site" is captured from the first line of data for each hospital.

 

Thank you very much!

 

Andrew

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So are you saying you want help reading data that looks like the lines in your attached file?

 

BTP04  Patients      Mags 1=23 2=15 3=9 4=6 5=4
___________________________________________________________________________________
ID   Code              fts     |LOS|BH  |pcE |pcEP|pcLP|pcSR|Spd  |X| CONSm|Heal|pc|
  1  xys                H9|    |   |    |    |    |    |   ||     | |      | 322|11|
  2  v3bwwwr15          m2|    |   |    |  3 |  3 |  3 |   ||  3  | |     3|  95| 5|
  3  qd23               H4|  3 | 2 |  3 |    |  1 |  1 | 3 || 21  | | 2 1 3| 228| 3|
  4  FRt5               m3|  2 | 3 |  2 |  1 |    |    | 1 || 23  | | 2 2 1| 142| 3|
  5  p002Ft             H4|    |   |    |    |    |    |   ||     | |      | 361| 9|
  6  NNNz13             d1|  1 | 1 |  1 |  2 |  2 |  2 | 2 || 37  | | 3 1  |  48| 1|

RRR02  Patients      Mags 1=23 2=15 3=9 4=6 5=4
___________________________________________________________________________________
ID   Code              fts     |LOS|BH  |pcE |pcEP|pcLP|pcSR|Spd||X| CONSm|Heal|pc|
  1  qus                H7|    |   |    |    |    |    |   ||     ||      | 204|11|
  2  w752               m3|    |   |    |  3 |  3 |  3 |   ||  3  ||     3|  95| 5|
  3  rsss               PT|  3 | 2 |  3 |    |  1 |  1 | 3 || 21  || 2 1 3| 228| 3|
  4  4562               m3|  2 | 3 |  2 |  1 |    |    | 1 || 23  || 2 2 1| 142| 3|
  5  4862               H4|    |   |    |    |    |    |   ||     ||      | 361| 9|
  6  bVf6RW             d1|  1 | 1 |  1 |  2 |  2 |  2 | 2 || 37  || 3 4  | 122| 1|
  7  YyPpk8             M3|  2 | 3 |  2 |  1 |  1 |  2 | 3 || 922 || 2 2 3| 100|12|

That looks like a pretty standard report file.  I am not sure why there seem to be more columns of data than names in the header row, but hopefully you know what that means.

 

 

 

To figure out what kind of line you are on so you know how to read the line take advantage of the trailing @ on the INPUT statement.

 

To capture the information in the "by" line between the blocks

 

BTP04  Patients      Mags 1=23 2=15 3=9 4=6 5=4

 

take advantage of the RETAIN statement.

 

Something like this should work:

 

data want;
  infile text dlm=' |' truncover;
  length site $8 id 8;
  retain site;
  input id ?? @;
  if _infile_ = ' ' or _infile_ in: ('_' 'ID') then delete;
  if missing(id) then do;
    input @1 site ;
    delete;
  end;
  input code $ fts $;
  heal = input(scan(_infile_,-2,'|'),??32.);
  pc = input(scan(_infile_,-1,'|'),??32.);
run;

Results:

 Obs    site     id    code        fts    heal    pc

   1    BTP04     1    xys         H9      322    11
   2    BTP04     2    v3bwwwr1    m2       95     5
   3    BTP04     3    qd23        H4      228     3
   4    BTP04     4    FRt5        m3      142     3
   5    BTP04     5    p002Ft      H4      361     9
   6    BTP04     6    NNNz13      d1       48     1
   7    RRR02     1    qus         H7      204    11
   8    RRR02     2    w752        m3       95     5
   9    RRR02     3    rsss        PT      228     3
  10    RRR02     4    4562        m3      142     3
  11    RRR02     5    4862        H4      361     9
  12    RRR02     6    bVf6RW      d1      122     1
  13    RRR02     7    YyPpk8      M3      100    12

 

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

So are you saying you want help reading data that looks like the lines in your attached file?

 

BTP04  Patients      Mags 1=23 2=15 3=9 4=6 5=4
___________________________________________________________________________________
ID   Code              fts     |LOS|BH  |pcE |pcEP|pcLP|pcSR|Spd  |X| CONSm|Heal|pc|
  1  xys                H9|    |   |    |    |    |    |   ||     | |      | 322|11|
  2  v3bwwwr15          m2|    |   |    |  3 |  3 |  3 |   ||  3  | |     3|  95| 5|
  3  qd23               H4|  3 | 2 |  3 |    |  1 |  1 | 3 || 21  | | 2 1 3| 228| 3|
  4  FRt5               m3|  2 | 3 |  2 |  1 |    |    | 1 || 23  | | 2 2 1| 142| 3|
  5  p002Ft             H4|    |   |    |    |    |    |   ||     | |      | 361| 9|
  6  NNNz13             d1|  1 | 1 |  1 |  2 |  2 |  2 | 2 || 37  | | 3 1  |  48| 1|

RRR02  Patients      Mags 1=23 2=15 3=9 4=6 5=4
___________________________________________________________________________________
ID   Code              fts     |LOS|BH  |pcE |pcEP|pcLP|pcSR|Spd||X| CONSm|Heal|pc|
  1  qus                H7|    |   |    |    |    |    |   ||     ||      | 204|11|
  2  w752               m3|    |   |    |  3 |  3 |  3 |   ||  3  ||     3|  95| 5|
  3  rsss               PT|  3 | 2 |  3 |    |  1 |  1 | 3 || 21  || 2 1 3| 228| 3|
  4  4562               m3|  2 | 3 |  2 |  1 |    |    | 1 || 23  || 2 2 1| 142| 3|
  5  4862               H4|    |   |    |    |    |    |   ||     ||      | 361| 9|
  6  bVf6RW             d1|  1 | 1 |  1 |  2 |  2 |  2 | 2 || 37  || 3 4  | 122| 1|
  7  YyPpk8             M3|  2 | 3 |  2 |  1 |  1 |  2 | 3 || 922 || 2 2 3| 100|12|

That looks like a pretty standard report file.  I am not sure why there seem to be more columns of data than names in the header row, but hopefully you know what that means.

 

 

 

To figure out what kind of line you are on so you know how to read the line take advantage of the trailing @ on the INPUT statement.

 

To capture the information in the "by" line between the blocks

 

BTP04  Patients      Mags 1=23 2=15 3=9 4=6 5=4

 

take advantage of the RETAIN statement.

 

Something like this should work:

 

data want;
  infile text dlm=' |' truncover;
  length site $8 id 8;
  retain site;
  input id ?? @;
  if _infile_ = ' ' or _infile_ in: ('_' 'ID') then delete;
  if missing(id) then do;
    input @1 site ;
    delete;
  end;
  input code $ fts $;
  heal = input(scan(_infile_,-2,'|'),??32.);
  pc = input(scan(_infile_,-1,'|'),??32.);
run;

Results:

 Obs    site     id    code        fts    heal    pc

   1    BTP04     1    xys         H9      322    11
   2    BTP04     2    v3bwwwr1    m2       95     5
   3    BTP04     3    qd23        H4      228     3
   4    BTP04     4    FRt5        m3      142     3
   5    BTP04     5    p002Ft      H4      361     9
   6    BTP04     6    NNNz13      d1       48     1
   7    RRR02     1    qus         H7      204    11
   8    RRR02     2    w752        m3       95     5
   9    RRR02     3    rsss        PT      228     3
  10    RRR02     4    4562        m3      142     3
  11    RRR02     5    4862        H4      361     9
  12    RRR02     6    bVf6RW      d1      122     1
  13    RRR02     7    YyPpk8      M3      100    12

 

DocMartin
Quartz | Level 8

I spoke with my colleague and she said that your code worked like a charm. Thank you,

Ksharp
Super User

Here is what I got.

 

data want;
infile "C:\Users\xiakeshan\Documents\Downloads\Consx.txt" encoding='utf8' termstr=crlf expandtabs;
input;
length site id code fts heal pc$ 80;
retain site;
if find(_infile_,'patients','it') then site=scan(_infile_,1,' ');
if prxmatch('/^\s*\d+/',_infile_) then do;
  id=scan(_infile_,1,' ');
  code=scan(_infile_,2,' ');
  fts=scan(_infile_,3,' |');
  heal=scan(_infile_,-2,'|');
  pc=scan(_infile_,-1,'|');
  output;
end;
run;
LinusH
Tourmaline | Level 20
If this is collected by one person, maybe it's easier to direct her to store the data in a more readable format.
If this data is entered manually it's error prone.
Data never sleeps
data_null__
Jade | Level 19

I know I'm late to the party, but I always like show example of NAMED-INPUT and PARMCARDS

 

options validvarname=any;
filename FT15F001 temp;
data test;
   retain;
   length dlm $1;
   infile FT15F001 dlm=dlm stopover;
   input @;
   if _infile_ eq ' ' or _infile_ eq: '_' then delete;
   if find(_infile_,'Patients') then do;
      dlm = ' ';
      input site:$5. @'Mags' ('1'n-'5'n)(=);
      delete;
      end;
   else if _infile_ eq: ' ' then do;
      _infile_ = tranwrd(_infile_,'||','| |');
      dlm=' ';
      input id code:$12. @;
      dlm='|';
      input fts:$2. (dum1 LOS BH  pcE pcEP pcLP pcSR dum2 spd X)(:) CONSm:$8. (Heal pc)(:);
      end;
   else delete;
   rename '1'n-'5'n=mag1-mag5;
   parmcards4;
BTP04  Patients      Mags 1=23 2=15 3=9 4=6 5=4
___________________________________________________________________________________
ID   Code              fts     |LOS|BH  |pcE |pcEP|pcLP|pcSR|Spd  |X| CONSm|Heal|pc|
  1  xys                H9|    |   |    |    |    |    |   ||     | |      | 322|11|
  2  v3bwwwr15          m2|    |   |    |  3 |  3 |  3 |   ||  3  | |     3|  95| 5|
  3  qd23               H4|  3 | 2 |  3 |    |  1 |  1 | 3 || 21  | | 2 1 3| 228| 3|
  4  FRt5               m3|  2 | 3 |  2 |  1 |    |    | 1 || 23  | | 2 2 1| 142| 3|
  5  p002Ft             H4|    |   |    |    |    |    |   ||     | |      | 361| 9|
  6  NNNz13             d1|  1 | 1 |  1 |  2 |  2 |  2 | 2 || 37  | | 3 1  |  48| 1|

RRR02  Patients      Mags 1=23 2=15 3=9 4=6 5=4
___________________________________________________________________________________
ID   Code              fts     |LOS|BH  |pcE |pcEP|pcLP|pcSR|Spd||X| CONSm|Heal|pc|
  1  qus                H7|    |   |    |    |    |    |   ||     ||      | 204|11|
  2  w752               m3|    |   |    |  3 |  3 |  3 |   ||  3  ||     3|  95| 5|
  3  rsss               PT|  3 | 2 |  3 |    |  1 |  1 | 3 || 21  || 2 1 3| 228| 3|
  4  4562               m3|  2 | 3 |  2 |  1 |    |    | 1 || 23  || 2 2 1| 142| 3|
  5  4862               H4|    |   |    |    |    |    |   ||     ||      | 361| 9|
  6  bVf6RW             d1|  1 | 1 |  1 |  2 |  2 |  2 | 2 || 37  || 3 4  | 122| 1|
  7  YyPpk8             M3|  2 | 3 |  2 |  1 |  1 |  2 | 3 || 922 || 2 2 3| 100|12|
;;;;
   run;

proc contents varnum;
proc print;
   run;
/*
Obs    site     mag1    mag2    mag3    mag4    mag5    id    code         fts    dum1    LOS    BH    pcE    pcEP    pcLP    pcSR    dum2    spd    X    CONSm    Heal    pc

  1    BTP04     23      15       9       6       4      1    xys          H9       .      .      .     .       .       .       .       .       .    .              322    11
  2    BTP04     23      15       9       6       4      2    v3bwwwr15    m2       .      .      .     3       3       3       .       .       3    .    3          95     5
  3    BTP04     23      15       9       6       4      3    qd23         H4       3      2      3     .       1       1       3       .      21    .    2 1 3     228     3
  4    BTP04     23      15       9       6       4      4    FRt5         m3       2      3      2     1       .       .       1       .      23    .    2 2 1     142     3
  5    BTP04     23      15       9       6       4      5    p002Ft       H4       .      .      .     .       .       .       .       .       .    .              361     9
  6    BTP04     23      15       9       6       4      6    NNNz13       d1       1      1      1     2       2       2       2       .      37    .    3 1        48     1
  7    RRR02     23      15       9       6       4      1    qus          H7       .      .      .     .       .       .       .       .       .    .              204    11
  8    RRR02     23      15       9       6       4      2    w752         m3       .      .      .     3       3       3       .       .       3    .    3          95     5
  9    RRR02     23      15       9       6       4      3    rsss         PT       3      2      3     .       1       1       3       .      21    .    2 1 3     228     3
 10    RRR02     23      15       9       6       4      4    4562         m3       2      3      2     1       .       .       1       .      23    .    2 2 1     142     3
 11    RRR02     23      15       9       6       4      5    4862         H4       .      .      .     .       .       .       .       .       .    .              361     9
 12    RRR02     23      15       9       6       4      6    bVf6RW       d1       1      1      1     2       2       2       2       .      37    .    3 4       122     1
 13    RRR02     23      15       9       6       4      7    YyPpk8       M3       2      3      2     1       1       2       3       .     922    .    2 2 3     100    12
*/

Capture.PNG

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 827 views
  • 3 likes
  • 5 in conversation