BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
lydiawawa
Lapis Lazuli | Level 10

Hi,

I have a dataset that I need to transpose by a specific design that involves substring.

The original dataset:

IDNT1NT2NT3NT4
1NOTES 12:13:44 03-16-2018  CODE: ABCAMLNOTES 09:13:11 03-12-2018  CODE: OPITEST
2NOTES 04:25:09 01-04-2018 CODE: FDSIMDNOTES 03:25:10  01-09-2018 CODE: FGHTEST
3NOTES 12:22:49 11-12-2018 CODE: DGHTESTNOTES 08:02:49 11-11-2018 CODE: LKOAML
4NOTES 22:02:21 01-14-2018 CODE: MKLTESTNOTES 07:02:21 01-10-2018 CODE: LOPIMD
5NOTES 09:01:36 01-23-2018 CODE:  HJKTESTNOTES 09:01:56 01-23-2018 CODE: UIYTEST

 

Transpose by ID to split out time , date, code and notes:

IDtimedatecodenotes
1 12:13:44  03-16-2018ABCAML
1 09:13:11  03-12-2018OPITEST
2 04:25:09  01-04-2018FDSIMD
2 03:25:10  01-09-2018FGHTEST
3 12:22:49  11-12-2018DGHTEST
3 08:02:49  11-11-2018LKOAML
4 22:02:21  01-14-2018MKLTEST
4 07:02:21  01-10-2018LOPIMD
5 09:01:36  01-23-2018HJKTEST
5 09:01:56  01-23-2018UIYTEST

 

After transposing, ID will have multiple levels based on NT variables.

The code I have so far does not yield the desirable output:

data note11;
 length note0 $50;
 set note10;

  array t{*} nt:;;
    do _i = 1 to dim(t);
/*make sure notes are not blank*/
      if not missing(t[_i]) then note0 = catx('/',note0,vname(t(_i)));
/*timestamp, date, code*/
      if find(upcase(t[_i]),"NOTES") then do;
	 timestamp=substr(left(nt0), index(left(nt0), 'NOTES')+5, 9);                                                                                                              
         date=substr(left(nt0), 15, 10);  
         code = substr(left(nt0), index(left(nt0), 'CODE:')+5); 	
	  end;

    end;
   drop _i;
run;

Thank you for any help!

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

@lydiawawa wrote:

In reality, there are about 75 nt variables, the 'notes' always have an even suffix , such as nt2, nt4, nt6....I cannot just call them independently 


"about 75"?   Are you saying that you don't have complete pairs (one with the parseable text, the other with the NOTES result)?

 

If you do have complete pairs, and there are many of them, then a loop is the way to go.

 

 

Edtted addition:

 

Let's say you have 36 complete pairs, saved in variables NT1-NT72.  Then the loop approach (untested) could be:

 

data want (drop=nt: row);
  set have;
  length code $4;
  array ntvars {36,2} $ nt1-nt72;

  do row=1 to 36;
    code=scan(ntvars{row,1},-1,' ');
    time= input(scan(ntvars{row,1},2,' '),time8.0);
    format time time8.0;
    date= input(scan(ntvars{row,1},3,' '),mmddyy10.);
    format date mmddyy10. ;

    length notes $4;
    notes=ntvars{row,2};
    output;
  end;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

You can "read" each original observation twice.  The first time keeping id nt1 and nt2, the second time keeping id nt3 and nt4.

Rename nt2 and nt4 to NOTES is simple.  Rename nt1 and nt3 to note_text, parse that text for TIME, DATE, and CODE.

 

Untested, in the absence of a working sample data step:

 

data want (drop=_note_text);
  set have (keep=id nt1 nt2  rename=(nt1=_note_text nt2=notes))
      have (keep=id nt3 nt4  rename=(nt3=_note_text nt4=notes));
  by id;
  length code $4;
  code=scan(_note_text,-1,' ');

  time= input(scan(_note_text,2,' '),time8.0);
  format time time8.0;

  date= input(scan(_note_text,3,' '),mmddyy10.);
  format date mmddyy10. ;
run;

This program assumes

  1. time is always the second "word" in _NOTE_TEXT, 
  2. date is always the third
  3. code is always the last

 

The set statement has two references to HAVE, one for NT1 and NT2, the other for NT3 and NT4.  This eliminates the need to create a loop to first process NT1 and NT2, then NT3 and NT4.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

If this were my data, assuming this is as read, I would go back to the step where I read the data and modify the input statement and basically read the two records from each line.

lydiawawa
Lapis Lazuli | Level 10

In reality, there are about 75 nt variables, the 'notes' always have an even suffix , such as nt2, nt4, nt6....I cannot just call them independently 

mkeintz
PROC Star

@lydiawawa wrote:

In reality, there are about 75 nt variables, the 'notes' always have an even suffix , such as nt2, nt4, nt6....I cannot just call them independently 


"about 75"?   Are you saying that you don't have complete pairs (one with the parseable text, the other with the NOTES result)?

 

If you do have complete pairs, and there are many of them, then a loop is the way to go.

 

 

Edtted addition:

 

Let's say you have 36 complete pairs, saved in variables NT1-NT72.  Then the loop approach (untested) could be:

 

data want (drop=nt: row);
  set have;
  length code $4;
  array ntvars {36,2} $ nt1-nt72;

  do row=1 to 36;
    code=scan(ntvars{row,1},-1,' ');
    time= input(scan(ntvars{row,1},2,' '),time8.0);
    format time time8.0;
    date= input(scan(ntvars{row,1},3,' '),mmddyy10.);
    format date mmddyy10. ;

    length notes $4;
    notes=ntvars{row,2};
    output;
  end;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1128 views
  • 1 like
  • 3 in conversation