Hi,
I have a dataset that I need to transpose by a specific design that involves substring.
The original dataset:
ID | NT1 | NT2 | NT3 | NT4 |
1 | NOTES 12:13:44 03-16-2018 CODE: ABC | AML | NOTES 09:13:11 03-12-2018 CODE: OPI | TEST |
2 | NOTES 04:25:09 01-04-2018 CODE: FDS | IMD | NOTES 03:25:10 01-09-2018 CODE: FGH | TEST |
3 | NOTES 12:22:49 11-12-2018 CODE: DGH | TEST | NOTES 08:02:49 11-11-2018 CODE: LKO | AML |
4 | NOTES 22:02:21 01-14-2018 CODE: MKL | TEST | NOTES 07:02:21 01-10-2018 CODE: LOP | IMD |
5 | NOTES 09:01:36 01-23-2018 CODE: HJK | TEST | NOTES 09:01:56 01-23-2018 CODE: UIY | TEST |
Transpose by ID to split out time , date, code and notes:
ID | time | date | code | notes |
1 | 12:13:44 | 03-16-2018 | ABC | AML |
1 | 09:13:11 | 03-12-2018 | OPI | TEST |
2 | 04:25:09 | 01-04-2018 | FDS | IMD |
2 | 03:25:10 | 01-09-2018 | FGH | TEST |
3 | 12:22:49 | 11-12-2018 | DGH | TEST |
3 | 08:02:49 | 11-11-2018 | LKO | AML |
4 | 22:02:21 | 01-14-2018 | MKL | TEST |
4 | 07:02:21 | 01-10-2018 | LOP | IMD |
5 | 09:01:36 | 01-23-2018 | HJK | TEST |
5 | 09:01:56 | 01-23-2018 | UIY | TEST |
After transposing, ID will have multiple levels based on NT variables.
The code I have so far does not yield the desirable output:
data note11;
length note0 $50;
set note10;
array t{*} nt:;;
do _i = 1 to dim(t);
/*make sure notes are not blank*/
if not missing(t[_i]) then note0 = catx('/',note0,vname(t(_i)));
/*timestamp, date, code*/
if find(upcase(t[_i]),"NOTES") then do;
timestamp=substr(left(nt0), index(left(nt0), 'NOTES')+5, 9);
date=substr(left(nt0), 15, 10);
code = substr(left(nt0), index(left(nt0), 'CODE:')+5);
end;
end;
drop _i;
run;
Thank you for any help!
@lydiawawa wrote:
In reality, there are about 75 nt variables, the 'notes' always have an even suffix , such as nt2, nt4, nt6....I cannot just call them independently
"about 75"? Are you saying that you don't have complete pairs (one with the parseable text, the other with the NOTES result)?
If you do have complete pairs, and there are many of them, then a loop is the way to go.
Edtted addition:
Let's say you have 36 complete pairs, saved in variables NT1-NT72. Then the loop approach (untested) could be:
data want (drop=nt: row);
set have;
length code $4;
array ntvars {36,2} $ nt1-nt72;
do row=1 to 36;
code=scan(ntvars{row,1},-1,' ');
time= input(scan(ntvars{row,1},2,' '),time8.0);
format time time8.0;
date= input(scan(ntvars{row,1},3,' '),mmddyy10.);
format date mmddyy10. ;
length notes $4;
notes=ntvars{row,2};
output;
end;
run;
You can "read" each original observation twice. The first time keeping id nt1 and nt2, the second time keeping id nt3 and nt4.
Rename nt2 and nt4 to NOTES is simple. Rename nt1 and nt3 to note_text, parse that text for TIME, DATE, and CODE.
Untested, in the absence of a working sample data step:
data want (drop=_note_text);
set have (keep=id nt1 nt2 rename=(nt1=_note_text nt2=notes))
have (keep=id nt3 nt4 rename=(nt3=_note_text nt4=notes));
by id;
length code $4;
code=scan(_note_text,-1,' ');
time= input(scan(_note_text,2,' '),time8.0);
format time time8.0;
date= input(scan(_note_text,3,' '),mmddyy10.);
format date mmddyy10. ;
run;
This program assumes
The set statement has two references to HAVE, one for NT1 and NT2, the other for NT3 and NT4. This eliminates the need to create a loop to first process NT1 and NT2, then NT3 and NT4.
If this were my data, assuming this is as read, I would go back to the step where I read the data and modify the input statement and basically read the two records from each line.
In reality, there are about 75 nt variables, the 'notes' always have an even suffix , such as nt2, nt4, nt6....I cannot just call them independently
@lydiawawa wrote:
In reality, there are about 75 nt variables, the 'notes' always have an even suffix , such as nt2, nt4, nt6....I cannot just call them independently
"about 75"? Are you saying that you don't have complete pairs (one with the parseable text, the other with the NOTES result)?
If you do have complete pairs, and there are many of them, then a loop is the way to go.
Edtted addition:
Let's say you have 36 complete pairs, saved in variables NT1-NT72. Then the loop approach (untested) could be:
data want (drop=nt: row);
set have;
length code $4;
array ntvars {36,2} $ nt1-nt72;
do row=1 to 36;
code=scan(ntvars{row,1},-1,' ');
time= input(scan(ntvars{row,1},2,' '),time8.0);
format time time8.0;
date= input(scan(ntvars{row,1},3,' '),mmddyy10.);
format date mmddyy10. ;
length notes $4;
notes=ntvars{row,2};
output;
end;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.