Hi SAS Friends,
Am working with a dataset where each field in an "Event Dates" column contains multiple dates in a string with no delimiter. Each date connects to an "EVENT" in an adjacent column and field. Like the dates, there are multiple events in each field and no obvious delimiter.
A sample SAS data file is attached.
The dates and events are in the correct order and sequence within their fields and when looking at them on a screen in EXCEL (the source file) they line up perfectly. I don't have the data in any other format, unfortunately.
In SAS, the dates become a single string, all with the same format (YYYY/MM/DD) .
Need to create one row per Drug_Name, Date, Event combination, that looks like this:
Drug_1 2014/02/20 Withdrawn from Market
Drug_1 2004/12/13 License Discontinued
Drug_1 2003/07/31 First Launch
Drug_1 2003/06/30 First Approval
Drug_1 2003/02/24 Orphan Drug Status Granted
Drug_1 2001/11/29 New Disease
Drug_1 2001/10/29 New Licensee
Drug_1 2000/10/01 New Licensing Opportunity
Drug_1 1999/06/30 First Filing
Drug_1 1999/01/15 New Licensee
Drug_1 1998/09/15 Nonproprietary Name Granted
Drug_1 1998/02/15 New Licensing Opportunity
Drug_1 1996/11/15 Global Status Advance
Drug_1 1995/07/15 Drug Added
Any and all suggestions are greatly appreciated.
Am using SAS ver. 9.4.
Thank you !
R.
I downloaded your SAS dataset and looked at it with SAS Universal Viewer. Both of your variables, EVENT_DATE and EVENT_TYPE, actually do have delimiters, but they aren't displayed when seen in the viewer. The delimiters are, in hexadecimal, 0D0A (control characters for carriage return/line feed).
You probably have a single Excel cell with multiple lines in it (one line per date). That is a self-defeating way of storing data. You can, however, use the 0D0A as boundaries:
data want (drop=_: event_date event_type);
set pp_obs;
length _string1 _string2 $500 ;
_string1=event_date;
_string2=left(event_type);
do until (_string1=' ');
date=input(substr(_string1,1,10),yymmdd10.);
format date yymmdd10.;
_string1=substr(_string1,13); * Shift left by 12 characters *;
length event $40;
event=scan(_string2,1,'0D'x); * Take everything preceding '0D0A'x *;
_string2=left(substr(_string2,indexc(_string2,'0A'x)+1)); *Shift left past first '0D0A'x *;
output;
end;
run;
This program assumes EVENT_DATE and EVENT_TYPE have the same number of 0D0A control character pairs.
Edited addition: The SAS universal viewer was used just to confirm the download was an actual SAS data set. It did not display the 0D0A. I ran a DATA step to show the underlying hex codes.
I downloaded your SAS dataset and looked at it with SAS Universal Viewer. Both of your variables, EVENT_DATE and EVENT_TYPE, actually do have delimiters, but they aren't displayed when seen in the viewer. The delimiters are, in hexadecimal, 0D0A (control characters for carriage return/line feed).
You probably have a single Excel cell with multiple lines in it (one line per date). That is a self-defeating way of storing data. You can, however, use the 0D0A as boundaries:
data want (drop=_: event_date event_type);
set pp_obs;
length _string1 _string2 $500 ;
_string1=event_date;
_string2=left(event_type);
do until (_string1=' ');
date=input(substr(_string1,1,10),yymmdd10.);
format date yymmdd10.;
_string1=substr(_string1,13); * Shift left by 12 characters *;
length event $40;
event=scan(_string2,1,'0D'x); * Take everything preceding '0D0A'x *;
_string2=left(substr(_string2,indexc(_string2,'0A'x)+1)); *Shift left past first '0D0A'x *;
output;
end;
run;
This program assumes EVENT_DATE and EVENT_TYPE have the same number of 0D0A control character pairs.
Edited addition: The SAS universal viewer was used just to confirm the download was an actual SAS data set. It did not display the 0D0A. I ran a DATA step to show the underlying hex codes.
This is excellent and works like a charm, thank you.
I did not know that SAS Universal Viewer can help identify ODOA characters.
I've learned a great deal from your approach,
Thanks again,
R.
@rmacarthur wrote:
This is excellent and works like a charm, thank you.
I did not know that SAS Universal Viewer can help identify ODOA characters.
I've learned a great deal from your approach,
Thanks again,
Sorry to mislead, but as far as I know SAS Universal Viewer does NOT help identify 0D0A characters. I just used it first to make sure the file was an actual SAS data set (I don't usually download SAS datasets from SAS Communities).
It looked real, so then I ran a SAS DATA step, and displayed each character in event_type and event_date in ascii characters, and and the corresponding hex code. That's where I saw the otherwise undisplayed carriage return/line feed characters.
Thanks very much for clarifying, this is very helpful too, and am looking up how to identify ASCI characters in a data step, now. They can certainly reek havoc!
Much Appreciated,
R.
One part of getting to Know Your Data (Maxim 3) is using the $HEX format for character variables to make everything contained in a string visible. It lets you see control characters like CR and LF, and you can identify single-byte characters vs. UTF.
Hi Kurt,
Thanks, I found an earlier comment from you where you recommend, "Use the RANK Function to get the ASCII code, and the BYTE Function to get the character from the code.". I'll add "learn how to use $HEX format", and will look into all three. BC the datasets am using at the moment are full of 'hidden characters' and they have thrown monkey wrenches into past analyses, resulting in much wasted time.
BTW am a big fan of the 'Maxims' !
Thanks !
R.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.