BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
rmacarthur
Pyrite | Level 9

Hi SAS Friends, 

Am working with a dataset where each field in an "Event Dates" column contains multiple dates in a string with no delimiter.  Each date connects to an "EVENT" in an adjacent column and field.  Like the dates, there are multiple events in each field and no obvious delimiter. 

A  sample SAS data file is attached. 

The dates and events are in the correct order and sequence within their fields and when looking at them on a screen in EXCEL (the source file) they line up perfectly.  I don't have the data in any other format, unfortunately. 

In SAS, the dates become a single string, all with the same format (YYYY/MM/DD) .  

Need to create one row per Drug_Name, Date, Event combination, that looks like this:

Drug_1 2014/02/20 Withdrawn from Market
Drug_1 2004/12/13 License Discontinued
Drug_1 2003/07/31 First Launch
Drug_1 2003/06/30 First Approval
Drug_1 2003/02/24 Orphan Drug Status Granted
Drug_1 2001/11/29 New Disease
Drug_1 2001/10/29 New Licensee
Drug_1 2000/10/01 New Licensing Opportunity
Drug_1 1999/06/30 First Filing
Drug_1 1999/01/15 New Licensee
Drug_1 1998/09/15 Nonproprietary Name Granted
Drug_1 1998/02/15 New Licensing Opportunity
Drug_1 1996/11/15 Global Status Advance
Drug_1 1995/07/15 Drug Added

 

Any and all suggestions are greatly appreciated.

Am using SAS ver. 9.4.

Thank you !

R. 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I downloaded your SAS dataset and looked at it with SAS Universal Viewer.  Both of your variables, EVENT_DATE and EVENT_TYPE, actually do have delimiters, but they aren't displayed when seen in the viewer. The delimiters are, in hexadecimal, 0D0A (control characters for carriage return/line feed).

 

You probably have a single Excel cell with multiple lines in it (one line per date).  That is a self-defeating way of storing data.  You can, however, use the 0D0A as boundaries:  

 


data want (drop=_: event_date event_type);
  set pp_obs;
  length _string1 _string2 $500 ;
  _string1=event_date;
  _string2=left(event_type);

  do until (_string1=' ');
    date=input(substr(_string1,1,10),yymmdd10.);
    format date yymmdd10.;
    _string1=substr(_string1,13);    * Shift left by 12 characters *;
    length event $40;
    event=scan(_string2,1,'0D'x);    * Take everything preceding '0D0A'x *;
    _string2=left(substr(_string2,indexc(_string2,'0A'x)+1));  *Shift left past first '0D0A'x *;
    output;
  end;

run;

 

This program assumes EVENT_DATE and EVENT_TYPE have the same number of 0D0A control character pairs.

 

 

Edited addition:  The SAS universal viewer was used just to confirm the download was an actual SAS data set. It did not display the 0D0A.  I ran a DATA step to show the underlying hex codes.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

I downloaded your SAS dataset and looked at it with SAS Universal Viewer.  Both of your variables, EVENT_DATE and EVENT_TYPE, actually do have delimiters, but they aren't displayed when seen in the viewer. The delimiters are, in hexadecimal, 0D0A (control characters for carriage return/line feed).

 

You probably have a single Excel cell with multiple lines in it (one line per date).  That is a self-defeating way of storing data.  You can, however, use the 0D0A as boundaries:  

 


data want (drop=_: event_date event_type);
  set pp_obs;
  length _string1 _string2 $500 ;
  _string1=event_date;
  _string2=left(event_type);

  do until (_string1=' ');
    date=input(substr(_string1,1,10),yymmdd10.);
    format date yymmdd10.;
    _string1=substr(_string1,13);    * Shift left by 12 characters *;
    length event $40;
    event=scan(_string2,1,'0D'x);    * Take everything preceding '0D0A'x *;
    _string2=left(substr(_string2,indexc(_string2,'0A'x)+1));  *Shift left past first '0D0A'x *;
    output;
  end;

run;

 

This program assumes EVENT_DATE and EVENT_TYPE have the same number of 0D0A control character pairs.

 

 

Edited addition:  The SAS universal viewer was used just to confirm the download was an actual SAS data set. It did not display the 0D0A.  I ran a DATA step to show the underlying hex codes.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rmacarthur
Pyrite | Level 9

This is excellent and works like a charm, thank you. 

I did not know that SAS Universal Viewer can help identify ODOA characters.

I've learned a great deal from your approach, 

Thanks again, 

R. 

 

mkeintz
PROC Star

@rmacarthur wrote:

This is excellent and works like a charm, thank you. 

I did not know that SAS Universal Viewer can help identify ODOA characters.

I've learned a great deal from your approach, 

Thanks again, 

Sorry to mislead, but as far as I know SAS Universal Viewer does NOT help identify 0D0A characters.  I just used it first to make sure the file was an actual SAS data set (I don't usually download SAS datasets from SAS Communities).

 

It looked real, so then I ran a SAS DATA step, and displayed each character in event_type and event_date in ascii characters, and and the corresponding hex code.  That's where I saw the otherwise undisplayed carriage return/line feed characters.


 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rmacarthur
Pyrite | Level 9

Thanks very much for clarifying, this is very helpful too, and am looking up how to identify ASCI characters in a data step, now.  They can certainly reek havoc!

Much Appreciated, 

R.

Kurt_Bremser
Super User

One part of getting to Know Your Data (Maxim 3) is using the $HEX format for character variables to make everything contained in a string visible. It lets you see control characters like CR and LF, and you can identify single-byte characters vs. UTF.

rmacarthur
Pyrite | Level 9

Hi Kurt, 

Thanks, I found an earlier comment from you where you recommend, "Use the RANK Function to get the ASCII code, and the BYTE Function to get the character from the code.".  I'll add "learn how to use $HEX format", and will look into all three.  BC the datasets am using at the moment are full of 'hidden characters' and they have thrown monkey wrenches into past analyses, resulting in much wasted time. 

BTW am a big fan of the 'Maxims' !

Thanks !

R. 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 550 views
  • 3 likes
  • 3 in conversation