BookmarkSubscribeRSS Feed
Cruise
Ammonite | Level 13

Hello SAS experts,

 

I have a huge text file trying to read in with SAS. The first few observations look as below when I open it with excel. The data is separated by }. However, excel truncates to its first million rows which is a fraction of the complete data.  I attached csv file of these first few observations to this post, in case. As you will note, ndc_code variable is there for the majority of the data (99%) but not 100% consistent in all rows.

 

I'll highly appreciate any help. The data is so important to me guys. I'm freaked out to lose or break it before or during my attempts to read it into SAS.

 

Thanks in advance.

 

Copy of first few observations. Attached as csv as well.

 

ID}"trans_id"}"ndc_code"}"start_date"        
Y96033K}200 524 406 3 641}}2005-08-02 00:00:00  
C89950C}200 524 406 7 191}}2005-07-15 00:00:00  
C89950C}200 524 406 7 194}}2005-07-19 00:00:00  
G57388K}200 524 406 7 444}}2005-07-29 00:00:00  
P33846V}200 524 406 11 998}}2005-07-29 00:00:00  
X87617C}200 524 406 24 406}}2005-08-09 00:00:00  
S88141M}200 524 406 38 001}}2005-06-17 00:00:00  
P01353V}200 524 406 41 517}}2005-07-23 00:00:00  
Y58550R}200 524 406 55 262}"00185011701"}2005-08-24 00:00:00
N87928G}200 524 406 63 646}"00186504054"}2005-08-25 00:00:00
E11620G}200 524 406 68 200}"00024542131"}2005-08-25 00:00:00
K96751N}200 524 406 69 379}}2005-07-14 00:00:00  
W65400Y}200 524 406 74 975}"00456321060"}2005-08-25 00:00:00
W65400Y}200 524 406 75 085}"00536375610"}2005-08-25 00:00:00
S47546N}200 524 406 75 269}"00006011731"}2005-08-25 00:00:00
W65400Y}200 524 406 75 358}"00013830304"}2005-08-25 00:00:00
X45852C}200 524 406 78 350}"00088110747"}2005-08-25 00:00:00
W41407V}200 524 406 83 114}"65726023510"}2005-08-25 00:00:00
E39839F}200 524 406 83 373}"00182432906"}2005-08-25 00:00:00
X94973M}200 524 406 84 142}"00023917715"}2005-08-25 00:00:00

 

 

 

 

5 REPLIES 5
pau13rown
Lapis Lazuli | Level 10

firstly, i wouldn't modify the csv. Not sure if you were contemplating that, in any case I would make the changes to the data within sas eg x=index(variable,'}') y=substr(variable,x+1) etc. I don't see where ndc is missing, but it wouldn't matter if it was. Are you having problems reading the file or you're just concerned about the format you received it in ie with "}" etc

Cruise
Ammonite | Level 13

Thanks for asking.

 

It just worked out as below:

 

However, Resulting date 'start_date' looks like: 01JUN05:00:00:00. I don't know yet if it is the best format to read in date with. Please let me know if better suggestion.

 

data drug;
infile 'data.csv' delimiter='}' missover
dsd lrecl=32767 firstobs=2;
informat ID $7.;
informat second_id $20. ;
informat ndc $11.;
informat start_date anydtdtm40. ;
format ID $7. ;
format second_id $20. ;
format ndc $11. ;
format start_date datetime. ;
input 
IF $
second_id $
ndc $
start_date
;
run;
pau13rown
Lapis Lazuli | Level 10

re date format, since you have so many obs you'd want to check whether there are partial dates or spurious dates eg year only. You may want to read it as text and then derive the date within sas. If you have time as part of the start date then you might want to separate out the time component. It appears that you don't have time and thus you may want to use date9. format because the time part is superfluous

Cruise
Ammonite | Level 13

@pau13rown

Good point. proc freq turned 0.01% of years were '3000'. Bad dates. Thanks Paul.

 

data have1(compress=yes); set have;
start_date1 = datepart(start_date);
format start_date1 date9.;
start_year=year(start_date1);
run;
proc freq data=have1(compress=yes);
tables start_year;
run; 
Reeza
Super User

Sometimes that's a problem with the ANYDTDTM informat rather than the actual data, I would confirm those data points are incorrect by verifying against the actual data first.

 


@Cruise wrote:


Good point. proc freq turned 0.01% of years were '3000'. Bad dates. Thanks Paul.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 771 views
  • 5 likes
  • 3 in conversation