BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RobCo
Fluorite | Level 6

Hello, 

I have difficulty to import the ASCII file "DRUG19Q4" (October - December 2019) from FAERS database which can be found in the link below: 

https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html  

 

I used the code that was provided for "DEMO19Q1" ASCII file but i got some error. I would appreciate if you can help me to correct it! 

 

I used this code:

data drug19q4;
infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt' dlm='$' dsd truncover firstobs=2;
input
  primaryid :$10.
  caseid :$8.
  drug_seq
  role_cod :$1.
  drugname :$500.
  prod_ai :$800.
  val_vbm :$1. 
  route : $8.
  dose_vbm :$3.
  cum_dose_chr :$40.
  cum_dose_unit :$30.
  dechal :$20.
  rechal :$5.
  lot_num :$4.
  exp_dt :$3.
  nda_num :$1.
  dose_amt :$3.
  dose_unit :$1.
  dose_form :$6.
  dose_freq :$5.
;
run;

 

and the error that i got is: 

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         data drug19q4;
 74         infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt' dlm='$' dsd truncover firstobs=2;
 75         input
 76           primaryid :$10.
 77           caseid :$8.
 78           drug_seq
 79           role_cod :$1.
 80           drugname :$500.
 81           prod_ai :$800.
 82           val_vbm :$1.
 83           route : $8.
 84           dose_vbm :$3.
 85           cum_dose_chr :$40.
 86           cum_dose_unit :$30.
 87           dechal :$20.
 88           rechal :$5.
 89           lot_num :$4.
 90           exp_dt :$3.
 91           nda_num :$1.
 92           dose_amt :$3.
 93           dose_unit :$1.
 94           dose_form :$6.
 95           dose_freq :$5.
 96         ;
 97         run;
 
 NOTE: The infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt' is:
       Filename=/folders/myfolders/sasuser.v94/DRUG19Q4.txt,
       Owner Name=root,Group Name=vboxsf,
       Access Permission=-rwxrwx---,
       Last Modified=28Feb2020:17:42:18,
       File Size (bytes)=162044576
 
 ERROR: Insufficient space in file WORK.DRUG19Q4.DATA.
 ERROR: File WORK.DRUG19Q4.DATA is damaged. I/O processing did not complete.
 NOTE: 1452912 records were read from the infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt'.
       The minimum record length was 40.
       The maximum record length was 411.
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.DRUG19Q4 may be incomplete.  When this step was stopped there were 1452912 observations and 20 variables.
 WARNING: Data set WORK.DRUG19Q4 was not replaced because this step was stopped.
 NOTE: DATA statement used (Total process time):
       real time           11.34 seconds
       cpu time            6.79 seconds
       
 
 98         
 99         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 111        

Thanks a lot for your help!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You MUST (and I mean MUST) read the documentation supplied in ASC_NTS.pdf, as it contains the essentially important information about the columns contained in the data files. Following this document, and my previous experience with the DEMO file, I wrote this code and ran it after downloading the file to my University Edition:

libname myfold '/folders/myfolders';

data myfold.drug19q4 (compress=yes);
infile '/folders/myfolders/DRUG19Q4.txt' dlm='$' dsd truncover firstobs=2;
input
  primaryid :$10.
  caseid :$8.
  drug_seq
  role_cod :$2.
  drugname :$30.
  prod_ai :$50.
  val_vbm :$1.
  route :$10.
  dose_vbm :$30.
  cum_dose_chr
  cum_dose_unit :$7.
  dechal :$1.
  rechal :$1.
  lot_num :$20.
  _exp_dt :$10.
  @
;
if length (_exp_dt) = 4
then do;
  if _exp_dt > "19"
  then exp_dt = mdy(1,1,input(_exp_dt,4.));
  else exp_dt = mdy(input(substr(_exp_dt,1,2),2.),1,2000+input(substr(_exp_dt,3,2),2.));
end;
else if length(_exp_dt) = 6
then exp_dt = input(cats(_exp_dt,'01'),yymmdd8.);
else exp_dt = input(_exp_dt,yymmdd8.);
drop _exp_dt;
format exp_dt e8601da10.;
input
  nda_num
  dose_amt :$30.
  dose_unit :$7.
  dose_form :$30.
  dose_freq :$4.
;
run;

which imported the file successfully, with only 2 NOTEs for really hare-brained date entries that do not match the doc at all.

The resulting dataset file has a size of 215MB, so should be easy to handle in UE. The compress option resulted in a 55 % decrease, so even the uncompressed file would not be a showstopper.

Note that I used a library in the shared folder, so the data is stored outside the VM, which further alleviates any storage problems.

You should now inspect the data to see how the strings are filled; if the maximum length of contents is smaller than the used informat, you can adapt the informat; if strings of a length corresponding to the informat length are found, the informat needs to be increased, until you get a few blanks left with even the largest input string.

View solution in original post

8 REPLIES 8
RobCo
Fluorite | Level 6

I have changed the code to: 

data drug19q4;
infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt' dlm='$' dsd truncover firstobs=2;
input
  primaryid :$10.
  caseid :$8.
  drug_seq :$7.
  role_cod :$1.
  drugname :$500.
  prod_ai :$300.
  val_vbm :$1. 
  route : $8.
  dose_vbm :$3.
  cum_dose_chr :$40.
  cum_dose_unit :$30.
  dechal :$20.
  rechal :$5.
  lot_num :$4.
  exp_dt :$3.
  nda_num :$1.
  dose_amt :$3.
  dose_unit :$8.
  dose_form :$6.
  dose_freq :$5.
;
run;

and it shows me the tables without any errors but there is missing data!

I appreciate if you can help me to fix it

 

 

 

 

 

 

 

ballardw
Super User

@RobCo wrote:

Hello, 

I have difficulty to import the ASCII file "DRUG19Q4" (October - December 2019) from FAERS database which can be found in the link below: 

https://fis.fda.gov/extensions/FPD-QDE-FAERS/FPD-QDE-FAERS.html  

 

I used the code that was provided for "DEMO19Q1" ASCII file but i got some error. I would appreciate if you can help me to correct it! 

 

I used this code:

data drug19q4;
infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt' dlm='$' dsd truncover firstobs=2;
input
  primaryid :$10.
  caseid :$8.
  drug_seq
  role_cod :$1.
  drugname :$500.
  prod_ai :$800.
  val_vbm :$1. 
  route : $8.
  dose_vbm :$3.
  cum_dose_chr :$40.
  cum_dose_unit :$30.
  dechal :$20.
  rechal :$5.
  lot_num :$4.
  exp_dt :$3.
  nda_num :$1.
  dose_amt :$3.
  dose_unit :$1.
  dose_form :$6.
  dose_freq :$5.
;
run;

 

and the error that i got is: 

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         data drug19q4;
 74         infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt' dlm='$' dsd truncover firstobs=2;
 75         input
 76           primaryid :$10.
 77           caseid :$8.
 78           drug_seq
 79           role_cod :$1.
 80           drugname :$500.
 81           prod_ai :$800.
 82           val_vbm :$1.
 83           route : $8.
 84           dose_vbm :$3.
 85           cum_dose_chr :$40.
 86           cum_dose_unit :$30.
 87           dechal :$20.
 88           rechal :$5.
 89           lot_num :$4.
 90           exp_dt :$3.
 91           nda_num :$1.
 92           dose_amt :$3.
 93           dose_unit :$1.
 94           dose_form :$6.
 95           dose_freq :$5.
 96         ;
 97         run;
 
 NOTE: The infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt' is:
       Filename=/folders/myfolders/sasuser.v94/DRUG19Q4.txt,
       Owner Name=root,Group Name=vboxsf,
       Access Permission=-rwxrwx---,
       Last Modified=28Feb2020:17:42:18,
       File Size (bytes)=162044576
 
 ERROR: Insufficient space in file WORK.DRUG19Q4.DATA.
 ERROR: File WORK.DRUG19Q4.DATA is damaged. I/O processing did not complete.
 NOTE: 1452912 records were read from the infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt'.
       The minimum record length was 40.
       The maximum record length was 411.
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.DRUG19Q4 may be incomplete.  When this step was stopped there were 1452912 observations and 20 variables.
 WARNING: Data set WORK.DRUG19Q4 was not replaced because this step was stopped.
 NOTE: DATA statement used (Total process time):
       real time           11.34 seconds
       cpu time            6.79 seconds
       
 
 98         
 99         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 111        

Thanks a lot for your help!

 


 If you are using the SAS University edition it is intended as learning tool and has some limits as to how big a data set you can use, maybe 10 megabytes.

Try adding something like OBS=100 to the INFILE statement to read the data and see if you end up with 99 records in the data.

If that works then keep increasing the OBS value until you get the errors again. That will tell you about how many records of that data you can work with.

 

If you don't need every single variable then it may be possible to modify the input statement to reduce the data set size. I really wonder how many drug names actually use 500 characters, or whatever Prod_ai might be at 800. Especially when the reported longest record is only 411.

 

RobCo
Fluorite | Level 6

Yes, i am using the SAS UE. I am a student and trying to learn while i am doing a research using FDA database (FAERS). I did not know these numbers are for characters. Thanks for clarification. 

I have fixed those. Just one question. So, the way i am giving the code is correct? and i do not need to do any cleaning or add any other code?

Thanks

data drug19q4;
infile '/folders/myfolders/sasuser.v94/DRUG19Q4.txt' dlm='$' dsd truncover firstobs=2;
input
  primaryid :$10.
  caseid :$8.
  drug_seq
  role_cod :$1.
  drugname :$50.
  prod_ai :$80.
  val_vbm :$1. 
  route : $20.
  dose_vbm :$20.
  cum_dose_chr :$10.
  cum_dose_unit :$10.
  dechal :$2.
  rechal :$5.
  lot_num :$8.
  exp_dt :$3.
  nda_num :$8.
  dose_amt :$4.
  dose_unit :$2.
  dose_form :$8.
  dose_freq :$5.
;
run;
Kurt_Bremser
Super User

Why did you define one variable with $500 and another with $800? Were those lengths specified in the documentation?

 

Anyway, your longest input line was just 411 characters long, so it is clear those strings are never as long as defined. When you have character variables that are mostly empty, using the compress option is a MUST:

data drug19q4 (compress=yes);

It will significantly reduce the physical file size of your dataset, at next to no cost in CPU load. Use it anytime you read character values of 10 bytes or longer when you suspect that the length is not always needed. Inspect the log to see what compression rate you achieved.

RobCo
Fluorite | Level 6

Thank you very much for your advice!

I have some questions: 

1) Where can i find out the length of each variable? 

2) How can I eliminate the variables that i do not need? for example, i do not need these variables to be shown in the output data: 

 dechal :$2.
  rechal :$5.
  lot_num :$8.
  exp_dt :$3.

Thanks

Kurt_Bremser
Super User

You MUST (and I mean MUST) read the documentation supplied in ASC_NTS.pdf, as it contains the essentially important information about the columns contained in the data files. Following this document, and my previous experience with the DEMO file, I wrote this code and ran it after downloading the file to my University Edition:

libname myfold '/folders/myfolders';

data myfold.drug19q4 (compress=yes);
infile '/folders/myfolders/DRUG19Q4.txt' dlm='$' dsd truncover firstobs=2;
input
  primaryid :$10.
  caseid :$8.
  drug_seq
  role_cod :$2.
  drugname :$30.
  prod_ai :$50.
  val_vbm :$1.
  route :$10.
  dose_vbm :$30.
  cum_dose_chr
  cum_dose_unit :$7.
  dechal :$1.
  rechal :$1.
  lot_num :$20.
  _exp_dt :$10.
  @
;
if length (_exp_dt) = 4
then do;
  if _exp_dt > "19"
  then exp_dt = mdy(1,1,input(_exp_dt,4.));
  else exp_dt = mdy(input(substr(_exp_dt,1,2),2.),1,2000+input(substr(_exp_dt,3,2),2.));
end;
else if length(_exp_dt) = 6
then exp_dt = input(cats(_exp_dt,'01'),yymmdd8.);
else exp_dt = input(_exp_dt,yymmdd8.);
drop _exp_dt;
format exp_dt e8601da10.;
input
  nda_num
  dose_amt :$30.
  dose_unit :$7.
  dose_form :$30.
  dose_freq :$4.
;
run;

which imported the file successfully, with only 2 NOTEs for really hare-brained date entries that do not match the doc at all.

The resulting dataset file has a size of 215MB, so should be easy to handle in UE. The compress option resulted in a 55 % decrease, so even the uncompressed file would not be a showstopper.

Note that I used a library in the shared folder, so the data is stored outside the VM, which further alleviates any storage problems.

You should now inspect the data to see how the strings are filled; if the maximum length of contents is smaller than the used informat, you can adapt the informat; if strings of a length corresponding to the informat length are found, the informat needs to be increased, until you get a few blanks left with even the largest input string.

RobCo
Fluorite | Level 6

Thanks a lot for your useful information and help!

I really appreciate it!

Tom
Super User Tom
Super User

Don't try to make such a large WORK dataset.  Store the dataset into your real disk instead of into the virtual machine by writing it into a file in your shared folder.

 

So if you made a subdirectory at the same level as your sasuser.v94 directory, call it FAERS for example.  Then you could put the dataset there by making a libref that points to it and using a two level name for that dataset.

libname faers '/folders/myfolders/FAERS/';
data faers.drug19q4;
  infile ...

You can use a DROP (or KEEP) statement to decide to only store some of the variables you have read into the dataset. Since you have a lot of long string it will also help to use the COMPRESS option so that the dataset will take less space on your disk.

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1592 views
  • 4 likes
  • 4 in conversation