BookmarkSubscribeRSS Feed
LacksCreativity
Calcite | Level 5

Hi, so I'm having a bit of trouble with importing this dataset because it has to come from a weblink for reproducibility. I am working in Jupyter Notebook through SAS University Edition. I have to take the following three datasets from a larger one on natality and merge them into one dataset containing the variables I select with keep. I'm a little unsure why it's not working.

 

options nosource nonotes errors=0;
filename nat1 url "https://uofi.box.com/shared/static/gb07nz7jafspk65wzxq6awlu02kypiq7.csv" termstr=crlf;
filename nat10 url "https://uofi.box.com/shared/static/xf7r94alpzaqv4wxh5liqw7rpk21wb6l.csv" termstr=crlf;
filename nat20 url "https://uofi.box.com/shared/static/eq9k4d73ansal5p8mspu3mpslmzs7x8g.csv" termstr=crlf;
proc import datafile=nat1 out=var1(keep = M_Ht_In BMI PWgt_R DWgt_R RF_PDIAB RF_GDIAB RF_PHYPE RF_GHYPE RF_EHYPE RF_PPTERM RF_CESAR ME_PRES ATTEND PAY_REC APGAR5 APGAR10 AB_NICU BFED priorlive priordead mrace6 frace6 dplural OEGest_R3 SEX MAGER **bleep**ECOMB MEDUC FEDUC PREVIS_REC previs CIG_REC WTGAIN DMETH_REC dbwt) replace dbms=csv; 
run;
proc import datafile=nat10 out=var2(keep = M_Ht_In BMI PWgt_R DWgt_R RF_PDIAB RF_GDIAB RF_PHYPE RF_GHYPE RF_EHYPE RF_PPTERM RF_CESAR ME_PRES ATTEND PAY_REC APGAR5 APGAR10 AB_NICU BFED priorlive priordead mrace6 frace6 dplural OEGest_R3 SEX MAGER **bleep**ECOMB MEDUC FEDUC PREVIS_REC previs CIG_REC WTGAIN DMETH_REC dbwt) replace dbms=csv; 
run;
proc import datafile=nat20 out=var3(keep = M_Ht_In BMI PWgt_R DWgt_R RF_PDIAB RF_GDIAB RF_PHYPE RF_GHYPE RF_EHYPE RF_PPTERM RF_CESAR ME_PRES ATTEND PAY_REC APGAR5 APGAR10 AB_NICU BFED priorlive priordead mrace6 frace6 dplural OEGest_R3 SEX MAGER **bleep**ECOMB MEDUC FEDUC PREVIS_REC previs CIG_REC WTGAIN DMETH_REC dbwt) replace dbms=csv; 
run;
data nat;
    set var1 var2 var3;
run;

I run into the following error, which isn't very helpful to me because I know these datasets do exist and are accessible, so I'm not really sure how to approach this error.

 

ERROR: Physical file does not exist, https://uofi.box.com/shared/static/gb07nz7jafspk65wzxq6awlu02kypiq7.csv.
ERROR: Import unsuccessful. See SAS Log for details.
ERROR: Physical file does not exist, https://uofi.box.com/shared/static/xf7r94alpzaqv4wxh5liqw7rpk21wb6l.csv.
ERROR: Import unsuccessful. See SAS Log for details.
ERROR: Physical file does not exist, https://uofi.box.com/shared/static/eq9k4d73ansal5p8mspu3mpslmzs7x8g.csv.
ERROR: Import unsuccessful. See SAS Log for details.
ERROR: File WORK.NAT01.DATA does not exist.
ERROR: File WORK.NAT02.DATA does not exist.
ERROR: File WORK.NAT03.DATA does not exist.

2 REPLIES 2
Patrick
Opal | Level 21

If I execute the code you've posted then first Error message I get is:

NOTE: The SAS System stopped processing this step because of errors.
30         (keep = M_Ht_In BMI PWgt_R DWgt_R RF_PDIAB RF_GDIAB RF_PHYPE RF_GHYPE RF_EHYPE RF_PPTERM RF_CESAR
31         ME_PRES ATTEND PAY_REC APGAR5 APGAR10 AB_NICU BFED priorlive priordead mrace6 frace6 dplural OEGest_R3
32         SEX MAGER **bleep**ECOMB MEDUC FEDUC PREVIS_REC previs CIG_REC WTGAIN DMETH_REC dbwt)
                     __     __
                     214    214
                     23     23
ERROR 214-322: Variable name ** is not valid.

ERROR 23-7: Invalid value for the KEEP option.

Some of the variable names in your keep list don't exist as column headers in the Excel file plus even if they would the names are not valid (...and if you ever get such invalid names you need to quote them as literals in the form 'invalid name'n ).

 

I'd also remove the options in your code as the log messages you suppress here are actually useful.

 

And last but not least:

Proc Import will create SAS variables based on analysing the source data. Even if the column names in the 3 source .csv are the same the 3 Proc Imports will likely create the variables with different attributes - like different lengths - based on the actual data in the .csv.

When you now use a data step to combine the tree tables then the variable definition from the first data set in the set statement will be used. If the length of the variable is shorter than required for the same named variable in the 2nd or 3rd input data set then there will be string truncation.

The safer - and better way - is to not use Proc Import but a data step where you explicitly define the variables.

To make things easier for you: Use Proc Import (code like below). Then go to the SAS log and copy the generated data step, amend as required and then use this data step instead of Proc Import for your final code. IF the structure of the 3 .csv's are the same then you can use the same data step code for all of your csv's.

Kurt_Bremser
Super User

Do NOT use those options while developing, and later ONLY if you have a specific need to hide information from the log. The log is your #1 tool in debugging/maintaining code, and reducing its value for this is foolish at best.

If you already know what columns are in the file (as your keep list implies), do not use proc import, write your data step according to the file documentation.

Your code was scrambled somewhere by political correctness software that found an "offensive" string in there and replaced it with "**bleep**", causing WARNINGs.

And, while testing, I found that none of the data files has a column with a name ending on ECOMB, so that particular part of your code wouldn't work anyway.

 

PS apart from that, your code worked on my University Edition on a Mac from home. I suspect that some kind of firewall prevented your SAS session from accessing the website.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 383 views
  • 1 like
  • 3 in conversation