Hi fellow SAS EG users!
I've been researching this problem for a while, and haven't found a solution yet. I'm hoping you can help.
I work with data from external agencies quite often, and each time they send us a csv file, it's in the exact same format. Same columns, same order, same field names.
I've been using the Import Wizard in SAS EG to import these files, but as there are 33 columns and I have to manually specify the datatype of each field, this is tedious and time consuming. Here's the desired format of each column:
Name | Type | Length | Format | Informat | Label |
Your_Unique_Identifier | Character | 1 | $CHAR1. | $CHAR1. | Your Unique Identifier |
First_Name | Character | 16 | $CHAR16. | $CHAR16. | First Name |
Middle_Initial | Character | 1 | $CHAR1. | $CHAR1. | Middle Initial |
Last_Name | Character | 20 | $CHAR20. | $CHAR20. | Last Name |
Name_Suffix | Character | 1 | $CHAR1. | $CHAR1. | Name Suffix |
Requester_Return_Field | Character | 20 | $CHAR20. | $CHAR20. | Requester Return Field |
Record_Found_Y_N | Character | 1 | $CHAR1. | $CHAR1. | Record Found Y/N |
Search_Date | Date | 8 | YYMMDD10. | YYMMDD10. | Search Date |
College_Code_Branch | Character | 9 | $CHAR9. | $CHAR9. | College Code/Branch |
College_Name | Character | 50 | $CHAR50. | $CHAR50. | College Name |
College_State | Character | 2 | $CHAR2. | $CHAR2. | College State |
_2_year___4_year | Character | 1 | $CHAR1. | $CHAR1. | 2-year / 4-year |
Public___Private | Character | 7 | $CHAR7. | $CHAR7. | Public / Private |
Enrollment_Begin | Date | 8 | YYMMDD10. | YYMMDD10. | Enrollment Begin |
Enrollment_End | Date | 8 | YYMMDD10. | YYMMDD10. | Enrollment End |
Enrollment_Status | Character | 1 | $CHAR1. | $CHAR1. | Enrollment Status |
Class_Level | Character | 1 | $CHAR1. | $CHAR1. | Class Level |
Enrollment_Major_1 | Character | 78 | $CHAR78. | $CHAR78. | Enrollment Major 1 |
Enrollment_CIP_1 | Numeric | 8 | BEST6. | BEST6. | Enrollment CIP 1 |
Enrollment_Major_2 | Character | 60 | $CHAR60. | $CHAR60. | Enrollment Major 2 |
Enrollment_CIP_2 | Numeric | 8 | BEST6. | BEST6. | Enrollment CIP 2 |
Graduated_ | Character | 1 | $CHAR1. | $CHAR1. | Graduated? |
Graduation_Date | Date | 8 | YYMMDD10. | YYMMDD10. | Graduation Date |
Degree_Title | Character | 69 | $CHAR69. | $CHAR69. | Degree Title |
Degree_Major_1 | Character | 80 | $CHAR80. | $CHAR80. | Degree Major 1 |
Degree_CIP_1 | Numeric | 8 | BEST6. | BEST6. | Degree CIP 1 |
Degree_Major_2 | Character | 49 | $CHAR49. | $CHAR49. | Degree Major 2 |
Degree_CIP_2 | Numeric | 8 | BEST6. | BEST6. | Degree CIP 2 |
Degree_Major_3 | Character | 34 | $CHAR34. | $CHAR34. | Degree Major 3 |
Degree_CIP_3 | Numeric | 8 | BEST6. | BEST6. | Degree CIP 3 |
Degree_Major_4 | Character | 20 | $CHAR20. | $CHAR20. | Degree Major 4 |
Degree_CIP_4 | Numeric | 8 | BEST6. | BEST6. | Degree CIP 4 |
College_Sequence | Numeric | 8 | BEST2. | BEST2. | College Sequence |
Is there a way to create a template for importing these kinds of files so I don't have to manually import them each time? I've tried using PROC IMPORT:
PROC IMPORT DATAFILE= "C:\MyUsername\Desktop\filename.csv" OUT= outdata DBMS=csv REPLACE; GETNAMES=YES; RUN;
But got the following error message:
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to WORK.PARMS.PARMS.SLIST.
ERROR: Physical file does not exist, C:\MyUsername\Desktop\filename.csv.
I found this post explaining how to fix this error, but it seems like my SASUSER Directory is locked to me.
I've also tried using INFILE based on this page I found on the SAS Help Center, but got the same error as before ("Physical file does not exist"):
/*infile*/ data output; length file2read $60; /**/ input file2read $; /**/ infile datalines; /**/ infile dummy filevar = file2read end = done; /**/ do while(not done); /**/ input "Your Unique Identifier"n $ "First Name"n; output; /**/ end; datalines; C:\MyUsername\Desktop\filename.csv ; proc print data=output; run;
Let me know of something else I can try. Anything helps!
Use excel to build the data import steps.
Copy/paste into Excel.
Build formulas using:
`=CONCATENATE("Informat ", A2, " ", E2, ";")
Repeat for format/label/length.
data importFile;
infile ......;
Informat Your_Unique_Identifier $CHAR1.; format Your_Unique_Identifier $CHAR1.; label Your_Unique_Identifier 'Your Unique Identifier';
Informat First_Name $CHAR16.; format First_Name $CHAR16.; label First_Name 'First Name';
Informat Middle_Initial $CHAR1.; format Middle_Initial $CHAR1.; label Middle_Initial 'Middle Initial';
Informat Last_Name $CHAR20.; format Last_Name $CHAR20.; label Last_Name 'Last Name';
Informat Name_Suffix $CHAR1.; format Name_Suffix $CHAR1.; label Name_Suffix 'Name Suffix';
Informat Requester_Return_Field $CHAR20.; format Requester_Return_Field $CHAR20.; label Requester_Return_Field 'Requester Return Field';
Informat Record_Found_Y_N $CHAR1.; format Record_Found_Y_N $CHAR1.; label Record_Found_Y_N 'Record Found Y/N';
Informat Search_Date YYMMDD10.; format Search_Date YYMMDD10.; label Search_Date 'Search Date';
Informat College_Code_Branch $CHAR9.; format College_Code_Branch $CHAR9.; label College_Code_Branch 'College Code/Branch';
Informat College_Name $CHAR50.; format College_Name $CHAR50.; label College_Name 'College Name';
Informat College_State $CHAR2.; format College_State $CHAR2.; label College_State 'College State';
Informat _2_year___4_year $CHAR1.; format _2_year___4_year $CHAR1.; label _2_year___4_year '2-year / 4-year';
Informat Public___Private $CHAR7.; format Public___Private $CHAR7.; label Public___Private 'Public / Private';
Informat Enrollment_Begin YYMMDD10.; format Enrollment_Begin YYMMDD10.; label Enrollment_Begin 'Enrollment Begin';
Informat Enrollment_End YYMMDD10.; format Enrollment_End YYMMDD10.; label Enrollment_End 'Enrollment End';
Informat Enrollment_Status $CHAR1.; format Enrollment_Status $CHAR1.; label Enrollment_Status 'Enrollment Status';
Informat Class_Level $CHAR1.; format Class_Level $CHAR1.; label Class_Level 'Class Level';
Informat Enrollment_Major_1 $CHAR78.; format Enrollment_Major_1 $CHAR78.; label Enrollment_Major_1 'Enrollment Major 1';
Informat Enrollment_CIP_1 BEST6.; format Enrollment_CIP_1 BEST6.; label Enrollment_CIP_1 'Enrollment CIP 1';
Informat Enrollment_Major_2 $CHAR60.; format Enrollment_Major_2 $CHAR60.; label Enrollment_Major_2 'Enrollment Major 2';
Informat Enrollment_CIP_2 BEST6.; format Enrollment_CIP_2 BEST6.; label Enrollment_CIP_2 'Enrollment CIP 2';
Informat Graduated_ $CHAR1.; format Graduated_ $CHAR1.; label Graduated_ 'Graduated?';
Informat Graduation_Date YYMMDD10.; format Graduation_Date YYMMDD10.; label Graduation_Date 'Graduation Date';
Informat Degree_Title $CHAR69.; format Degree_Title $CHAR69.; label Degree_Title 'Degree Title';
Informat Degree_Major_1 $CHAR80.; format Degree_Major_1 $CHAR80.; label Degree_Major_1 'Degree Major 1';
Informat Degree_CIP_1 BEST6.; format Degree_CIP_1 BEST6.; label Degree_CIP_1 'Degree CIP 1';
Informat Degree_Major_2 $CHAR49.; format Degree_Major_2 $CHAR49.; label Degree_Major_2 'Degree Major 2';
Informat Degree_CIP_2 BEST6.; format Degree_CIP_2 BEST6.; label Degree_CIP_2 'Degree CIP 2';
Informat Degree_Major_3 $CHAR34.; format Degree_Major_3 $CHAR34.; label Degree_Major_3 'Degree Major 3';
Informat Degree_CIP_3 BEST6.; format Degree_CIP_3 BEST6.; label Degree_CIP_3 'Degree CIP 3';
Informat Degree_Major_4 $CHAR20.; format Degree_Major_4 $CHAR20.; label Degree_Major_4 'Degree Major 4';
Informat Degree_CIP_4 BEST6.; format Degree_CIP_4 BEST6.; label Degree_CIP_4 'Degree CIP 4';
Informat College_Sequence BEST2.; format College_Sequence BEST2.; label College_Sequence 'College Sequence';
input your_unique_identifier -- college_sequence;
run;
Add INPUT statement.
Less than 5 minutes total.
Alternatively you could build a SAS program that read the file as shown and created the data import step, but @Tom is better at that than I am 🙂
Thanks @Reeza! I appreciate you taking the time to get back to me.
it's in the exact same format. Same columns, same order, same field names.
This is a trivial problem. Write the code to read ONE of the files. It will looks something like this:
data want;
infile "myfile.csv" firstobs=2 dsd truncover ;
length
Your_Unique_Identifier $1
...
Search_Date 8
College_Code_Branch $9
...
Enrollment_CIP_1 8
...
;
format Search_Date Enrollment_Begin Enrollment_End Graduation_Date yymmdd10.;
informat Search_Date Enrollment_Begin Enrollment_End Graduation_Date yymmdd.;
label
Your_Unique_Identifier='Your Unique Identifier'
...
;
input Your_Unique_Identifier -- College_Sequence;
run;
You could probably read the descriptor file and generate the code, but why bother since you only have to create it ONCE.
Now to run this for some other file just change the name of the file being read and perhaps the name of the dataset being written. For example by using macro variables.
%let dsname=want;
%let filename=myfile.csv;
data &dsname;
infile "&filename" firstobs=2 dsd truncover ;
...
Thanks, @Tom!
I tried the code you sent, and a dataset with the correct field names and datatypes prints out, but it's empty. I'm getting the following error:
ERROR: Physical file does not exist, C:\Users\celhaddad\Desktop\OM_EA_FY15_NSC_V20220118.csv.
I know the filepath is correct because I'm copying and pasting it.
I've Googled the error and browsed through about half a dozen articles, and it seems like the issue is that SAS can't connect directly to my C: Drive. (I did find one solution, but I'm not sure how I'd get this to work in my situation.) Do you have any ideas about how I can get around this?
Thanks again for your help!
If you are using Enterprise Guide as a front end for creating and running SAS code then SAS itself is probably running on some other machine. There should a Copy Files type TASK in EG that you can use to upload the file from your C drive to some place on the server were SAS is running so that SAS can read it.
@CeciliaE wrote:
Thanks, @Tom!
I tried the code you sent, and a dataset with the correct field names and datatypes prints out, but it's empty. I'm getting the following error:
ERROR: Physical file does not exist, C:\Users\celhaddad\Desktop\OM_EA_FY15_NSC_V20220118.csv.
I know the filepath is correct because I'm copying and pasting it.
I've Googled the error and browsed through about half a dozen articles, and it seems like the issue is that SAS can't connect directly to my C: Drive. (I did find one solution, but I'm not sure how I'd get this to work in my situation.) Do you have any ideas about how I can get around this?
Thanks again for your help!
If your SAS is running on a server then the file needs to be someplace that the server can see the file.
That could be
1) move the file to a location the server can see using any of a number of operating system tools
2) have your PC set up so that the server can read from the folder(s) where you store the text files. This would involve your organizations IT staff.
3) create a new storage location that both your PC and the Server can use. Your PC would need write privileges, the server would only need read privileges. Move or copy the files to this new location. A cloud storage service like OneDrive should work.
4) With EG there is a task that will upload a file to the server, but I am not sure if this is the best or not as I think it would still be an added manual step.
Thank you! This is really helpful.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.