I'm trying to convert some code to use SAS on demand so I need to use PROC IMPORT. I had this in my old statement, that defined column names, and the type of variables. How do I do this with PROC IMPORT?
old code:
DATA rawdata;
FILENAME DATA1 DDE "EXCEL|[rawdatav2-new-line-names-checks-forSAS-code-segs.xlsx]forsas!R2C1:R6301C13";
INFILE DATA1 NOTAB DLM= '09'X DSD MISSOVER lrecl = 10240;
INPUT straw_wt panicle_wt DateEntered $ notes $ height DateEntered2 $ rep block row col plant line $ plot_ID_final;
CARDS;
proc print;
RUN;
new code:
PROC IMPORT
OUT= WORK.rawdata
DATAFILE= "/home/paul.tanger140/RILpop2013phenotypes/rawdatav2-new-line-names-checks-forSAS-code-segs.xlsx"
DBMS=xlsx REPLACE;
SHEET="forsas";
GETNAMES=YES; * use names in file.. bad if names don't conform to stupid SAS rules (alphanumeric and underscores ONLY)
RUN;
The problem with SAS guessing rows is removed because your code becomes something like this for reading a tab-delimited file (much the same as DDE):
DATA rawdata;
FILENAME DATA1 "c:\MyFolder\rawdatav2-new-line-names-checks-forSAS-code-segs.csv";
INFILE DATA1 NOTAB DLM= '09'X DSD MISSOVER lrecl = 10240;
INPUT straw_wt panicle_wt DateEntered $ notes $ height DateEntered2 $ rep block row col plant line $ plot_ID_final;
run;
EXCEL is NOT a database.
I have no clue how your comment helps solve my issue. I did not create the old code. It was given to me, and it works, thats all that I care about. Unfortunately it doesn't work in SAS on demand.
I don't have much time or patience to learn the painful intricacies and nuances of SAS. The help pages are impossible to navigate compared to other tools like R. So can someone just tell me what I need to do to make SAS treat my variables correctly and allow me to rename them with "SAS valid" names?
Did you try the PROC IMPORT that you posted? What was the result?
Yes, it works, except it defines variables as numeric or char, and I need to rename some fields since they don't have "SAS approved" names.
Variables in SAS are either numeric or character. PROC IMPORT guesses at the data-type based on the data contained in the cells. If you don't like the guess you can convert. Perhaps adding OPTIONS validvarname=any; will help with the names, you can also use the RENAME statement after the data are IMPORTed.. SAS has rules for variable names you can't change that. EXCEL doesn't even have variables. Just because a SAS data set has some resemblance to certain EXCEL sheets doesn't really mean anything.
All I know is this line accomplished what I needed:
INPUT straw_wt panicle_wt DateEntered $ notes $ height DateEntered2 $ rep block row col plant line $ plot_ID_final;
I was able to name the fields in SAS regardless of what the first row in excel was, and I was able to force categorical variables even though SAS wanted them to be numeric, and would treat them as continuous in later steps. For example, the 12th field, "line" is line numbers, but need to be treated as categorical not continuous.
Thank you for your patience with me, but I'm still not quite sure how to replicate this one line of code with PROC IMPORT which I need to use now. So I guess my questions are:
Save you data in a text file instead. You can save from Excel to Tab delimited or CSV delimited file.
Ok, and then what? What does the import statement become? and don't I still have an issue with SAS guessing incorrect field types?
Then you can write a DATA step to read the data and have complete control. For example something like this.
DATA rawdata;
infile 'myfile.csv' dsd truncover lrecl=30000 firstobs=2 ;
LENGTH straw_wt panicle_wt DateEntered 8 notes $200 height DateEntered2 rep block row col plant 8 line $20 plot_ID_final 8;
informat DateEntered DateEntered2 YYMMDD10.;
format DateEntered DateEntered2 YYMMDD10.;
INPUT straw_wt -- plot_ID_final;
RUN;
Using this code, it doesn't actually import any data. Does it matter than the field names in the file are different?
Here is the code with notes:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
57
58 DATA rawdata;
59 infile '/home/paul.tanger140/RILpop2013phenotypes/rawdatav2-new-line-names-checks-forSAS-code-segs.csv' dsd truncover
59 ! lrecl=30000 firstobs=2 ;
60 LENGTH straw_wt panicle_wt DateEntered 8 notes $200 height DateEntered2 rep block row col plant 8 line $20
60 ! plot_ID_final 8;
61 informat DateEntered DateEntered2 YYMMDD10.;
62 format DateEntered DateEntered2 YYMMDD10.;
63 INPUT straw_wt panicle_wt DateEntered $ notes $ height DateEntered2 $ rep block row col plant line $ plot_ID_final;
64 RUN;
NOTE: The infile '/home/paul.tanger140/RILpop2013phenotypes/rawdatav2-new-line-names-checks-forSAS-code-segs.csv' is:
Filename=/home/paul.tanger140/RILpop2013phenotypes/rawdatav2-new-line-names-checks-forSAS-code-segs.csv,
Owner Name=paul.tanger140,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=17Mar2015:18:22:12,
File Size (bytes)=317061
NOTE: 0 records were read from the infile
'/home/paul.tanger140/RILpop2013phenotypes/rawdatav2-new-line-names-checks-forSAS-code-segs.csv'.
NOTE: The data set WORK.RAWDATA has 0 observations and 13 variables.
The problem with SAS guessing rows is removed because your code becomes something like this for reading a tab-delimited file (much the same as DDE):
DATA rawdata;
FILENAME DATA1 "c:\MyFolder\rawdatav2-new-line-names-checks-forSAS-code-segs.csv";
INFILE DATA1 NOTAB DLM= '09'X DSD MISSOVER lrecl = 10240;
INPUT straw_wt panicle_wt DateEntered $ notes $ height DateEntered2 $ rep block row col plant line $ plot_ID_final;
run;
Thanks! This is helpful. I'm getting an error though:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
57
58 DATA rawdata;
59 FILENAME DATA1 "/home/paul.tanger140/RILpop2013phenotypes/rawdatav2-new-line-names-checks-forSAS-code-segs.csv";
60 INFILE DATA1 NOTAB DLM= '09'X DSD MISSOVER lrecl = 10240;
_____
23
ERROR 23-2: Invalid option name NOTAB.
here is my code:
DATA rawdata;
FILENAME DATA1 "/home/paul.tanger140/RILpop2013phenotypes/rawdatav2-new-line-names-checks-forSAS-code-segs.csv";
INFILE DATA1 NOTAB DLM= '09'X DSD MISSOVER lrecl = 10240;
INPUT straw_wt panicle_wt DateEntered $ notes $ height DateEntered2 $ rep block row col plant line $ plot_ID_final;
run;
NOTAB is option for DDE access method only. Just remove it.
Getting closer! Now it says this:
NOTE: Invalid data for straw_wt in line 1 1-10240.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 CHAR straw_wt,panicle_wt,BiomassDate,notes,height,HeightDate,rep,block,row,col,plant,line,plot_ID-final.2
ZONE 7776757727666666577246666774676266767266666724666674676276726666627672666276667266662766754426666603
NUMR 34217F74C01E93C5F74C29FD1334145CEF453C859784C8597844145C250C2CF3BC2F7C3FCC0C1E4CC9E5C0CF4F94D69E1CD2
and there is a bunch more of this..
These are the first two lines of the file:
straw_wt,panicle_wt,BiomassDate,notes,height,HeightDate,rep,block,row,col,plant,line,plot_ID-final
245.7,75.2,4/13/2012,,162,4/12/2012,1,1,1,1,1,845,113
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.