BookmarkSubscribeRSS Feed
AaronJ
Obsidian | Level 7

I am trying to import data from an excel file into SAS University. I have 5 species and a dozen or so variables.

When i run Proc IMPORT and Proc PRINT of the data it will read some of the Species and their rows of data, but not all. In fact, it will read 3 rows of data but not the other 275. 

Why isn't SAS reading data that is clearly visible in the excel file itself? or How can i force SAS to read the data that it won't?

this is the code i've written:

 

%web_drop_table(WORK.IMPORT);

FILENAME FILEREF '/folders/myfolders/Trees.xlsx';

PROC IMPORT DATAFILE=FILEREF
DBMS=XLSX
OUT=WORK.IMPORT;
GETNAMES=YES;
RUN;

data Trees;
set import;
if Measurement=2;
if GLD>0;
if DBH>0;
if Height>0;

run;

proc print data=Trees;
run;

 

Thanks in advance

11 REPLIES 11
Reeza
Super User

FILEREF may be a keyword, try changing your FILENAME reference to have a different word.

 

FILENAME myfile '/folders/myfolders/Trees.xlsx';

PROC IMPORT DATAFILE=myfile
DBMS=XLSX
OUT=WORK.IMPORT;
GETNAMES=YES;
RUN;

And post your log.

 


@AaronJ wrote:

I am trying to import data from an excel file into SAS University. I have 5 species and a dozen or so variables.

When i run Proc IMPORT and Proc PRINT of the data it will read some of the Species and their rows of data, but not all. In fact, it will read 3 rows of data but not the other 275. 

Why isn't SAS reading data that is clearly visible in the excel file itself? or How can i force SAS to read the data that it won't?

this is the code i've written:

 

%web_drop_table(WORK.IMPORT);

FILENAME FILEREF '/folders/myfolders/Trees.xlsx';

PROC IMPORT DATAFILE=FILEREF
DBMS=XLSX
OUT=WORK.IMPORT;
GETNAMES=YES;
RUN;

data Trees;
set import;
if Measurement=2;
if GLD>0;
if DBH>0;
if Height>0;

run;

proc print data=Trees;
run;

 

Thanks in advance


 

ballardw
Super User

Which data set only has 3 rows of data? Work.import or work.trees?

 

In

data Trees;
set import;
if Measurement=2;
if GLD>0;
if DBH>0;
if Height>0;

 

Each of those If statements removed records. Your proc print should only show the records where Measurement=2 and GLD>0 and DBH>0 and height > 0 because you removed everything else in the set import when writing it to Trees.

AaronJ
Obsidian | Level 7
yes, i am aware of that. and there are still 270 lines of data with numbers in them for each of the variables that SAS is not reading.
Reeza
Super User

Post your log. 

AaronJ
Obsidian | Level 7

Here's the log:

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
61
62 %web_drop_table(WORK.IMPORT);
NOTE: Table WORK.IMPORT has been dropped.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
 
 
63
64 FILENAME FILEREF '/folders/myfolders/Trees.xlsx';
65
66 PROC IMPORT DATAFILE=FILEREF
67 DBMS=XLSX
68 OUT=WORK.IMPORT;
69 GETNAMES=YES;
70 RUN;
 
NOTE: The import data set has 12488 observations and 21 variables.
NOTE: WORK.IMPORT data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 6.95 seconds
cpu time 6.90 seconds
 
 
71
72 data Trees;
73 set import;
74 if Species=1;
75 if Measurement=2;
76 if GLD>0.01;
77 if DBH>0.01;
78 if Height>0.01;
79
80 run;
 
NOTE: There were 12488 observations read from the data set WORK.IMPORT.
NOTE: The data set WORK.TREES has 3 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
 
 
81
82 proc print data=Trees;
83 run;
 
NOTE: There were 3 observations read from the data set WORK.TREES.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.28 seconds
cpu time 0.28 seconds
 
 
84
85 /*proc univariate Normal Plot data=Trees;
86 var GLD DBH Height;
87 by Tree Measurement;
88 Histogram GLD DBH Height/Normal;
89 QQPLOT GLD DBH Height;
90 output out=Trees;
91 run;*/
92
93 proc sort data=Trees;
94 By Species Fert Nfix Block Plot Measurement;
95 run;
 
NOTE: There were 3 observations read from the data set WORK.TREES.
NOTE: The data set WORK.TREES has 3 observations and 21 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
 
 
96
97 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
110
 
Reeza
Super User

And how many observations and variables were you expecting?

NOTE: The import data set has 12488 observations and 21 variables.

AaronJ
Obsidian | Level 7

of the 12k observations in total, with the constraints i put on the data with If statements i was expecting about 250 observations to be read by SAS.

I have five species, and i used an If statement to constrain the Sort to only one species in the example. When i shift that If statement to other species, they also return incorrect numbers of observations.

Reeza
Super User

So your problem is with the IF statements not your import. 

 

I would verify that the data is first being read in correctly and check how you're filtering. 

Remember when filtering on character variables they are case sensitive.

 

Also, your log shows code that's different from what you posted, so what exactly is the code you're having issues with?

And can you post sample data? If not, run a PROC FREQ or MEANS on your data sets to check that it's correct and you're not overlooking something.

ballardw
Super User

@AaronJ wrote:
yes, i am aware of that. and there are still 270 lines of data with numbers in them for each of the variables that SAS is not reading.

Are you basing the above statement from looking at the data in the original Excel files?

You may be thinking you have species values=1 or measurement values =2 that are actually 1.01 or .99 (for species) or 2.01 or 1.99 because the display in Excel was set to no decimals.

 

You might want to print some of those records you think should be included from the IMPORT set to show the values of measurement, gld, dbh and height.

 

It also helps to be consistent: Original post:

if Measurement=2;
if GLD>0;
if DBH>0;
if Height>0;

in the longer code example:

 

74 if Species=1;
75 if Measurement=2;
76 if GLD>0.01;
77 if DBH>0.01;
78 if Height>0.01;

which is more restrictive than with the addition of Species and increasing the limits of the last three variables.

 

 

 

 

BTW I notice you have code like this:

 

85 /*proc univariate Normal Plot data=Trees;
86 var GLD DBH Height;
87 by Tree Measurement;
88 Histogram GLD DBH Height/Normal;
89 QQPLOT GLD DBH Height;
90 output out=Trees;
91 run;*/
 
The first time you run this code you will replace your input set if that is actually executed. You need to be very aware of using that type of code in longer programs.
AaronJ
Obsidian | Level 7

Thanks for all the input everyone. I found my mistake: i didn't realize that the chain of IF statements was an implicit AND. Many of my data didn't have BDH values so the program i wrote was excluding them. When i removed the IF statement for DBH, SAS found all the data that i was expecting.

Thanks again.

Reeza
Super User
Actually it’s an implicit OR, not AND when you likely wanted an AND. This may seem pedantic but it’s an important differentiation.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1738 views
  • 4 likes
  • 3 in conversation