BookmarkSubscribeRSS Feed
PGStats
Opal | Level 21

Ok then, Read a single workbook and check that the types (character or number) of corresponding variables in datasets B12Y12, B24Y24, B25Y25 and B35Y35 all match. The error message says they don't, which makes the fields impossible to combine.

PG

PG
copony
Calcite | Level 5

Some of the corresponding values for the variables are missing. Could that be the problem?

PGStats
Opal | Level 21

Yes it could. As far as I can tell (this is not documented) the type of missing values in Excel is determined by their format. Missing values without a format as assumed to be character. Whether you want to correct the workbooks or fix the problem by programming depends on your patience and the number of workbooks involved. In other words, you could assign formats to Excel ranges or do some fancy footwork in SAS.

PG

PG
copony
Calcite | Level 5

I think I might have to use SAS, however complicated, because I can't run macros in Excel due to our security policies and there are many workbooks containing multiple worksheets. Where should I begin in SAS?

PGStats
Opal | Level 21

OK. Here is a new version that should tolerate the presence of character variables in the imported ranges. They are transformed into missing values.

%macro getdata(file,ds,Sheet);

libname xl Excel "&sasforum\datasets\&file..xlsx" header=no access=readonly usedate=yes scantime=yes;

data B12Y12;

set xl."&Sheet.$B12:Y12"n;

keep _NUMERIC_;

run;

data B24Y24;

set xl."&Sheet.$B24:Y24"n;

keep _NUMERIC_;

run;

data B25Y25;

set xl."&Sheet.$B25:Y25"n;

keep _NUMERIC_;

run;

data B35Y35;

set xl."&Sheet.$B35:Y35"n;

keep _NUMERIC_;

run;

proc sql;

create table id as select F1 as ID from xl."&Sheet.$B2:B2"n;

create table A10B10 as select catx(" ", F1, F2) as date from xl."&Sheet.$A10:B10"n;

create table A11B11 as select catx(" ", F1, F2) as var2 from xl."&Sheet.$A11:B11"n;

quit;

libname xl clear;

/* Empty dataset to ensure the presence of all 24 variables F1-F24 */

data dum; array F[24]; stop; run;

data t;

set dum B12Y12 B24Y24 B25Y25 B35Y35;

run;

proc transpose data=t out=tl; var f1-f24; run;

proc sql;

create table &ds as

select

    id,

    date,

    var2,

    col1 as t ,

    col2 as val1 format=best.,

    col3 as val2 format=best.,

    col4 as val3 format=best.

from id, A10B10, A11B11, tl;

drop table id, A10B10, A11B11, dum, B12Y12, B24Y24, B25Y25, B35Y35, tl, t;

quit;

%mend;

option mprint; /* for testing */

%getdata(copony,ds001,Sheet1);

/* ... repeat for all worksheets. give all output datasets names starting with ds */

/* Assemble all datasets that have names starting with ds */

data allDs;

set ds:;

run;

PG

PG
copony
Calcite | Level 5

Thanks, ... I think we are getting closer. I don't get any "errors" with this code but I get a lot of notes that the tables have been dropped. This may  be a dumb question, but since I'm literally  just learning SAS data steps for this project, what does it mean when a table is "dropped" in the SAS universe? Does that mean the table no longer exists? I also get a note (below) about performing one or more Cartesian product joins, and I'm not sure how that affects the tables. 

MPRINT(GETDATA):   data t;

MPRINT(GETDATA):   set dum B12Y12 B24Y24 B25Y25 B35Y35;

MPRINT(GETDATA):   run;

NOTE: There were 0 observations read from the data set WORK.DUM.

NOTE: There were 1 observations read from the data set WORK.B12Y12.

NOTE: There were 1 observations read from the data set WORK.B24Y24.

NOTE: There were 1 observations read from the data set WORK.B25Y25.

NOTE: There were 1 observations read from the data set WORK.B35Y35.

NOTE: The data set WORK.T has 4 observations and 24 variables.

NOTE: DATA statement used (Total process time):

      real time           0.06 seconds

      cpu time            0.06 seconds

MPRINT(GETDATA):   proc transpose data=t out=tl;

MPRINT(GETDATA):   var f1-f24;

MPRINT(GETDATA):   run;

NOTE: There were 4 observations read from the data set WORK.T.

NOTE: The data set WORK.TL has 24 observations and 6 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.15 seconds

      cpu time            0.03 seconds

MPRINT(GETDATA):   proc sql;

MPRINT(GETDATA):   create table ds001 as select id, date, var2, col1 as t , col2 as val1

format=best., col3 as val2 format=best., col4 as val3 format=best. from id, A10C10, A11B11, tl;

NOTE: The execution of this query involves performing one or more Cartesian product joins that

      can not be optimized.

NOTE: Table WORK.DS001 created, with 24 rows and 7 columns.

MPRINT(GETDATA):   drop table id, A10C10, A11B11, dum, B12Y12, B24Y24, B25Y25, B35Y35, tl, t;

NOTE: Table WORK.ID has been dropped.

NOTE: Table WORK.A10C10 has been dropped.

NOTE: Table WORK.A11B11 has been dropped.

NOTE: Table WORK.DUM has been dropped.

NOTE: Table WORK.B12Y12 has been dropped.

NOTE: Table WORK.B24Y24 has been dropped.

NOTE: Table WORK.B25Y25 has been dropped.

NOTE: Table WORK.B35Y35 has been dropped.

NOTE: Table WORK.TL has been dropped.

NOTE: Table WORK.T has been dropped.

MPRINT(GETDATA):   quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.24 seconds

      cpu time            0.10 seconds

116

117  /* ... repeat for all worksheets. give all output datasets names starting with ds */

PGStats
Opal | Level 21

When my code says

drop table id, A10B10, A11B11, dum, B12Y12, B24Y24, B25Y25, B35Y35, tl, t;


it simply means DELETE these tables. NOTEs in the log are just information message and usually do not require corrective action. I delete these intermediate tables to clean up, but also to make sure the next time the macro is called an old dataset will not be read instead of a new one that failed to be created.


PG

PG
jakarman
Barite | Level 11

PGstats used the Proc Sql and libname option to indicate parts of the Excel spreadsheet.

At the moment the union runs the parts are getting joined there is a runtime error.

The pitfall with this method is that the different parts are getting defined differently by the autodetection of variables.

The autodetection can be overridden by using DBTYPE DBSASTYPE options at the libname statement

There is an other way to work with ranges.

Ever asked: "WHAT IS THAT “$” CHARACTER"  see: http://www2.sas.com/proceedings/sugi31/024-31.pdf

It are the ranges as defined within Excel. Instead of defining the ranges in SAS define the ranges in Excel (using SAS naming conventions).

Then use:

libname xl Excel "&sasforum\datasets\&file..xlsx" header=no access=readonly usedate=yes scantime=yes

After this check the properties of the tables you have defines as Excel ranges.

For the date/time conversion I have http://www2.sas.com/proceedings/sugi29/068-29.pdf  (E.W Tilanus). Excerpt:

Excel uses 1 January 1900 as day one. Note negative dates are not allowed.  SAS is using 1 jan1960

Excel: a time in Excel is a fraction of a day. So for instance 12:00 noon in SAS is 43200 (seconds since midnight), in Excel it is 0.5 (half day).

SAS_date = Excel_date - 21916;     SAS_time = Excel_time * 86400;     SAS_date_time = (Excel_date_time - 21916) * 86400;

This conversion and formatting is often done automatically. When it fails check those backgrounds.

---->-- ja karman --<-----
PGStats
Opal | Level 21

It's true Jaap, SASDBTYPE dataset option can sometimes be useful, although it can be ackward for long lists of columns as its syntax doesn't support variable lists. I often which I could just say SASDBTYPE=(_ALL_=CHAR(200)) and do the conversions myself. The interplay between Excel (ACE) and SAS can get a bit confusing. The column type is always detected by Excel first. If Excel decides to return the values as character then setting SASDBTYPE=(F1=NUMERIC) will force SAS to convert the values back to numbers, with some loss of precision. 

Named ranges on the other hand are beside the point. The whole idea is to avoid having to modify a large number of worksheets.

PG

PG

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 6285 views
  • 7 likes
  • 4 in conversation