BookmarkSubscribeRSS Feed
WendyT
Pyrite | Level 9
Good morning!

I'm working on a series of programs that download data from an Oracle database. The first step downloads data for a particular project to the sas file 'PROJECT' based on a where condition.

There are some projects that have special additional calculation(s) that need to be done and appended back to the 'PROJECT' file.

My plan is to use PROJECT to generate the special data, concatenate the special dataset(s) to one dataset named SPECIAL, then use PROC DATASETS and APPEND to add SPECIAL back to the PROJECT file.

I want to use the attributes of the PROJECT file, so all the variable types and sizes match for the append.

A SET statement with (OBS=0) appears to work correctly.

Is there a better or more appropriate way to do this, and where might I find some documentation?

Thanks for your suggestions.

Wendy



*make PROJECT dataset from Oracle database ;
PROC SQL NOPRINT ;
CREATE PROJECT
AS SELECT … FROM …. WHERE… ;


*make special datasets from PROJECT or other sources ;

PROC SQL NOPRINT ;
CREATE SPECIAL1
AS SELECT … FROM PROJECT... WHERE … ;
QUIT ;

DATA SPECIAL2 ; SET PROJECT ;
Gyrate …gyrate… etc. ;
RUN ;

DATA SPECIAL3 ; SET SOMETHINGELSE ;
etc... ;
RUN ;


*concatenate special datasets using the PROJECT dataset as a model ;

DATA SPECIAL ;
SET PROJECT(OBS=0) SPECIAL1 SPECIAL2 SPECIAL3 ;
RUN ;

*add special datasets back to PROJECT ;

PROC DATASETS ;
APPEND BASE=PROJECT DATA=SPECIAL ;
RUN ;
4 REPLIES 4
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
With the SET you have constructed, a SAS ERROR condition is generated when a given SAS variable has a NUMERIC / CHARACTER type conflict in two or more datasets listed on the SET statement.

And, do consider that SAS will simply truncate a variable's length when the "type" conflict is not present but a "shorter length" with a WARNING message, that sometimes goes unnoticed.

My general approach (and recommendation) is to define either a AUTOCALL macro or INCLUDE member with specific LENGTH, FORMAT and LABEL definitions, however I do not typically have external DBMS data feeds to consider.

Scott Barry
SBBWorks, Inc.
ChrisNZ
Tourmaline | Level 20
While I see nothing odd in you post, I would agree with Scott: define the lengths (and types!) yourself, rather than let sas do it and then try to fix them.

It is difficult to add more comments without knowing more about your constraints.

Replacing the last 2 steps with one might be faster if SPECIAL is larger than PROJECT:

DATA PROJECT;
SET PROJECT SPECIAL1 SPECIAL2 SPECIAL3 ;
RUN ;
ballardw
Super User
You may find the information in Dictionary.columns useful. Look at the items related to your PROJECTS dataset.
WendyT
Pyrite | Level 9
The Oracle database I am using is for Water Quality at my agency, and the PROJECT datasets coming down from the Oracle database tend to be large.

My original code did SET all of the datasets, but I realized that I would need a more general solution to implement across other projects.

Everyone will have some sort of data to add through SPECIAL, but the contents of SPECIAL will differ between projects. For example, most projects will want a Total Nitrogen calculation, my project has parameters of light transmission in water that live in a different database, and others might have something like pesticide data.

SPECIAL should always be much smaller than PROJECT. For example, this morning's run for my project had 1.1 million rows in PROJECT, and 50K rows in SPECIAL.

In addition, I need the final result of PROJECT+SPECIAL to be sorted. I tested run and cpu time of SETting the datasets followed by PROC SORT, PROC SQL using OUTER UNION CORR and ORDER BY, and to my surprise, APPEND followed by PROC SORT was the clear winner.

Thanks also for the caution to watch my variables and lengths carefully. For most of the cases, I am using PROC SQL, and defining the value of a couple of fields, like

'C' AS REMARK,
'TN_CALC' AS ANALYTE,

and the text fields in Oracle are of length 20 and 240, respectively. I definitely could specify the LENGTH explicitly like this, but was hoping to not have to do that.

'C' AS REMARK LENGTH=20,
'TN_CALC' AS ANALYTE LENGTH=240,

I'm not familiar with AUTOCALL, so will have to do some reading.

Thanks so much for the suggestions!

Wendy Message was edited by: WendyT

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 747 views
  • 0 likes
  • 4 in conversation