An Idea Exchange for SAS software and services

Comments
by PROC Star
on ‎05-19-2013 09:08 PM

In what circumstance would this be helpful?

I think it's generally a bad idea to have variable collisions in a merge (i.e. variables with the same name in two datasets, that are not listed on the BY statement).

And if the variable is listed on the BY statement, it seems like good practice to have the lengths be the same.

--Q


by New Contributor David_N_Ross
on ‎05-20-2013 05:26 PM

This applies not only when merging data sets, but also when concatenating them. Suppose you are concatenating several data sets that share common data elements but do not always use the same variable lengths. In the following simple example, the length of Paycode is $1 in the first data set and $2 in the second data set. It gets truncated to $1 in the final data set.

 

34 Data CA ;
35 HOSPST = 'CA' ;
36 Paycode = '1' ;
37 Run ;
NOTE: The data set WORK.CA has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.00 seconds

38
39 Data AZ ;
40 HOSPST = 'AZ' ;
41 Paycode = '15' ;
42 Run ;
NOTE: The data set WORK.AZ has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time 0.03 seconds
      cpu time 0.01 seconds

43
44 Data CAAZ ;
45 Set CA AZ ;
46 By HOSPST ;
47 Run ;
WARNING: Multiple lengths were specified for the variable Paycode by input data set(s). This may
         cause truncation of data.
NOTE: There were 1 observations read from the data set WORK.CA.
NOTE: There were 1 observations read from the data set WORK.AZ.
NOTE: The data set WORK.CAAZ has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time 0.03 seconds
      cpu time 0.01 seconds

48
49 Proc Contents ;
50 Run ;
NOTE: PROCEDURE CONTENTS used (Total process time):
      real time 0.10 seconds
      cpu time 0.01 seconds

The CONTENTS Procedure

#  Variable  Type  Len
1  HOSPST    Char    2
2  Paycode   Char    1

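A common workaround for this kind of truncation is to declare the length before the SET statement, so that the wider definition is the first one the compiler sees. The sketch below reuses the CA and AZ data sets from the example and assumes $2 is wide enough for every Paycode value in the inputs.

```sas
/* Sketch: declare the widest needed length before SET.   */
/* Assumption: $2 covers every Paycode value in CA and AZ. */
data CAAZ;
  length Paycode $2;   /* must come before SET to take effect */
  set CA AZ;
  by HOSPST;
run;

/* Check the resulting attributes. */
proc contents data=CAAZ;
run;
```

One side effect: because Paycode is now defined first, it becomes the first variable in the output data set.
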
by PROC Star
on ‎05-20-2013 10:56 PM

Thanks,

I think there is a better case to be made for concatenation than for merge.

That said, I suspect that changing the behavior to use the max length (or, more likely, adding a new option to control it) would be difficult. So much of the DATA step language seems centered on the idea that it compiles one statement at a time, building the PDV as it goes. To find the max(length), it would need to delay creating the PDV until the full step had been compiled.
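The order dependence described above can be seen with a small sketch that reuses the CA and AZ data sets from the earlier example. The output data set names ORDER1 and ORDER2 are made up for illustration. Because the length comes from the first definition encountered at compile time, swapping the order on the SET statement changes the result.

```sas
/* CA is listed first, so Paycode takes its length from CA ($1). */
/* The value '15' read from AZ is truncated to '1'.              */
data order1;
  set CA AZ;
run;

/* AZ is listed first, so Paycode takes its length from AZ ($2). */
/* No value is truncated.                                        */
data order2;
  set AZ CA;
run;
```
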

Still, I can certainly agree that this could be helpful, and I wonder how DS2 would handle this. Maybe there is hope there.

Regards,

-Q.

by Contributor das
on ‎04-03-2014 11:44 AM
Less perplexed today than I was a year ago. Liking this community and deriving much more than I can give back. Thank you. Uncertain still how to mark some of my posted questions as answered. I just answered my own question and posted that for any others, but I can't mark it as the correct answer. So, still a little perplexed. :-)
by Super User
on ‎06-02-2015 07:30 AM

There's no replacement for "Know your data".

Appending a 3-line dataset with an incorrectly specified variable to a 50-million line dataset will cause havoc.

Thanks, but - no, thanks!
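
In the spirit of "know your data", one way to check for length mismatches before concatenating is to query the DICTIONARY.COLUMNS table. This is a minimal sketch, assuming the inputs are the WORK.CA and WORK.AZ data sets from the earlier example; the column aliases (varname, min_len, max_len) are made up for illustration.

```sas
/* Sketch: list variables whose lengths differ across the inputs. */
/* Assumption: the data sets to compare are WORK.CA and WORK.AZ.  */
proc sql;
  select upcase(name) as varname,
         min(length)  as min_len,
         max(length)  as max_len
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('CA', 'AZ')
    group by calculated varname
    having min(length) ne max(length);
quit;
```

Any variable the query returns would need an explicit LENGTH statement (or a fix at the source) before the data sets are combined.
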
