Re: create xyz table

kyath_sas · Posted 07-17-2018 02:38 AM

Hi could you please help on this

data one;
input x y;
datalines ;
1 2
1 3

2 4

2 5

2 9
;
run;
data two;
input x z;
cards;
1 4
1 5
1 6

2 3

2 7

;
run;

need output like this

x y z

1 2 4

1 3 5

1 . 6

2 4 3

2 5 7

2 9 .

PeterClemmensen · Posted 07-17-2018 03:08 AM

Do like this

data want;
   call missing(y,z);
   merge one two;
   by x;
run;

The DATA to DATA Step Macro
Blog: SASnrd

kyath_sas · Posted 07-17-2018 04:07 AM

I have a huge data

varible values are 'ab-05-abc-$200'

i am getting below error could you suggest on this

ERROR: Variable y has been defined as both character and numeric.
ERROR: Variable z has been defined as both character and numeric.

PeterClemmensen · Posted 07-17-2018 04:09 AM

The error message means exactly what it says: That the variable y is defined as character in one place and as numeric in another.

Do you only have two data sources as in your example? If so run a PROC CONTENTS on both data sets to find the error

The DATA to DATA Step Macro
Blog: SASnrd

kyath_sas · Posted 07-17-2018 04:22 AM

all are char only

PeterClemmensen · Posted 07-17-2018 04:39 AM

The SAS log does not lie to you. Somewhere you have a version of the variable y that is numeric.

Just to be clear: You want the variables to be character, correct? Because in the example data you post, all variables including y and z are numeric?

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 07-17-2018 04:53 AM

@kyath_sas wrote:
all are char only

Can't be. In the example data in your initial post, ALL variables are numeric. If you are unsure how to get correct data steps from your real data, use the macro from my footnote to convert a part of your real datasets to data steps.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 07-17-2018 05:53 AM

data want;
   merge one two;
   by x;
   output;
   call missing(of _all_);
run;

janus58 · Posted 07-17-2018 09:05 AM

Ksharp, can you please explain how

output;
   call missing(of _all_);

gets rid of

ERROR: Variable y has been defined as both character and numeric.
ERROR: Variable z has been defined as both character and numeric.

is variable y and z character or numeric in data=want?

Thanks

Margaret

PeterClemmensen · Posted 07-17-2018 09:10 AM

@janus58 it does not.

It simply ensures that the variable values are missing when no table contributes to the merge. if you remove the CALL MISSING part, no missing values will appear. Instead the variables are retained

The DATA to DATA Step Macro
Blog: SASnrd

art297 · Posted 07-17-2018 06:09 PM

If your two files are already sorted by x I would use the following method:

data one;
  input x $ y $;
  datalines ;
1 2
1 3
2 4
2 5
2 9
;

data two;
  input x $ z $; 
  cards;
1 4
1 5
1 6
2 3
2 7
;

data one_a;
  set one;
  by x;
  if first.x then seq=1;
  else seq+1;
run;

data two_a;
  set two;
  by x;
  if first.x then seq=1;
  else seq+1;
run;

data want (drop=seq);
  merge one_a two_a;
  by x seq;
run;

@janus58: the placement of the call missing statement is critical. If it is the first assignment of those variables, it will make the fields numeric. That is why @Ksharp's code wouldn't produce a mixed case error, as the call missing will assign them as character (as they've already been assigned that way).

However, in the present case, it isn't needed. Since it is a many to many merge, that is the problem that has to be overcome. I did that by assigning seq numbers and including that variable in the by statement.

Art, CEO, AnalystFinder.com

hashman · Posted 07-17-2018 10:06 PM

This is the way it's supposed to be done. Moreover, the files with the surrogate key created in this manner can be used in a full SQL join by those who prefer it to MERGE. Note, however, that using views rather than physical data sets is better in terms of performance and space utilization.

The first time I saw this method proposed was 20 years ago, by Ian Whitlock on SAS-L. At the time, he noted that negating the normal behavior of a many-to-many MERGE statement by forcing non-key variables to missing values is bad programming practice. MISSING (of _ALL_) is an even more deadly sin since it's easy to perceive what kind of havoc it can wreak on unsuspecting variables other than those targeted for nullification.

Registration is open

SAS Training: Just a Click Away