Vertical Joining in the same dataset

Bounce · Posted 12-01-2019 11:38 PM

Hi all,

I would like to join 3 variables in a data set which is measuring the same thing - hence I would only like a new variable with all the information from the columns and no duplicates.

The data I have:

ID 1 2 3

1 A A B

2 A B C

3 A A A

4 A B B

The data I want:

ID New_Var

1 A

1 B

2 A

2 B

2 C

3 A

4 A

4 B

If anyone can please help me, that'd be great.

Thanks in advance.

mkeintz · Posted 12-01-2019 11:45 PM

Questions:

Are the data column values always in alphabetic order? (i.e. can you have B A B for a given ID?).
If they are not in alphabetic order, do you care whether the desired output honors the order the 3 values started out with? (i.e. from B A B, do you care whether the output is B A vs A B?)
Could you set up your sample data in a data step, the better to help us help you?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Bounce · Posted 12-02-2019 12:24 AM

Hi mkeintz, thanks for your reply.

1. No the column values are completely random, they are actually in the format of dates in the dataset I am using.

2. No the order of the output does not matter since I can sort them later - just need them all in one column.

3. Sorry do you mean this:

DATA Example;
format ID 1.
VAR1 VAR2 VAR3 $1.;
input ID VAR1 VAR2 VAR3;
datalines;
1 A A A
2 A B C
3 A A A
4 A B B
;

run;

Kurt_Bremser · Posted 12-02-2019 03:44 AM

Run a transpose:

proc transpose
  data=example
  out=want (rename=(col1=new_var))
;
by id;
var var:;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 12-02-2019 06:33 PM

There is a one-step solution, which benefits from the CALL SORTC subroutine:

DATA Example;
  format ID 1.  VAR1 VAR2 VAR3 $1.;
  input ID VAR1 VAR2 VAR3;
datalines;
1 A A A
2 A B C
3 A A A
4 A B B
run;

data want (keep=id var);
  set example;
  array  _v {*} dummy var1-var3;
  call sortc(of _v{*});
  do i=2 to dim(_v);
    if missing(_v{i}) then continue;
    var=_v{i};
    if var^=_v{i-1} then output;
  end;
run;

The "if missing(_v{i}) then continue statement skips to the next iteration of the do loop, skipping the later output stateement.
I put a dummy variable at the lower bound of the array, to smiplify the "if var^=_v{i-1}" test.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ed_sas_member · Posted 12-02-2019 04:35 AM

Hi @Bounce

You can try this code (proc transpose + proc sort to remove duplicate records):

proc transpose data=example out=want (rename=(col1=New_Var) drop=_name_);
	by id;
	var var:;
run;

proc sort data=want nodupkey;
	by id New_Var;
run;

Output:

Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Re: Vertical Joining in the same dataset

Ready to join fellow brilliant minds for the SAS Hackathon?