BookmarkSubscribeRSS Feed
Bounce
Fluorite | Level 6

Hi all,

 

I would like to join 3 variables in a data set which is measuring the same thing - hence I would only like a new variable with all the information from the columns and no duplicates.

 

The data I have:

ID  1  2  3

1   A  A  B

2   A  B  C

3   A  A  A

4   A  B  B

 

The data I want:

ID  New_Var

1     A

1     B

2     A

2     B

2     C

3     A

4     A

4     B

 

If anyone can please help me, that'd be great.

 

Thanks in advance.

 

5 REPLIES 5
mkeintz
PROC Star

Questions:

  1. Are the data column values always in alphabetic order?  (i.e. can you have B A B for a given ID?).
  2. If they are not in alphabetic order, do you care whether the desired output honors the order the 3 values started out with?  (i.e. from B A B, do you care whether the output is B A  vs A B?)
  3. Could you set up your sample data in a data step, the better to help us help you?
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Bounce
Fluorite | Level 6

Hi mkeintz, thanks for your reply.

1. No the column values are completely random, they are actually in the format of dates in the dataset I am using.

2. No the order of the output does not matter since I can sort them later - just need them all in one column.

3. Sorry do you mean this:

 

DATA Example;
format ID 1.
VAR1 VAR2 VAR3 $1.;
input ID VAR1 VAR2 VAR3;
datalines;
1 A A A
2 A B C
3 A A A
4 A B B
;

run;

 

mkeintz
PROC Star

There is a one-step solution, which  benefits from the CALL SORTC subroutine:

 

DATA Example;
  format ID 1.  VAR1 VAR2 VAR3 $1.;
  input ID VAR1 VAR2 VAR3;
datalines;
1 A A A
2 A B C
3 A A A
4 A B B
run;

data want (keep=id var);
  set example;
  array  _v {*} dummy var1-var3;
  call sortc(of _v{*});
  do i=2 to dim(_v);
    if missing(_v{i}) then continue;
    var=_v{i};
    if var^=_v{i-1} then output;
  end;
run;

 

  1. The "if missing(_v{i}) then continue statement skips to the next iteration of the do loop, skipping the later output stateement.
  2. I put  a dummy variable at the lower bound  of the array, to smiplify the  "if var^=_v{i-1}" test.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ed_sas_member
Meteorite | Level 14

Hi @Bounce 

 

You can try this code (proc transpose + proc sort to remove duplicate records):

proc transpose data=example out=want (rename=(col1=New_Var) drop=_name_);
	by id;
	var var:;
run;

proc sort data=want nodupkey;
	by id New_Var;
run;

Output:

 Capture d’écran 2019-12-02 à 10.36.21.png

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 961 views
  • 2 likes
  • 4 in conversation