BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dunne
Obsidian | Level 7

Hello,

 

I have a dataset like below

 

ID  Date    Dx1   Dx2    Dx3   Dx4

1   100      H1     H2      H3     H4

1   100      H3     H2

1   100      H5     H1

1   200      H5

1   200      H3    H2

 

I want to remove the partial duplicate data. If ID and Date are the same, then keep all the values of Dx1-Dx4 but just in 1 row only. My scenario is more likely like the ID 1 person in the first 3 rows.

 

The final data set look like below

 

ID  Date    Dx1   Dx2    Dx3   Dx4

1   100      H1     H2      H3     H4

1   200      H5     H3      H2

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
learsaas
Quartz | Level 8

why 1 500 H5 lost?

data  temp;
	set a;
	array array_a Dx1-Dx4;
	length newVal $2;
	do over array_a;
		if array_a^='' then do;
			newVal=array_a;
			output;
		end;
	end;
	drop Dx1-Dx4;
run;
proc sort data=temp nodupkey;
	by id date newVal;
run;
proc transpose data=temp out=result(drop=_name_) prefix=Dx;
	by id date;
	var newVal;
run;

View solution in original post

12 REPLIES 12
novinosrin
Tourmaline | Level 20

Can you clarify my assumption

 

1   200      H5

1   200      H3    H2

 

does the first obs on the by group as in the example H5 will occupy Dx1 replacing H3 and displacing H3 to Dx2?

 

 

Dunne
Obsidian | Level 7

Thanks! I want to keep H5, H3 and H2 also

learsaas
Quartz | Level 8

why 1 500 H5 lost?

data  temp;
	set a;
	array array_a Dx1-Dx4;
	length newVal $2;
	do over array_a;
		if array_a^='' then do;
			newVal=array_a;
			output;
		end;
	end;
	drop Dx1-Dx4;
run;
proc sort data=temp nodupkey;
	by id date newVal;
run;
proc transpose data=temp out=result(drop=_name_) prefix=Dx;
	by id date;
	var newVal;
run;
Reeza
Super User

@learsaas wrote:

why 1 500 H5 lost?

data  temp;
	set a;
	array array_a Dx1-Dx4;
	length newVal $2;
	do over array_a;
		if array_a^='' then do;
			newVal=array_a;
			output;
		end;
	end;
	drop Dx1-Dx4;
run;
proc sort data=temp nodupkey;
	by id date newVal;
run;
proc transpose data=temp out=result(drop=_name_) prefix=Dx;
	by id date;
	var newVal;
run;

A tranpose to long and then flip to wide after removing duplicates is a great solution. Only modification I'd consider to @learsaas excellent solution, is to replace DO OVER since it's deprecated for an explicit loop. Otherwise it will handle all various situations such as multiple lines and duplicate codes as shown in your example. 

janus58
Fluorite | Level 6

Reeza, how would you change the code to replace DO OVER? I didn't understand the part about it being deprecated for an explicit loop

Reeza
Super User
Do Over is deprecated - SAS 7 I believe, so more than 20 years ago. There’s a do loop in novinsrin answer that is close. Note the explicit I index usd, ie t( index ) rather than just t used. Explicit arrays make it clear when an array is being used compared to do over which is referenced similar to a variable. That confusion alone is why it’s a good reason to switch to explicit arrays that must be referenced via the index.
novinosrin
Tourmaline | Level 20

Hi @janus58  Take a look at this thread and Soren Lassen's reponse 

https://communities.sas.com/t5/SAS-Programming/what-s-the-limit-to-how-many-elements-variables-a-SAS...

 

 

PROC Star
s_lassen
Posts: 375
 

Re: what's the limit to how many elements(variables) a SAS array can hold?

 
 
Posted in reply to novinosrin

 

 

Implicit array references (do over) are not encouraged, because SAS Institute intends to do away with them (it has been an undocumented feature for years, now). So if you use them, you risk that your program will have to be rewritten when the next version of SAS comes out. Performance-wise, DO OVER is just the same as explicit array processing (a temporary variable named _I_ is created in the background and used to loop through the array).

janus58
Fluorite | Level 6

thanks a lot for the explanation

novinosrin
Tourmaline | Level 20

You're welcome. Actually we should rather thank @Reeza for the prompt and wise correction. I'm glad for your question. Have a good one!

Dunne
Obsidian | Level 7

Thank you! I want to keep H5 also. Sorry for the confusion!

Dunne
Obsidian | Level 7

ID  Date    Dx1   Dx2    Dx3   Dx4

1   100      H1     H2      H3     H4

2   200      A9     H3      H2     A1

3   340      B2     C1      H1     

4   250      X1

5   350      Y1     C1

 

My result now looks like this. But I get stuck with the next step. I want to find if any Dx1-Dx4 has  H* values, if there is any then create variable dx= value of the first Dx* which have H* appear (ex:id=2 the dx=value of dx2=H3) if not then take other value.

 

1. The result desires to look like this

ID  Date    Dx

1   100      H1 

2   200      H3

3   340      H1     

4   250      X1

5   200     Y1

 

2. Additionally I want to make a table (with one column) with all unique characters which are different from H* 

 

Many thanks!!

novinosrin
Tourmaline | Level 20
data have;
infile cards truncover;
input ID  Date    (Dx1   Dx2    Dx3   Dx4 ) ($);
cards;
1   100      H1     H2      H3     H4
1   100      H3     H2
1   100      H5     H1
1   200      H5
1   200      H3    H2
;

data _null_;
if _n_=1 then do;
 declare hash H (ordered: "A") ;
   h.definekey  ("id","date","dx") ;
   h.definedone () ;
end;
set have end=lr;
by id date; 
array t(*) dx1-dx4;
do _n_=1 to dim(t);
if not missing(t(_n_))  then do;
dx=t(_n_);
rc=h.add();
end;
end;
if lr then h.output(dataset:'w');
run;

proc transpose data=w out=want(drop=_:) prefix=Dx;
by id date;
var dx;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 2219 views
  • 5 likes
  • 5 in conversation