Hello. I would like to eliminate duplicate values within an observation, across many columns. For example, if I have these data:
id
var1
var2
var3
var4
var5
1
49800 Main St.
222 Horg Blvd
498 Main St.
329 Old Ave.
109 E. Chorg St.
2
178 Forg Pl.
781 Noodle Rd.
546 Sunshine Dr.
3
329 Old Ave.
329 Old Ave.
329 Old Ave.
80375 Zorg Ln.
4
3872 Gorg Rd.
5
491 N. Yay Way
2985 Cool St.
172 Harbor
172 Harbor
172 Harbor
I would instead like to have these data:
id
var1
var2
var3
var4
var5
1
49800 Main St.
222 Horg Blvd
329 Old Ave.
109 E. Chorg St.
2
178 Forg Pl.
781 Noodle Rd.
546 Sunshine Dr.
3
329 Old Ave.
80375 Zorg Ln.
4
3872 Gorg Rd.
5
491 N. Yay Way
2985 Cool St.
172 Harbor
keeping the first instance of a value within an observation. Transposing the data set is fine, but there are ~150k observations (and 332 columns) so there would be ~150k columns if transposed, which my computer has trouble with.
I would very much appreciate any help. I have been trying things all day and haven't yet found an approach. Below is code to generate the example data set. Thank you!
data have; infile datalines delimiter=','; length var1 $17 var2 $15 var3 $17 var4 $15 var5 $17; input id var1 var2 var3 var4 var5; datalines; 1,49800 Main St.,222 Horg Blvd,498 Main St.,329 Old Ave.,109 E. Chorg St. 2,178 Forg Pl.,781 Noodle Rd.,546 Sunshine Dr.,.,. 3,329 Old Ave.,329 Old Ave.,329 Old Ave.,80375 Zorg Ln.,. 4,3872 Gorg Rd.,.,.,.,. 5,491 N. Yay Way,2985 Cool St.,172 Harbor,172 Harbor,172 Harbor ;
Edited for better example data
... View more