Solved: Label duplicate cases in a data set

fengyuwuzu · Posted 12-21-2015 04:58 PM

My data set has the variable user_id and others such as v1, v2, ..., v5.

Some cases are duplicates that share the same user_id.

Is there a way to label the duplicate cases by adding a new variable, with value 1 for the primary case and value 0 for the second case with the same user_id? That way I do not need to delete the duplicate cases, and during analysis I can select the distinct cases using this new variable.

Thanks in advance.

Astounding · Posted 12-21-2015 05:04 PM

Sort your data if it's not yet in order:

proc sort data=have;
   by user_id;
run;

Then it's simple:

data want;
   set have;
   by user_id;
   /* 1 on the first record of each user_id, 0 on later duplicates */
   new_variable = first.user_id;
run;
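For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample values are made up for illustration, and the flag name `primary` is hypothetical (it plays the role of `new_variable` above). It also shows the second half of the question: selecting only the distinct cases during analysis with a WHERE statement, without deleting anything. Because PROC SORT uses the EQUALS option by default, records that share a user_id keep their original relative order, so the "primary" case is whichever one appeared first in the input. If "primary" should mean something else (for example the latest date), add that variable to the BY statement of PROC SORT after user_id.

```sas
/* Hypothetical sample data: user_id 101 and 103 appear more than once */
data have;
   input user_id v1 v2;
   datalines;
103 5 6
101 1 2
102 3 4
101 7 8
103 9 10
103 11 12
;
run;

/* Group duplicates together; EQUALS (the default) keeps input order within a user_id */
proc sort data=have;
   by user_id;
run;

data want;
   set have;
   by user_id;
   /* first.user_id is 1 on the first record of each user_id, 0 otherwise */
   primary = first.user_id;
run;

/* Analyse only the distinct cases; the duplicates stay in WANT */
proc means data=want n mean;
   where primary = 1;
   var v1 v2;
run;
```

With this sample data, WANT keeps all 6 records, 3 of them flagged with primary = 1, so PROC MEANS sees one record per user_id.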

fengyuwuzu · Posted 12-21-2015 05:14 PM

Great, this works!

This forum is really good.
