My data set has a variable user_id and other variables such as v1, v2, ..., v5.
Some cases are duplicates that share the same user_id.
Is there a way to label the duplicate cases by adding a new variable, with value 1 for the primary case and value 0 for any further cases with the same user_id? That way I do not need to delete the duplicate cases, but can select the distinct cases by this new variable when running an analysis.
Thanks in advance.
Sort your data if it's not yet in order:
proc sort data=have;
  by user_id;
run;
Then it's simple:
data want;
  set have;
  by user_id;
  /* first.user_id is 1 on the first row of each user_id group, 0 otherwise */
  new_variable = first.user_id;
run;
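To use the flag in an analysis, a WHERE statement can restrict any procedure to the primary cases. The following is only a sketch: the sample rows and the choice of PROC MEANS on v1 are made up for illustration and are not from the original question.

```sas
/* Hypothetical sample data: user_id 101 and 102 each appear twice */
data have;
  input user_id v1;
  datalines;
101 5
102 7
101 9
103 2
102 4
;
run;

/* BY-group processing requires the data to be sorted by user_id */
proc sort data=have;
  by user_id;
run;

/* Flag the first row of each user_id group with 1, all others with 0 */
data want;
  set have;
  by user_id;
  new_variable = first.user_id;
run;

/* Analyze only the primary cases: one row per user_id (3 rows here) */
proc means data=want;
  where new_variable = 1;
  var v1;
run;
```

Because the duplicates stay in want, dropping the WHERE statement runs the same analysis on all cases.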
Great! This works!
This forum is really good.