Hello,
I am trying to create a unique ID variable based on 2 variables (record ID, and a date). The unique variable must restart at 1 whenever the record ID changes. The record ID can also be repeating for some records.
As an example, here is the data I have with the ID that I want:
data WORK.CLASS(label='Student Data'); infile datalines; input record_id date_reported; datalines; 1 04/06/2020 1 05/06/2020 2 03/06/2021 2 04/07/2021 3 07/09/2021 4 02 06/2021 ; run;
I have tried this but did not get the ID to repeat:
data want; set have; by record_id date_reported_f/* if need sort data have */; retain ID 0; if first.date_reported_f then ID+1; run;
Let me know if you have any suggestions!
Clare
It is a little hard to interpret what you want. If you want a new identifier that counts the observations per REPORT_ID then perhaps something like this.
data have ;
input record_id date_reported :mmddyy.;
format date_reported yymmdd10.;
datalines;
1 04/06/2020
1 05/06/2020
2 03/06/2021
2 04/07/2021
3 07/09/2021
4 02/06/2021
;
data want;
set have;
by record_id date_reported;
subid + 1;
if first.record_id then subid=1;
run;
Results:
record_ date_ Obs id reported subid 1 1 2020-04-06 1 2 1 2020-05-06 2 3 2 2021-03-06 1 4 2 2021-04-07 2 5 3 2021-07-09 1 6 4 2021-02-06 1
If instead you want the same ID*DATE_REPORTED values to have the same SUBID then we need to change the code (and change the test data since the current data the results would be exactly the same since no dates repeat).
data have ;
input record_id date_reported :mmddyy.;
format date_reported yymmdd10.;
datalines;
1 04/06/2020
1 05/06/2020
1 05/06/2020
1 05/06/2020
2 03/06/2021
2 04/07/2021
3 07/09/2021
4 02/06/2021
;
data want;
set have;
by record_id date_reported;
subid + first.date_reported;
if first.record_id then subid=1;
run;
Results
record_ date_ Obs id reported subid 1 1 2020-04-06 1 2 1 2020-05-06 2 3 1 2020-05-06 2 4 1 2020-05-06 2 5 2 2021-03-06 1 6 2 2021-04-07 2 7 3 2021-07-09 1 8 4 2021-02-06 1
It is a little hard to interpret what you want. If you want a new identifier that counts the observations per REPORT_ID then perhaps something like this.
data have ;
input record_id date_reported :mmddyy.;
format date_reported yymmdd10.;
datalines;
1 04/06/2020
1 05/06/2020
2 03/06/2021
2 04/07/2021
3 07/09/2021
4 02/06/2021
;
data want;
set have;
by record_id date_reported;
subid + 1;
if first.record_id then subid=1;
run;
Results:
record_ date_ Obs id reported subid 1 1 2020-04-06 1 2 1 2020-05-06 2 3 2 2021-03-06 1 4 2 2021-04-07 2 5 3 2021-07-09 1 6 4 2021-02-06 1
If instead you want the same ID*DATE_REPORTED values to have the same SUBID then we need to change the code (and change the test data since the current data the results would be exactly the same since no dates repeat).
data have ;
input record_id date_reported :mmddyy.;
format date_reported yymmdd10.;
datalines;
1 04/06/2020
1 05/06/2020
1 05/06/2020
1 05/06/2020
2 03/06/2021
2 04/07/2021
3 07/09/2021
4 02/06/2021
;
data want;
set have;
by record_id date_reported;
subid + first.date_reported;
if first.record_id then subid=1;
run;
Results
record_ date_ Obs id reported subid 1 1 2020-04-06 1 2 1 2020-05-06 2 3 1 2020-05-06 2 4 1 2020-05-06 2 5 2 2021-03-06 1 6 2 2021-04-07 2 7 3 2021-07-09 1 8 4 2021-02-06 1
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.