A subjid can have multiple time points as shown below
Subjid timepoint
1 0
1 0.25
1 1
1 3
So for each time, a refid needs to be assigned
Subjid timepoint Refid
1 0 001
1 0.25 002
1 1 003
1 3 004
But if the timepoint repeats refid needs to be higher value
Subjid timepoint Refid
1 0 001
1 0.25 002
1 1 003
1 3 004
1 0 005
1 1 006
I tried enumeration logic,but did not work.
Your last example is still 1 to 6.....
Please show what you’ve tried so far. I would also recommend extending your logic to more than one ID so the solution works for you.
@SASPhile wrote:
A subjid can have multiple time points as shown below
Subjid timepoint
1 0
1 0.25
1 1
1 3
So for each time, a refid needs to be assigned
Subjid timepoint Refid
1 0 001
1 0.25 002
1 1 003
1 3 004
But if the timepoint repeats refid needs to be higher value
Subjid timepoint Refid
1 0 001
1 0.25 002
1 1 003
1 3 004
1 0 005
1 1 006
I tried enumeration logic,but did not work.
I have to admit I am completely befuddled by your explanation.
All I can see in your examples is that Refid is sequential. You have written a lot of words to replace "sequential", and so I still get the feeling that "sequential" is not what you want, but ...
So you would first create a ranking of timepoints (spanning over the whole "have" dataset across all id's) from lowest to highest, assign refids according to the ranks, and then use those refids for creating the "want" dataset?
Sounds like you want to give each distinct value of TIMEPOINT a number.
Do you care what order the numbers are assigned? Do you just want to number them as they are seen in the data? Or do you want to number them based on the sort order of the values of TIMEPOINT?
Let's take your example data (even though the values of REFID don't look like they are what you want).
data have;
input Id $ TimePoint Refid $ Date $ ;
cards;
122005 0 001 Sep1
122005 0.25 002 Sep2
122005 1 003 Sep3
122005 3 004 Sep4
122006 0.25 002 Sep1
122006 1 003 Sep2
122006 3 004 Sep3
122006 0 005 Sep4
122006 1 006 Sep5
122006 3 007 Sep6
;
We can get the distinct values and number them. If we add an INDEX we use the indexed dataset to look up the new REFID2 values for each value in the original.
proc sort data=have(keep=timepoint) out=timepoints nodupkey;
by timepoint;
run;
data timepoints(index=(timepoint));
set timepoints;
refid2 +1;
format refid2 z3. ;
run;
data want ;
set have;
set timepoints key=timepoint;
run;
results
Time Obs Id Point Refid Date refid2 1 122005 0.00 001 Sep1 001 2 122005 0.25 002 Sep2 002 3 122005 1.00 003 Sep3 003 4 122005 3.00 004 Sep4 004 5 122006 0.25 002 Sep1 002 6 122006 1.00 003 Sep2 003 7 122006 3.00 004 Sep3 004 8 122006 0.00 005 Sep4 001 9 122006 1.00 006 Sep5 003 10 122006 3.00 007 Sep6 004
You could also investigate how to do it in one step using a HASH object instead of the indexed dataset.
Why is the second occurrence of 3 re-using the refid of 004 that was created the first time it appeared but the third time it appears it gets a new refid?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.