I have a sample longitudinal data of measurements with three columns (ID, Time, and Level) and 10 rows. ID1 has 6 measures and ID2 has 4 measures. The value has three ordinal levels of Low < Medium < High.
ID Time Level
1 1 Medium
1 2 Medium
1 3 Low
1 4 Low
1 5 Medium
1 6 High
2 1 Medium
2 2 Low
2 3 High
2 4 High
I want to create a new column Level1. This new column is created based on Level by a rule: if Level moves from lower level to higher level (i.e., Low to Medium), then Level1 is the same as Level. But if Level moves from higher level to lower level (i.e., Medium to Low) then Level1 keeps all the rest as higher level (Medium) until Level moves to a higher one.
Here for ID1, from Time 1 to Time 4, Level moves from Medium to Low, so Level1 keep all as Medium. Time 5 and 6, Level moves to High, then Level1 is the same as Level.
For ID2, Time 1 and 2, Medium to Low, Level1 keeps all as Medium, and for Time 3 and 4, both are High that is higher than Medium, Level1 is the same as Level.
ID Time Level Level1
1 1 Medium Medium
1 2 Medium Medium
1 3 Low Medium
1 4 Low Medium
1 5 Medium Medium
1 6 High High
2 1 Medium Medium
2 2 Low Medium
2 3 High High
2 4 High High
data have;
input ID Time Level $;
cards;
1 1 Moderate
1 2 Moderate
1 3 Severe
1 4 Severe
1 5 Moderate
1 6 Severe
2 1 Moderate
2 2 Moderate
2 3 Severe
2 4 Severe
;
proc format;
invalue ifmt
'Moderate'=1
'Severe'=2
;
value fmt
1='Moderate'
2='Severe'
;
run;
data want;
set have;
by id;
retain rank;
if first.id then call missing(rank);
rank=max(rank,input(Level,ifmt.));
Level1=put(rank,fmt.);
drop rank;
run;
Are Moderate and Severe the only values that appear in Level? Is the variable Level ever missing (blank)?
If other or missing values appear in the variable Level what are the rules.
Note to later readers: OP original example data only had Moderate and Severe values and significantly changed the data after several questions had been asked. So many of the questions may not make sense when compared to the final version of the "question".
Do you have any other variable in this set that is supposed to have the order of these records? This could be very important if all of your data for ID is not grouped in order at the moment, or if other data is to be appended and processed with the same rules.
This statement ". For ID2, since all move from Moderate to Severe, Level is the same as Level1." is very confusing. First because the NEW variable is Level1 and this sort of implies that Level is based on Level1 and worse, this does not show the same result as rows 3 and 4 for Id 1 and I can't tell from you description why.
Why does your topic Subject line say "Keep first value". There is nothing in this description that indicates "first value" is kept at all.
Is Row actually a variable in the data set or just shown for description purposes?
I cannot edit the post? First time post here. Yes, You can consider Level as a ordinal variable that include three variables Low, Medium and High. There is another variable Time I should include. So for ID1, Time is listed as 1 to 6 and for ID2, Time is listed as 1 to 4.
@SeaMoon_168 wrote:
I cannot edit the post?
Your post should show three vertical dates near the upper left corner in front of the subject line. Click on them to bring up a context menu that will let you edit a post
Thank you for your advice on how to edit and improve the post. I appreciate if you could give me some help on how to resolve this problem.
@SeaMoon_168 wrote:
Thank you for your advice on how to edit and improve the post. I appreciate if you could give me some help on how to resolve this problem.
Still think that you need to provide a bit more in the way of rules , not just an example, to describe why ID 2 time 9 and 10 are not Severe as is the case for ID 1 at time 3 and 4. Something must be different but you have not provided a clear description of what that would allow programming the rule.
As I mentioned, for ID1 from Medium to Low so keep the rest as Medium. Does it make sense?
I am sorry I just found I entered the wrong level to Level1 (Time 3 and 4) for ID2 and revised them. Thank you for your help.
data have;
input ID Time Level $;
cards;
1 1 Moderate
1 2 Moderate
1 3 Severe
1 4 Severe
1 5 Moderate
1 6 Severe
2 1 Moderate
2 2 Moderate
2 3 Severe
2 4 Severe
;
proc format;
invalue ifmt
'Moderate'=1
'Severe'=2
;
value fmt
1='Moderate'
2='Severe'
;
run;
data want;
set have;
by id;
retain rank;
if first.id then call missing(rank);
rank=max(rank,input(Level,ifmt.));
Level1=put(rank,fmt.);
drop rank;
run;
You are genius. The result using your code is the exact what I expect. Many many thanks 👍
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.