DATA Step, Macro, Functions and more

flagging cases

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

flagging cases

Hi,

This is my first post/question on this site and I am a beginner.

I want to have a flag variable corresponding to a data set such that it changes value as the value of a certain variable changes.

For ex: if variable Age has the values a, b, c, d then variable flag takes the value 1,2,3,4 respectively for each occurence of values a,b,c,d

Please help.

 

Regards,

Yogesh Singh


Accepted Solutions
Solution
‎07-12-2016 12:05 PM
Super User
Posts: 11,343

Re: flagging cases

Posted in reply to Yogesh2601

There are many ways to conditionally assign values to another variable. This will usually be done in a datastep.

The basic approach is IF THEN ELSE.

 

Data want;

    set have;

    if Age='a' then flag=1;

    else if Age='b' then flag=2;

    else if Age='c' then flag=3;

    else if Age='d' then flag=4;

run;

However this construct can get lengthy if you have many values. If you have a range of values such as a, b, c, d should have a flag of 1 and e, f, g, h should have a flag of 2 you can use an IN operator with a list:

if Age in ('a' 'b' 'c' 'd') then flag=1;

else if Age in ('e' 'f' 'g' 'h') then flag=2;

If your comparison is a range of numeric values such as 1 to 10

if Age in (1:10) then flag=1;

 

Longer or more complex approaches might be a Select statement.

Custom formats are also often used especially if multiple variables may have a similar coding need. In some cases formats are perferred because by assigning a format to the existing variable you may not need to add another variable at all for many analysis tasks.

View solution in original post


All Replies
Super User
Posts: 19,770

Re: flagging cases

Posted in reply to Yogesh2601

This is called recoding a variable. 

 

You can use IF/THEN statements

http://www.ats.ucla.edu/stat/sas/modules/vars.htm

 

If this will be done multiple times or is really long then you can consider using a format instead.

http://www2.sas.com/proceedings/sugi30/001-30.pdf

Occasional Contributor
Posts: 5

Re: flagging cases

I did it with IF/THEN/ELSE but is there any easier way to do it?
Super User
Posts: 5,498

Re: flagging cases

Posted in reply to Yogesh2601

Here are some issues that are important to finding the right solution.

 

First, are other variables involved?  For example, are there multiple records for each patient and the pattern (1, 2, 3, 4) needs to start over again when there is a different patient?

 

Second, what about repeats?  For example, what should the output look like if the variable in question follows the pattern a, b, a, c so that the variable is changing but one of the values repeats an earlier value?

Occasional Contributor
Posts: 5

Re: flagging cases

Posted in reply to Astounding
No other variables are involved. And for repeat cases the flage variable should have the same value.

For ex: Age Flag
11 1
12 2
13 3
14 4
11 1
15 5

I did it with IF/THEN/ELSE. Please suggest an alternative solution.
Super User
Posts: 19,770

Re: flagging cases

Posted in reply to Yogesh2601

If you only have 4 values your unlikely to find a solution that's actually less code. 

The only one I can think of is an array look up,,

 

 

Array age_fmt(*) _temporary_ (11:15);

 

Flag = whichn(age, of age_fmt(*));

 

 

Also, since you seem to have a pattern in your age/flag

 

Flag = age - 10;

 

Solution
‎07-12-2016 12:05 PM
Super User
Posts: 11,343

Re: flagging cases

Posted in reply to Yogesh2601

There are many ways to conditionally assign values to another variable. This will usually be done in a datastep.

The basic approach is IF THEN ELSE.

 

Data want;

    set have;

    if Age='a' then flag=1;

    else if Age='b' then flag=2;

    else if Age='c' then flag=3;

    else if Age='d' then flag=4;

run;

However this construct can get lengthy if you have many values. If you have a range of values such as a, b, c, d should have a flag of 1 and e, f, g, h should have a flag of 2 you can use an IN operator with a list:

if Age in ('a' 'b' 'c' 'd') then flag=1;

else if Age in ('e' 'f' 'g' 'h') then flag=2;

If your comparison is a range of numeric values such as 1 to 10

if Age in (1:10) then flag=1;

 

Longer or more complex approaches might be a Select statement.

Custom formats are also often used especially if multiple variables may have a similar coding need. In some cases formats are perferred because by assigning a format to the existing variable you may not need to add another variable at all for many analysis tasks.

Occasional Contributor
Posts: 5

Re: flagging cases

I did it with IF/ELSE. Is there any easier way to do it?
Super User
Posts: 5,498

Re: flagging cases

Posted in reply to Yogesh2601

When you have just a few possible values, there's no easier way to do it. 

 

When you have large numbers of values, there are still many ways to do it.  But it's difficult to find a short, simple way.  Here's one possible approach:

 

proc sql;

select distinct age into : agelist separated by ' ' from have where age > .;

quit;

 

data want;

array ages {&sqlobs} _temporary_ (&agelist);

set have;

flag = whichn(age, of ages{*});

run;

 

Note that this will not assign a value to FLAG when age is missing.  And it requires AGE to be numeric.  (There are slightly more complex ways that this approach could be adapted to character variables.)

Super User
Posts: 11,343

Re: flagging cases

[ Edited ]
Posted in reply to Yogesh2601

You may want to define "easier". Things to consider in choosing between methods for a task revolve around how easy it may be to understand the code, how robust to modify and sometimes use of resources such as memory or cpu cycles.

 

IF-then-else usually satisfies the first for single variables, not so much if you are using multiple varaibles and multiple "ors" and "ands" in the comparisons. It may also be easy to modify when adding on or two more values.

 

A similar structure is the SELECT-when which evaluates an expression a the top and then branches to the specific value:

 

select (age);

   when ('a') flag=1;

   when ('b') flag=2;

   when ('c') flag=3;

   when ('d') flag=4;

   otherwise; /* this what you would do when none of the other values is encountered*/

end;

 

Select has a big advantage if the the number of values gets moderately large in that is usually easier to identify the condition and the result quicker than if/then/else. Also the part after the condition can be a do/end code block and execute multiple statements which gets much messier with if/then/else.

You can also provide lists of values in the when such as when ('a','f','g').

 

Occasional Contributor
Posts: 5

Re: flagging cases

Posted in reply to Yogesh2601

Thank you guys for your help.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 399 views
  • 1 like
  • 4 in conversation