Create flag based on multiple observations for each subject

Frequent Contributor
Posts: 104
Hi there,

I am trying to create flags based on multiple observations for each subject. Each subject has several observations: if any of them has a missing value, a missing flag should be set, and if any of them has an invalid value, an invalid flag should be set. In the example below, the valid values for category are AAA, BBB and CCC, so a category of DDD must be flagged as invalid.

data have;
length id $3. category $3. ;
infile datalines TRUNCOVER;
input id $ category $;
datalines;
101 AAA
101 BBB
101 CCC
201 AAA
201    
201 BBB
301 AAA
301    
301 DDD
;
run;

data want;
length id $3. missing_flag $1. invalid_flag $1;
infile datalines TRUNCOVER;
input id $  missing_flag $ invalid_flag $ ;
datalines;
101 0 0 
201 1 0
301 1 1
;
run;

Thank you in advance for your kind guidance.

Regards,

Swain

Accepted Solutions
Solution
‎04-27-2017 09:53 AM
Super User
Posts: 5,516

Re: Create flag based on multiple observations for each subject

Posted in reply to DeepakSwain

I would suggest that you would be better off if:

  • your "flags" are numeric, not character, and
  • they are counts, not 0/1 flags

Here is one way to get that:

data want;
   set have;
   by id;
   if first.id then do;
      missing_count=0;
      invalid_count=0;
   end;
   if category=' ' then missing_count + 1;
   else if category not in ('AAA', 'BBB', 'CCC') then invalid_count + 1;
   if last.id;
   keep id missing_count invalid_count;
run;
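
If the 0/1 flags from the original question are still needed, they can be derived from the counts. The following is only a minimal sketch, not part of the original reply: it assumes the data are already sorted by id (the BY statement requires this), and the output dataset name want_flags is an illustrative choice. The flag names missing_flag and invalid_flag are taken from the question.

```sas
/* Sample data from the question; already sorted by id */
data have;
  length id $3 category $3;
  infile datalines truncover;
  input id $ category $;
datalines;
101 AAA
101 BBB
101 CCC
201 AAA
201
201 BBB
301 AAA
301
301 DDD
;
run;

/* want_flags is a hypothetical name used only for this sketch */
data want_flags;
  set have;
  by id;
  /* Reset the counters at the start of each subject */
  if first.id then do;
    missing_count = 0;
    invalid_count = 0;
  end;
  /* Sum statements retain their values across observations */
  if missing(category) then missing_count + 1;
  else if category not in ('AAA', 'BBB', 'CCC') then invalid_count + 1;
  /* Keep only the last observation of each subject */
  if last.id;
  /* A SAS comparison evaluates to 1 when true and 0 when false */
  missing_flag = (missing_count > 0);
  invalid_flag = (invalid_count > 0);
  keep id missing_count invalid_count missing_flag invalid_flag;
run;
```

Keeping both the counts and the flags costs little and lets later steps choose whichever form they need.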



All Replies
Super User
Posts: 7,977

Re: Create flag based on multiple observations for each subject

Posted in reply to DeepakSwain

Something like:

data have;
  length id $3. category $3. ;
  infile datalines TRUNCOVER;
  input id $ category $;
datalines;
101 AAA
101 BBB
101 CCC
201 AAA
201    
201 BBB
301 AAA
301    
301 DDD
;
run;
data want (keep=id missing_flag invalid_flag);
  set have;
  by id;
  retain missing_flag invalid_flag;
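  /* Reset both flags to 0 at the start of each id */
  if first.id then do;
    missing_flag=0;
    invalid_flag=0;
  end;
  /* A missing value sets only the missing flag, not the invalid flag */
  if category="" then missing_flag=1;
  else if category not in ("AAA","BBB","CCC") then invalid_flag=1;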
  if last.id then output;
run;
Frequent Contributor
Posts: 104

Re: Create flag based on multiple observations for each subject

Posted in reply to Astounding

Hi Astounding,

Thank you for providing a proactive solution by including a counter.

Regards,

Swain
Frequent Contributor
Posts: 93

Re: Create flag based on multiple observations for each subject

Posted in reply to DeepakSwain

An alternative solution using SQL, a language that is also available in many other statistical and database tools (R, SPSS, MS SQL Server, etc.):

data have;
length id $3. category $3. ;
infile datalines TRUNCOVER;
input id $ category $;
datalines;
101 AAA
101 BBB
101 CCC
201 AAA
201    
201 BBB
301 AAA
301    
301 DDD
;
run;

proc sql;
create table flags as
select id
, max(case when category = "" then 1 else 0 end) as missing_flag
, max(case when category ne "" and category not in ('AAA', 'BBB', 'CCC') then 1 else 0 end) as invalid_flag
from have
group by id;
quit;