DATA Step, Macro, Functions and more

Creating an indicator variable based on whether a row exists

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Creating an indicator variable based on whether a row exists

Hi all,

 

I have a specific question that is not easy to google. Most certainly someone has already asked it, so apologies for asking it twice.

 

I have a dataset that looks like this:

 

Client Code

1 A

1 B

2 A

3 A

3 C

3 D

4 A

5 A

5 B

...

 

For every Client number I have the code A, so I want to make my dataset a little smaller and easier to handle by creating indicators if a client also has the codes B, C and/or D.

 

So, my final dataset (after removing the redundant rows) looks like this:

 

Client Code ind_B ind_C ind_D

1 A 1 0 0

2 A 0 0 0

3 A 0 1 1

4 A 0 0 0

5 A 1 0 0

...

 

I succeeded to do this by creating 2 datasets and joining them, but that is not time and memory efficient. I prefer not to loop over all clients and I'm convinced there is a better way to do this, I just cannot come up with it.

 

Can anyone help me?


Accepted Solutions
Solution
‎04-03-2018 10:19 AM
Super User
Posts: 10,689

Re: Creating an indicator variable based on whether a row exists

data have;
input client code $;
cards;
1 A
1 B
2 A
3 A
3 C
3 D
4 A
5 A
5 B
;
run;
proc sql;
select client,'A' as code,
 sum(code='B') as B,sum(code='C') as C,sum(code='D') as D 
  from have
   group by client;
quit;

View solution in original post


All Replies
Super User
Super User
Posts: 9,416

Re: Creating an indicator variable based on whether a row exists

TBH I would strongly suspect that your want dataset here would be larger than the have dataset.  In the have dataset you are only capturing data where it appears, in the want data you are creating empty data items where there is no data.  For example:
Client 1 has two datapoints, a and b, but in the want dataset the same client has 3 datapoints, b, c, and d two of which are redundant.  Unless there is a very good reason (and I can't see one) to have that want structure, I would stick with your first, and filter out where code="A" if they all have it, as that is not needed at all.  This would effecively shrink your have to 4*2 fields which is far smaller than your want of 5*5.  

 

Occasional Contributor
Posts: 9

Re: Creating an indicator variable based on whether a row exists

You are right in my example.

 

I did not mention though that each row contains much more columns with client information.

These columns are independent of the code, so the dataset contains a lot of duplicative information.

Therefore it is more efficient to create an indicator such that I can remove the rows that have not code A.

Super User
Super User
Posts: 9,416

Re: Creating an indicator variable based on whether a row exists

Sorry, you going to have to clarify, test data/required output etc.  This:

"I can remove the rows that have not code A"

To me resolves as:

proc sql;
  create table want as
select *
from have where client not in (select distinct client from have where code="A"); quit;

Ie create a copy of the data with only clients who have an "A" record.

Super User
Posts: 9,890

Re: Creating an indicator variable based on whether a row exists

See this:

data have;
input client code $;
cards;
1 A
1 B
2 A
3 A
3 C
3 D
4 A
5 A
5 B
;
run;

data int;
set have (where=(code ne 'A')) ;
val = 1;
run;

proc transpose
  data=int
  out=codes (drop=_name_)
  prefix=ind_
;
by client;
id code;
var val;
run;

data want;
merge
  have (
    in=a
    where=(code = 'A')
  )
  codes
;
by client;
if a;
run;

proc print data=want noobs;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 9

Re: Creating an indicator variable based on whether a row exists

Posted in reply to KurtBremser
Thank you, this gives the output I needed, although I prefer Ksharp's answer
Solution
‎04-03-2018 10:19 AM
Super User
Posts: 10,689

Re: Creating an indicator variable based on whether a row exists

data have;
input client code $;
cards;
1 A
1 B
2 A
3 A
3 C
3 D
4 A
5 A
5 B
;
run;
proc sql;
select client,'A' as code,
 sum(code='B') as B,sum(code='C') as C,sum(code='D') as D 
  from have
   group by client;
quit;
Occasional Contributor
Posts: 9

Re: Creating an indicator variable based on whether a row exists

Thanks this is exactly what I wanted
Respected Advisor
Posts: 3,846

Re: Creating an indicator variable based on whether a row exists

Seems like a method works without knowing the values of CODE would be useful.

 

data have;
   input client code :$1. @@;
   cards;
1 A 1 B 2 A 3 A 3 C 3 D 4 A 5 A 5 B
;;;;
   run;
proc transreg data=have design;
   id client;
   model class(code/zero=none CPREFIX=0);
   output out=design(drop=Int: _:);
   run;
proc print;
   run;
proc means noprint nway;
   class client;
   output out=want(drop=_:) sum=;
   run;
proc print;
   run;

Capture.PNG

 

Super User
Posts: 10,689

Re: Creating an indicator variable based on whether a row exists

Posted in reply to data_null__

John King,

Long time no see.

Sure . That would be easy. 

1) PROC FREQ to get all these levels.

2)CALL EXECUTE() to make it happen .

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 108 views
  • 1 like
  • 5 in conversation