DATA Step, Macro, Functions and more

Using SQL to capture first of many variables

Accepted Solution Solved
Reply
Regular Contributor
Posts: 165
Accepted Solution

Using SQL to capture first of many variables

I have some data structured as follows (reasons are 0/1):

 

ID     reason1      reason2     reason3   reason4     reason5

1          1                 0                 0                0             0

2          1                 1                 0                0             0

3          0                 0                 1                0             1

4          0                 0                 1                1             0

5          0                 0                 0                0             1

 

I would like to filter through the reason1-reason5 and keep the first "1" across, so my final dataset looks like this:

 

ID       reason

 1             1

 2             1

 3             3

 4             3

 5             5

 

So that even if the ID has more than 1 reason, we prioritize the earlier 1. How can I do this with SQL?

 


Accepted Solutions
Solution
‎06-20-2018 07:20 PM
Super User
Posts: 13,583

Re: Using SQL to capture first of many variables


@Melk wrote:
That was simpler than I thought! So lets say I want to prioritize 5 over 4 now, so priority will be 1 then 2 then 3 then 5 then 4 - can I make a simple edit to the code for this?

NOW you have to provide what your output would look like to demonstrate mean by "prioritize" in this fashion.

 

One way might be to change the order of the variables instead of using whichn(1, of reasons(*))

whichn(1,reason5, reason4, reason1,reason2,reason3); would return the number in this list with a result of 1 means Reason5 had a 1; 2 means reason4 had a 1 but reason5 did not and so forth.

View solution in original post


All Replies
PROC Star
Posts: 1,845

Re: Using SQL to capture first of many variables

May i ask why not datastep using whichn function?

Super User
Posts: 23,778

Re: Using SQL to capture first of many variables

With SQL you're going to have to use manual CASE statements, similar to IF/THEN.

case when reason1=1 then 1
else when reason2=1 then ... etc.
PROC Star
Posts: 1,286

Re: Using SQL to capture first of many variables

[ Edited ]

Why do you want to do this in SQL and not simply... ?

 

data have;
input ID $ reason1-reason5;
datalines;
1 1 0 0 0 0
2 1 1 0 0 0
3 0 0 1 0 1
4 0 0 1 1 0
5 0 0 0 0 1
;

data want;
   set have;
   array reasons{*} reason1-reason5;
   reason=whichn(1, of reasons[*]);
   keep ID reason;
run;
Regular Contributor
Posts: 165

Re: Using SQL to capture first of many variables

Thought it would be easier than a data step - is it not?

Esteemed Advisor
Posts: 5,543

Re: Using SQL to capture first of many variables

Note that function whichn will also work within SQL, but you will have to list the variables:

 

proc sql;
select 
    id,
    whichn(1, reason1, reason2, reason3, reason4, reason5) as reason
from have;
quit;
PG
Regular Contributor
Posts: 165

Re: Using SQL to capture first of many variables

That was simpler than I thought! So lets say I want to prioritize 5 over 4 now, so priority will be 1 then 2 then 3 then 5 then 4 - can I make a simple edit to the code for this?
PROC Star
Posts: 1,286

Re: Using SQL to capture first of many variables

When you say "prioritize 5 over 4" do you mean Column 5 and 4?

 

So if both Column 4 and 5 are 1 and the rest are 0, you want reason=5, correct?

Regular Contributor
Posts: 165

Re: Using SQL to capture first of many variables

I mean reason5 over reason4 (based on var name).

Esteemed Advisor
Posts: 5,543

Re: Using SQL to capture first of many variables

If you want to reverse the order, do like this (or its equivalent with a datastep)

 

proc sql;
select 
    id,
    6 - whichn(1, reason5, reason4, reason3, reason2, reason1) as reason
from have;
quit;
PG
Esteemed Advisor
Posts: 5,543

Re: Using SQL to capture first of many variables

For an arbitrary priority order, I suggest you use two arrays :

 

data want;
   set have;
   /* In order of priority */
   array reasons{*} reason1 reason2 reason3 reason5 reason4;
   /* Reason number in reasons list, padded with special value 0 */
   array order{5} (1,2,3,5,4,0);
   reason = order{whichn(1, of reasons[*], 1)};
   keep ID reason;
run;
PG
Super User
Posts: 23,778

Re: Using SQL to capture first of many variables


@Melk wrote:
That was simpler than I thought! So lets say I want to prioritize 5 over 4 now, so priority will be 1 then 2 then 3 then 5 then 4 - can I make a simple edit to the code for this?

Try reordering your list in the array based on priorities, not sure if that will work or not. 

Solution
‎06-20-2018 07:20 PM
Super User
Posts: 13,583

Re: Using SQL to capture first of many variables


@Melk wrote:
That was simpler than I thought! So lets say I want to prioritize 5 over 4 now, so priority will be 1 then 2 then 3 then 5 then 4 - can I make a simple edit to the code for this?

NOW you have to provide what your output would look like to demonstrate mean by "prioritize" in this fashion.

 

One way might be to change the order of the variables instead of using whichn(1, of reasons(*))

whichn(1,reason5, reason4, reason1,reason2,reason3); would return the number in this list with a result of 1 means Reason5 had a 1; 2 means reason4 had a 1 but reason5 did not and so forth.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 178 views
  • 5 likes
  • 6 in conversation