Hello,
I have a requirement to convert IF statements to CASE expressions in PROC SQL.
My data looks like this:
var1 | var2 | var3 | sum_var | condition |
a | b | c | 10 | If sum_var <5 |
a | b | c | 20 | If sum_var > 15 |
a | b | c | 30 | If sum_var > 35 |
d | e | f | 40 | If sum_var > 50 |
d | e | g | 50 | If sum_var <80 |
Now I need to create an output table that keeps only the observations satisfying the condition stored in their own row. It has to be done in PROC SQL because this code will run in DI Studio.
var1, var2 and var3 form combinations of values, so this can't be hardcoded. I need output like the following:
var1 | var2 | var3 | sum_var | condition |
a | b | c | 10 | If sum_var <5 |
a | b | c | 20 | If sum_var > 15 |
d | e | g | 50 | If sum_var <8 |
Any advice would be appreciated.
P.S. Sent from a BlackBerry. Please ignore spelling mistakes and typos.
Assuming condition is always a simple inequality comparison, you could use something like this:
data have;
length var1 var2 var3 $4 condition $20;
/* the & modifier lets CONDITION contain single embedded blanks */
input var1 var2 var3 sum_var condition &;
datalines;
a b c 10 If sum_var <5
a b c 20 If sum_var > 15
a b c 30 If sum_var > 35
d e f 40 If sum_var > 50
d e g 50 If sum_var <80
;
proc sql;
create table want(drop= target keep) as
select *,
/* 3rd word after splitting on blanks, <, > and = is the numeric threshold */
input(scan(condition,3,' <>='), best.) as target,
/* test the two-character operators first so '<=' is not caught by '<' */
case
when index(condition,'<=') > 0
then sum_var <= calculated target
when index(condition,'>=') > 0
then sum_var >= calculated target
when index(condition,'<') > 0
then sum_var < calculated target
when index(condition,'>') > 0
then sum_var > calculated target
else 0
end
as keep
from have
/* keep only rows whose own condition evaluates to true (KEEP = 1) */
where calculated keep;
quit;
PG
You need to clarify your output and the rules you're trying to follow.
Is condition a character variable in the data set?
I don't understand the SQL restriction, though. Even though it's DI Studio, if it's a code node or a user-written transformation, you can still use DATA step code, can't you?
Completely agree with you, Reeza. But since this piece of code is going to be used in an Extract transformation, my client is adamant about SQL code.
Also, you understood the requirement correctly. I have combinations of these three variables. The conditions are stored in another variable, so we need to output only the observations that satisfy the condition stored there.
Hope I am making sense.
Your output doesn't seem to match your data requirements.
Can you create a format that would do it, or do you have to use straight SQL? I think formats would be better... or generating macro variables that hold DATA step IF code.
You need to post all possible conditions, though; otherwise this will be a long back-and-forth.
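A minimal sketch of the macro-variable idea, kept entirely in PROC SQL so it would also fit the SQL-only restriction. It assumes every condition starts with "If " followed by an expression that is valid in a SQL WHERE clause, and that conditions contain no quotes, & or % characters. The macro variable name WHERE_CLAUSE is hypothetical. For each distinct condition string it generates `(condition = "<text>" and (<expression>))`, so every row is tested only against its own condition.

```sas
/* Sample data from the question (same as PG's HAVE) */
data have;
  length var1 var2 var3 $4 condition $20;
  input var1 var2 var3 sum_var condition &;
  datalines;
a b c 10 If sum_var <5
a b c 20 If sum_var > 15
a b c 30 If sum_var > 35
d e f 40 If sum_var > 50
d e g 50 If sum_var <80
;

proc sql noprint;
  /* One OR-ed term per distinct condition. SUBSTR(condition, 4) drops */
  /* the leading "If " (assumption: every condition starts with it).   */
  select distinct
         catx(' ', '(condition =', quote(strip(condition)),
              'and (', substr(condition, 4), '))')
    into :where_clause separated by ' or '
    from have;

  /* The generated text is resolved here as an ordinary WHERE clause */
  create table want as
  select *
    from have
    where &where_clause;
quit;
```

Because the generated expression is plain SAS/SQL syntax, this handles any condition the WHERE clause can evaluate, not only simple inequalities. Keep in mind that a macro variable is limited to 65,534 characters, so a very large number of distinct conditions would need a different approach.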
Worked like a charm. I have several other conditions, not only inequalities, but your logic has helped a lot. Thanks!
For reference, the output of PG's PROC SQL code on the sample data:
var1 | var2 | var3 | sum_var | condition |
a | b | c | 20 | If sum_var > 15 |
d | e | g | 50 | If sum_var <80 |
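For the "other conditions" mentioned above, PG's CASE expression can be extended operator by operator. A sketch, assuming the extra conditions are equality (`=`) and inequality (`^=`) tests with a numeric threshold as the third word; the rows in HAVE_OPS and the table names HAVE_OPS/WANT_OPS are hypothetical. Operators spelled as words (such as `ne` or `in`) are not handled.

```sas
/* Hypothetical sample rows using =, ^=, >= and <= */
data have_ops;
  length var1 var2 var3 $4 condition $20;
  input var1 var2 var3 sum_var condition &;
  datalines;
a b c 10 If sum_var = 10
a b c 20 If sum_var ^= 20
d e f 40 If sum_var >= 40
d e g 50 If sum_var <= 45
;

proc sql;
  create table want_ops(drop=target keep) as
  select *,
         /* '^' added to the delimiters so '^=' is split off the threshold */
         input(scan(condition, 3, ' <>=^'), best.) as target,
         case
           /* two-character operators must be tested before one-character ones */
           when index(condition, '<=') > 0 then sum_var <= calculated target
           when index(condition, '>=') > 0 then sum_var >= calculated target
           when index(condition, '^=') > 0 then sum_var ne calculated target
           when index(condition, '<')  > 0 then sum_var <  calculated target
           when index(condition, '>')  > 0 then sum_var >  calculated target
           when index(condition, '=')  > 0 then sum_var =  calculated target
           else 0
         end as keep
    from have_ops
    where calculated keep;
quit;
```

With these rows, only sum_var = 10 (`= 10`) and sum_var = 40 (`>= 40`) are kept.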