Hello,
I have a requirement to convert IF statements to CASE expressions in PROC SQL.
My data looks like this:
var1 | var2 | var3 | sum_var | condition |
a | b | c | 10 | If sum_var <5 |
a | b | c | 20 | If sum_var > 15 |
a | b | c | 30 | If sum_var > 35 |
d | e | f | 40 | If sum_var > 50 |
d | e | g | 50 | If sum_var <80 |
Now I need to create an output table that satisfies the group and the conditions, using PROC SQL, as this code will run in DI Studio.
var1, var2 and var3 form combinations of values, so they can't be hardcoded. I would need output something like this:
var1 | var2 | var3 | sum_var | condition |
a | b | c | 10 | If sum_var <5 |
a | b | c | 20 | If sum_var > 15 |
d | e | g | 50 | If sum_var <8 |
Need your advice.
P.S. Sent from a BlackBerry. Please ignore spelling mistakes and typos.
Assuming condition is always a simple inequality comparison, you could use something like this:
data have;
  length var1 var2 var3 $4 condition $20;
  /* The & modifier lets condition contain single embedded blanks */
  input var1 var2 var3 sum_var condition &;
datalines;
a b c 10 If sum_var <5
a b c 20 If sum_var > 15
a b c 30 If sum_var > 35
d e f 40 If sum_var > 50
d e g 50 If sum_var <80
;

proc sql;
  create table want(drop=target keep) as
  select *,
    /* The third token of condition is the numeric threshold */
    input(scan(condition, 3, ' <>='), best.) as target,
    /* Test the two-character operators first so <= is not mistaken for < */
    case
      when index(condition, '<=') > 0 then sum_var <= calculated target
      when index(condition, '>=') > 0 then sum_var >= calculated target
      when index(condition, '<') > 0 then sum_var < calculated target
      when index(condition, '>') > 0 then sum_var > calculated target
      else 0
    end as keep
  from have
  where calculated keep;
quit;
PG
You need to clarify your output and the rules you're trying to follow.
Is condition a character variable in the data set?
I don't understand the SQL restriction, though. Even in DI Studio, if it's a code node or a user-written transformation, you can still use DATA step code, can't you?
Completely agree with you, Reeza. But since this piece of code is going to be used in an Extract transformation, my client is adamant on SQL code.
Also, you understood the requirement correctly. I have combinations of these 3 variables. The conditions are stored in another variable, so we need to output only the observations that satisfy the condition present there.
Hope I am making sense.
Your output doesn't seem to match your data requirements.
Can you create a format that would do it, or do you have to use straight SQL? I think formats would be better, or generating macro variables that hold the IF code for a DATA step.
You need to post all possible conditions though, otherwise this will be a long back and forth.
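The macro variable idea mentioned above could also stay entirely within PROC SQL, which may suit the SQL-only restriction. The following is only a sketch under assumptions not confirmed in this thread: every condition value starts with the three characters "If ", the rest of the value is a valid SAS WHERE expression on sum_var, and the table have is not empty. The names where_clause and want_macro are made up for illustration.

```sas
proc sql noprint;
  /* Build one clause per distinct condition value, of the form:   */
  /*   (condition = <quoted text> and <text after the If prefix>)  */
  /* substr(condition, 4) assumes each value starts with If plus   */
  /* one blank, so the expression begins at position 4.            */
  select distinct
         catx(' ', '(condition =', quote(strip(condition)),
              'and', substr(condition, 4), ')')
    into :where_clause separated by ' or '
    from have;

  /* where_clause is a hypothetical macro variable name; the text  */
  /* it holds comes from the data and is executed as code.         */
  create table want_macro as
  select *
    from have
   where &where_clause;
quit;
```

Because the condition text is executed as code, this only makes sense when the contents of condition are trusted. It has the advantage of not needing a separate WHEN branch for each operator.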
Worked like a charm. Although I have several other conditions, not only inequalities, your logic has helped a lot. Thanks.
var1 | var2 | var3 | sum_var | condition |
a | b | c | 20 | If sum_var > 15 |
d | e | g | 50 | If sum_var <80 |
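For conditions beyond simple inequalities, the same CASE pattern might be extended with more WHEN branches. The sketch below assumes, purely for illustration, that equality is written with = and inequality with ^=, since the actual list of other conditions was not posted. It reuses the have table from the answer above, and want_ext is a made-up table name.

```sas
proc sql;
  create table want_ext(drop=target keep) as
  select *,
    /* ^ is added to the delimiter list so the threshold is still  */
    /* the third token when the operator is ^= (assumed notation)  */
    input(scan(condition, 3, ' <>=^'), best.) as target,
    case
      /* two-character operators are tested before single ones */
      when index(condition, '<=') > 0 then sum_var <= calculated target
      when index(condition, '>=') > 0 then sum_var >= calculated target
      when index(condition, '^=') > 0 then sum_var ne calculated target
      when index(condition, '<') > 0 then sum_var < calculated target
      when index(condition, '>') > 0 then sum_var > calculated target
      when index(condition, '=') > 0 then sum_var = calculated target
      else 0
    end as keep
  from have
  where calculated keep;
quit;
```

The branch order matters: a plain = test placed before the <=, >= or ^= tests would match those conditions too.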