Using the above 18-line data example, I am currently trying to do a slightly different data extract. This is a little more involved but Value1 and Value2 are still used to identify each grouping and all variables are still character. The new extract criteria is: if Value4 is equal to 3 or 5, and Value5 is equal to 1234, and Value6 is equal to 1, and if Value4 = 2 and Value5 is NOT equal to 1234, then write that group to the output SAS dataset.
I can't figure out how to handle/incorporate the latter condition (i.e., "and if Value4 = 2 and Value5 is NOT equal to 1234") with the set of conditions before it.
William
Below is the Proc SQL code I was trying to use for the above conditions. I keep getting an empty output dataset.
proc sql;
create table want as
select * from have
group by value1,value2 having
sum (catx (' ',value4,value5,value6) = '3 1234 1') and
sum (catx (' ',value4,value5,value6) = '5 1234 1') and
sum (catx (' ',value4) = '2' and sum (catx (' ',value5) <> '1234') gt 0;
quit;
Is the following what you want to use?:
proc sql;
create table want as
select * from have
group by value1,value2 having
sum (catx (' ',value4,value5,value6) = '3 1234 1') or
sum (catx (' ',value4,value5,value6) = '5 1234 1') or
(sum (catt (value4) = '2') and sum (catt (value5) <> '1234'))
;
quit;
Hello,
I tried your suggested Proc SQL code above, but I get all 18 observations written to the output dataset. Given the newer extract criteria, would expect the groups of observations to be excluded from the output dataset would be:
- first grouping of data (first three obs) under Value1 = 01 due to the first observation in the group having Value4 = 2 and Value5 = 1234, and the third observation having Value6 = 2), and
- first grouping of data (first three obs) under Value1 = 02 due to the first observation in the group having Value5 = 1234, and the second and third observations having Value5 = 1236 and 1235, respectively, and
- second grouping of data (second three obs) under Value1 = 03 due to the first observation in the group having Value5 = 1234
The other three groups should make it into the output dataset. I guess this extract is a little trickier.
William
Still not sure if I correctly understand your selection rules. Is the following what you are trying to accomplish?:
proc sql;
create table want as
select * from have
group by value1,value2 having
(sum (catx (' ',value4,value5,value6) = '3 1234 1') or
sum (catx (' ',value4,value5,value6) = '5 1234 1') ) and
sum (catx (' ',value4,value5) = '2 1234') eq 0
;
quit;
It's a little trickier extract, but this looks pretty close. For the last "sum" line, the criteria has to be Value4 equal to 2 and Value5 not equal to '1234'. Out of curiosity, what does the "eq 0" part do?
William
My misunderstanding! The 0 is the result of the sum statement. In this case it is saying that a group can't contain a record that has a Value4 eq to 2 and a Value5 eq 1234. Does the following match what you are looking for?:
proc sql;
create table want as
select * from have
group by value1,value2 having
(sum (catx (' ',value4,value5,value6) = '3 1234 1') or
sum (catx (' ',value4,value5,value6) = '5 1234 1') ) and
not ( value4 = '2' and value5 ne '1234')
;
quit;
I modified the last post three times. The current version with not ( value4 = '2' and value5 ne '1234') is what I think you might be looking for.
Hello,
I played around some with your Proc SQL example and came up with the below which is now getting the correct results. The observation that has value4 = '2' and value5 ne '1234' also has to be in the group written to the output dataset given that the other two conditions are also met. I end up getting nine observations written to the output dataset which also has the correct groups in it.
proc sql;
create table want as
select * from have
group by value1,value2 having
(sum(catx(' ',value4,value5,value6) = '3 1234 1') or
sum(catx(' ',value4,value5,value6) = '5 1234 1') and
(sum(value4 = '2' and value5 ne '1234'));
quit;
William
Post the code that actually produced what you wanted. The code you posted wouldn't run as it has unbalanced parentheses.
Forgot the second parenthesis in the second "sum" line (typo). This produced what I needed:
proc sql;
create table want as
select * from have
group by value1,value2 having
(sum(catx(' ',value4,value5,value6) = '3 1234 1') or
sum(catx(' ',value4,value5,value6) = '5 1234 1')) and
(sum(value4 = '2' and value5 ne '1234'));
quit;
When I run that I get 0 records, regardless of whether the catx statements use ' ' or ' '. With two spaces (as in your posts) it shouldn't work at all, regardless, but I get 0 records either way.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.