I have the following groupings of data (fictitious data) and need a little help.
Value1 Value2 Value3 Value4 Value5 Value6
01 123 00 2 1234 1
01 123 01 3 1235 1
01 123 02 3 1234 2
02 124 00 2 1234 1
02 124 01 3 1236 1
02 124 02 3 1235 1
03 125 00 2 1230 1
03 125 01 3 1234 1
03 125 03 5 1235 1
The Value2 variable will always be the same within each group (account), such as shown above. Where the Value4 = 2 and Value5 = 1234, then I have to include ALL of the observations in each group (i.e., all three obs in the first two groups above) into an output SAS dataset. The last grouping of data would NOT be included.
Any help/examples are appreciated (have not yet done something like this in SAS; I'm a novice user).
Forgot the second parenthesis in the second "sum" line (typo). This produced what I needed:
proc sql;
create table want as
select * from have
group by value1,value2 having
(sum(catx(' ',value4,value5,value6) = '3 1234 1') or
sum(catx(' ',value4,value5,value6) = '5 1234 1')) and
(sum(value4 = '2' and value5 ne '1234'));
quit;
WLP165, thanks for coming back to the community. I hope you find an answer to your question soon.
proc sql;
create table want as
select * from have
where value1 in (select value1 from have where value4=2 and value5=1234);
quit;
Thanks so much for the help!
I expanded the data example somewhat to be more realistic. See below. Given the same value4 and value5 extract criteria, the first group of obs for value1 = 01 would be pulled, the first group of obs where value1=02 would be pulled, and the second group of obs where value=03 would be pulled. Would this change your code solution any?
Value1 Value2 Value3 Value4 Value5 Value6
01 123 00 2 1234 1
01 123 01 3 1235 1
01 123 02 3 1234 2
01 127 00 2 1237 1
01 127 01 3 1234 1
01 127 02 3 1234 1
02 124 00 2 1234 1
02 124 01 3 1236 1
02 124 02 3 1235 1
02 128 00 2 1238 1
02 128 01 3 1234 1
02 128 02 3 1234 1
03 125 00 2 1230 1
03 125 01 3 1234 1
03 125 03 5 1235 1
03 129 00 2 1234 1
03 129 01 3 1234 1
03 129 02 3 1234 1
If I correctly understand what you want to do, one minor correction needed to be made for stat's proposed code to work, namely:
proc sql;
create table want as
select * from have
where value2 in (select value2 from have where value4=2 and value5=1234);
quit;
data have; input (Value1 Value2 Value3 Value4 Value5 Value6 ) ($); cards; 01 123 00 2 1234 1 01 123 01 3 1235 1 01 123 02 3 1234 2 01 127 00 2 1237 1 01 127 01 3 1234 1 01 127 02 3 1234 1 02 124 00 2 1234 1 02 124 01 3 1236 1 02 124 02 3 1235 1 02 128 00 2 1238 1 02 128 01 3 1234 1 02 128 02 3 1234 1 03 125 00 2 1230 1 03 125 01 3 1234 1 03 125 03 5 1235 1 03 129 00 2 1234 1 03 129 01 3 1234 1 03 129 02 3 1234 1 ; run; proc sql; create table want as select * from have group by value1,value2 having sum(catx(' ',value4,value5)='2 1234') gt 0; quit;
Xia Keshan
I tried Xia's code. I do get a SAS dataset created, but with no observations and six columns. Also, the SAS Log shows a message that says "The query requires remerging summary statistics back with the original data."
One additional note. I bring in the data from an external text file via a data step using infile and input statements. However, the suggested Proc SQL code used was from Xia.
William
Are Value4 and Value5 both character variables? Also, are both Value1 and Value2 needed to identity a unique group, or just Value2?
The remerging summary statistics note is simply a note produced when incorporating summary functions. Nothing to be concerned about. But not getting the desired data is definitely a concern.
The Value4 and Value5 are both character variables. Also, yes, both Value1 and Value2 would be needed to identify unique groups. The Value1 is basically a fictitious code (for, let's say, a State), and Value2 is simply a unique fictitious number identifier of each group within each Value1. In other words, there will be many unique Value2's within each Value1.
My actual data does have Value4 of 2s, which are also inputted as character format. The Value5 values (also inputted as character) are actually six-digit numbers (just did them as four-digits here for illustrative purposes). Also, there is no summing done in the mathematical sense. Just trying to extract certain observations (records) from a database based on the criteria noted, and then output the records to a separate dataset.
William
's code is simply counting the number of times that value4 and value5 contain a 2 and 1234, respectively, within the groups records. If the combination exists at least once, all of the groups records are output. You now said that value5 is actually a 6 character variable, thus obviously 1234 won't match. I think you have to provide a better example of what your data look like and what your criteria are. His code can handle any length variable.
My apologies, I inadvertently used an incorrect Value5 in my extract criteria. The Proc SQL code worked and got what I needed. Thanks so much for your help!
William
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.