Thanks art297
First let me apologize to everyone, I am green when it comes to SAS code so forgive my naivete, but this code is confusing to me, Usint this code means I do or do not have to seperate the "TYPE"? I am also confused by the "from have as a" statement as well as the " group by key1,key2,key3,key4) as b on a.key1 = b.key1" statement.
@art297 I'm still under the impression that the OP wants one row with seven groups of variables for each key combination, not 7 rows per combo with one set of vars.
@mkeintz, you could of course be correct, but that isn't what I gleaned from the OP's example. Of course, if my latest suggested code does what he/she wants, a simple proc transpose of the results would satisfy that requirement.
Art, CEO, AnalystFinder.com
p.s. Are you going to Orlando? I didn't think I was, but just got a nice surprise from SAS Canada and will, after all, be attending.
You didn't show/mention type in your example dataset. Adding it would be simple.
The code simply looks for two things: (1) all instances sharing the same key1 values and (2) all instances sharing all four key values. It creates counts for each (i.e., rec_count and comp_count) and only selects those key1 values that have equal rec_count and comp_count values (i.e., don't have any records that have multiple key2, key3 and key4 values).
The from have as a and from have as b statements are simply referring to the two versions of your data (i.e., the first instance where it is getting the counts of all records that share the same key1 and type values AND referring to it later as file a, the other instance where it is getting the counts of all records that share the same key1, key2, key3, key4 and type values AND referring to it later as file b).
I'd think that the following modification of the code would do what you need regarding the type variable:
proc sql;
create table want (drop=rec_count comp_count) as
select *, count(*) as rec_count
from have as a
join (
select key1,key2,key3,key4,type,count(*) as comp_count
from have
where type in ('A','B','C','D','E','F','G')
group by key1,key2,key3,key4,type) as b
on a.key1 = b.key1 and a.type eq b.type
group by a.key1,a.type
having rec_count eq comp_count and
type in ('A','B','C','D','E','F','G')
;
quit;
Art, CEO, AnalystFinder.com
Thanks all
The types were used because i began the project thinking that it was best to seperate the types. The types are actually already in my dataset but thought it would be easier bumping them up against each other thats why I wanted to differentiate them but I am finding that its use is not really necessary in the grand scheme of things.
@wheddingsjr wrote:
Thanks all
The types were used because i began the project thinking that it was best to seperate the types. The types are actually already in my dataset but thought it would be easier bumping them up against each other thats why I wanted to differentiate them but I am finding that its use is not really necessary in the grand scheme of things.
Again, if you want the TYPES set up side by side, this a Transpose task rather than all the data steps and sql steps.
If you want code look at PROC TRANSPOSE. And you're correct, typically the long format is just as useful if not easier to work with than a wide format.
hHi again Art
I was almost there but at the very end got this error message:
415 proc sql;
416 create table results (drop=rec_count comp_count) as
417 select *, count(*) as rec_count
418 from Step2 as a
419 join (select MEMNO,BEGDATE,SProv_Spec,TAXID,count(*) as comp_count
420 from Step2
421 group by MEMNO,BEGDATE,SProv_Spec,TAXID) as b
422 on a.MEMNO = b.MEMNO
423 group by MEMNO
424 having rec_count eq comp_count;
ERROR: Ambiguous reference, column MEMNO is in more than one table.
(tired of writing key1 etc because that juat may be the disconnect so using the real field names)
My first guess would be to just change (toward the bottom of your code):
group by MEMNO
to
group by a.MEMNO
Art, CEO, AnalystFinder.com
Do you mean that you want only those keys with all 7 types 'A' to 'G' ?
There are several ways to do it:
1) add sort of step2 dataset, before the 7 sql selections partA to partG and continue with the merge with sligh changes:
data want;
merge partA (where=(type='A' )) in=inA
partB (where=(type='B')) in=inB
partC (where=(type='C')) in=inC
......
partG(where=(type='G')) in=inG
;
by key1 key2 key3 key4; /* equvalent to sql partA.1 partA.2 partA.3 partA.4 */
if inA and inB and inC and ... and inG;
run;
OR
2) After sorting and selectcting by type:
proc sort data=step2(where=(type in ('A', 'B', 'C', 'D' 'E' 'F' 'G')))
out=TEMP;
by key1 key2 key3 key4;
run;
add two steps:
2.1) check wether all 7 types exist in 4 keys group:
data keys;
set TEMP;
by key1 key2 key3 key4;
length types $7;
retain types;
if first.key4 then types = ' ';
select (type);
when ("A") substr(types,1,1) = 'A';
when ("B") substr(types,2,1) = 'B';
when ("C") substr(types,3,1) = 'C';
.....
when ("G") substr(types,7,1) = 'G';
end;
if last.key4 and types = 'ABCDEFG' then output;
run;
2.2) dataset KEYS holds now only those with all 7 types.
next merge will select them only:
data WANT;
merge TEMP (in=inT)
KEYS (in=inK);
by key1 key2 key3 key4;
if inK;
run;
Both methods should have same results.
You are invited to check them and decide what is your preference;
Hi Shmuel
No I do not need the type...it was my fault for including it. I did so because I was under the impression that in order to do what I wanted to do I needed to have the 7 different tables until I learned in this forum that I didnt require it....
I hadn't tested the code before, but did this time. I was correct in my last suggestion. The following worked for me:
proc sql; create table want (drop=rec_count comp_count) as select *, count(*) as rec_count from have as a join ( select memno,begdate,SProv_Spec,TAXID,count(*) as comp_count from have group by memno,begdate,SProv_Spec,TAXID) as b on a.memno = b.memno group by a.memno having rec_count eq comp_count ; quit;
Art, CEO, AnalystFinder.com
Hi Art
It worked, unfortunately there still remains lines that do not fit the criteria....the line count however is significantly less than before though.
Glad to hear that it worked but, if there is still a problem, you'd have to post examples of your have and want datasets in order for any of us to figure out what might be wrong.
Art, CEO, AnalystFinder.com
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.