So I have this task where I have the variable "team number" and "tec". Unfortunately, the value of "tec" isn't always correct. The correct team numbers and corresponding tec- values are listed in a separate document, I'm charged with listing the of the incorrect ones.
I start by writing
proc sql feedback; select a.teamno as teamno, a.tec as tec1, b.tec as tec2 from tabell_1 as a, val2 as b where a.teamno=b.teamno and tec1 ^= tec2 ; quit;
Which would be all well and good....if each team number was assigned to one and only one tec-number, unfortunately, this isn't the case, and the following six teams can have multiple tec numbers assigned to them:
Team Tec
120 3
120 9
302 5
302 8
502 15
502 16
512 14
512 15
520 9
520 15
523 11
523 16
I want to make an exception for these values (I don't know if that's the most elegant solution), so I write:
proc sql feedback; select a.teamno as teamno, a.tec as tec1, b.tec as tec2 from tabell_1 as a, val2 as b where a.teamno=b.teamno and tec1 ^= tec2 and (a.teamno ^=120 and (a.tec=3 or 9 and b.tec=3 or 9)) ; quit;
Unfortunately, only the first part of the added condition seems to be excecuted:
NOTE: Statement transforms to: select A.teamno as teamno, A.tec as tec1 label='tec', B.tec as tec2 label='tec' from WORK.TABELL_1 A, WORK.VAL2 B where (A.teamno = B.teamno) and (A.tec1 not = B.tec2) and (A.teamno not = 120); 1155 quit;
what an I doing wrong, and is there a better way of doing this?
Right, I'm trying the following atm, I "think" this is a better solution but do tell me if I'm wrong.
First I perform a full join on two conditions, where team number "and" tec-value is equal for values in the two tables, I save the resulting join as a new table.
proc sql feedback; create table kanske as select a.teamno as teamno, a.tec as tec1 label='tec1', b.tec as tec2 label='tec2' from tabell_1 as a full join val2 as b on a.teamno=b.teamno and a.tec=b.tec; quit;
Then I make a query on this new table, for the observatoins where tec2 does not equal tec1
proc sql feedback; select teamno, tec1, tec2 from kanske where tec1^=tec2; quit;
I make a manual inspection of the results, this seems legit.
Finally I tell SAS to count every observation for which pec1=/=pec2:
proc sql feedback; select count(*) from kanske where tec1^=tec2; quit;
Can you provide some representative data of what tabell_1 and val2 looks like? And also what you want the desired result to look like given this data?
Makes it much easier to provide a usable code answer 🙂
Tabell_1 is enormous but I'll provide a snippet:
(variable) (variable) Teamno tec (variable) (variable) (variable) 710 B212610 301 5 5 17 10 1038 B212607 301 5 8 14 20 846 B212608 301 5 9 15 12 897 B213053 302 5 16 15 51 722 B213054 302 5 18 14 38 796 B215140 302 16 35 16 45 721 B213055 302 5 19 17 9 1257 B213050 302 5 12 11 22 867 B213047 302 5 10 6 58 1370 B213048 302 5 13 8 55 1369 B213049 302 5 14 9 48 898 B213052 302 5 15 15 8 919 B213051 302 8 11 12 10 332 B213058 302 5 14 19 45 333 B213057 302 5 1 18 55 334 B213056 302 5 17 18 4 1234 B211609 303 6 22 12 15
The "val2"-file looks like this:
229 19 230 9 300 5 301 5 302 5 302 8 303 6 304 6 305 6 306 4 307 4 308 4 309 4 310 4 320 9 322 4
In my query I try to compare "real" and "reported" tec-numbers, that would look something like this:
Teamno Tec1 Tec(real) 302 5 8 302 5 8 302 16 5 302 16 8 302 5 8 302 5 8 302 5 8 302 5 8 302 5 8 302 5 8 302 8 5 302 5 8 302 5 8 302 5 8 307 9 4 320 12 9
We can see that team 302 is assigned to two different areas (or "pecs"). There's no unique ID-number present in the val2-file so I can't really work through standard procedures. The only values I can list as being incorrect (with any degree of certainty) is the ones not allowed for that team number. In this section, I would only really want to include the teams with a recorded PEC-value of 16, Values of 5 and 8 could both be fine, since we don't have any unique ID-values to work with!
Right, I'm trying the following atm, I "think" this is a better solution but do tell me if I'm wrong.
First I perform a full join on two conditions, where team number "and" tec-value is equal for values in the two tables, I save the resulting join as a new table.
proc sql feedback; create table kanske as select a.teamno as teamno, a.tec as tec1 label='tec1', b.tec as tec2 label='tec2' from tabell_1 as a full join val2 as b on a.teamno=b.teamno and a.tec=b.tec; quit;
Then I make a query on this new table, for the observatoins where tec2 does not equal tec1
proc sql feedback; select teamno, tec1, tec2 from kanske where tec1^=tec2; quit;
I make a manual inspection of the results, this seems legit.
Finally I tell SAS to count every observation for which pec1=/=pec2:
proc sql feedback; select count(*) from kanske where tec1^=tec2; quit;
If your data sets were smaller, this would be a simple sort-and-merge problem:
proc sort data=table1;
by team tec;
run;
proc sort data=val;
by team tec;
run;
data want;
merge table1 (in=in1) val (in=in2);
by team tec;
if in2=0;
run;
I'm assuming the reason you aren't doing this is the size of TABLE1 and the time it would take to sort it. You can overcome that in a couple of ways. Hashing is one. Formats is another, as shown here:
data valid_vals;
set val;
start = catx(' ', team, tec);
label='VALID';
fmtname = '$found';
run;
proc format cntlin=valid_vals;
run;
This gives you a format that translates valid combinations of TEAM + TEC into the word "VALID". That's the hard part. Using the format is easier:
data want;
set table1;
where put(catx(' ', team, tec), $found.) ne 'VALID';
run;
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.