So I have this task where I have the variable "team number" and "tec". Unfortunately, the value of "tec" isn't always correct. The correct team numbers and corresponding tec- values are listed in a separate document, I'm charged with listing the of the incorrect ones.
I start by writing
proc sql feedback; select a.teamno as teamno, a.tec as tec1, b.tec as tec2 from tabell_1 as a, val2 as b where a.teamno=b.teamno and tec1 ^= tec2 ; quit;
Which would be all well and good....if each team number was assigned to one and only one tec-number, unfortunately, this isn't the case, and the following six teams can have multiple tec numbers assigned to them:
Team Tec
120 3
120 9
302 5
302 8
502 15
502 16
512 14
512 15
520 9
520 15
523 11
523 16
I want to make an exception for these values (I don't know if that's the most elegant solution), so I write:
proc sql feedback; select a.teamno as teamno, a.tec as tec1, b.tec as tec2 from tabell_1 as a, val2 as b where a.teamno=b.teamno and tec1 ^= tec2 and (a.teamno ^=120 and (a.tec=3 or 9 and b.tec=3 or 9)) ; quit;
Unfortunately, only the first part of the added condition seems to be excecuted:
NOTE: Statement transforms to: select A.teamno as teamno, A.tec as tec1 label='tec', B.tec as tec2 label='tec' from WORK.TABELL_1 A, WORK.VAL2 B where (A.teamno = B.teamno) and (A.tec1 not = B.tec2) and (A.teamno not = 120); 1155 quit;
what an I doing wrong, and is there a better way of doing this?
Right, I'm trying the following atm, I "think" this is a better solution but do tell me if I'm wrong.
First I perform a full join on two conditions, where team number "and" tec-value is equal for values in the two tables, I save the resulting join as a new table.
proc sql feedback; create table kanske as select a.teamno as teamno, a.tec as tec1 label='tec1', b.tec as tec2 label='tec2' from tabell_1 as a full join val2 as b on a.teamno=b.teamno and a.tec=b.tec; quit;
Then I make a query on this new table, for the observatoins where tec2 does not equal tec1
proc sql feedback; select teamno, tec1, tec2 from kanske where tec1^=tec2; quit;
I make a manual inspection of the results, this seems legit.
Finally I tell SAS to count every observation for which pec1=/=pec2:
proc sql feedback; select count(*) from kanske where tec1^=tec2; quit;
Can you provide some representative data of what tabell_1 and val2 looks like? And also what you want the desired result to look like given this data?
Makes it much easier to provide a usable code answer 🙂
Tabell_1 is enormous but I'll provide a snippet:
(variable) (variable) Teamno tec (variable) (variable) (variable) 710 B212610 301 5 5 17 10 1038 B212607 301 5 8 14 20 846 B212608 301 5 9 15 12 897 B213053 302 5 16 15 51 722 B213054 302 5 18 14 38 796 B215140 302 16 35 16 45 721 B213055 302 5 19 17 9 1257 B213050 302 5 12 11 22 867 B213047 302 5 10 6 58 1370 B213048 302 5 13 8 55 1369 B213049 302 5 14 9 48 898 B213052 302 5 15 15 8 919 B213051 302 8 11 12 10 332 B213058 302 5 14 19 45 333 B213057 302 5 1 18 55 334 B213056 302 5 17 18 4 1234 B211609 303 6 22 12 15
The "val2"-file looks like this:
229 19 230 9 300 5 301 5 302 5 302 8 303 6 304 6 305 6 306 4 307 4 308 4 309 4 310 4 320 9 322 4
In my query I try to compare "real" and "reported" tec-numbers, that would look something like this:
Teamno Tec1 Tec(real) 302 5 8 302 5 8 302 16 5 302 16 8 302 5 8 302 5 8 302 5 8 302 5 8 302 5 8 302 5 8 302 8 5 302 5 8 302 5 8 302 5 8 307 9 4 320 12 9
We can see that team 302 is assigned to two different areas (or "pecs"). There's no unique ID-number present in the val2-file so I can't really work through standard procedures. The only values I can list as being incorrect (with any degree of certainty) is the ones not allowed for that team number. In this section, I would only really want to include the teams with a recorded PEC-value of 16, Values of 5 and 8 could both be fine, since we don't have any unique ID-values to work with!
Right, I'm trying the following atm, I "think" this is a better solution but do tell me if I'm wrong.
First I perform a full join on two conditions, where team number "and" tec-value is equal for values in the two tables, I save the resulting join as a new table.
proc sql feedback; create table kanske as select a.teamno as teamno, a.tec as tec1 label='tec1', b.tec as tec2 label='tec2' from tabell_1 as a full join val2 as b on a.teamno=b.teamno and a.tec=b.tec; quit;
Then I make a query on this new table, for the observatoins where tec2 does not equal tec1
proc sql feedback; select teamno, tec1, tec2 from kanske where tec1^=tec2; quit;
I make a manual inspection of the results, this seems legit.
Finally I tell SAS to count every observation for which pec1=/=pec2:
proc sql feedback; select count(*) from kanske where tec1^=tec2; quit;
If your data sets were smaller, this would be a simple sort-and-merge problem:
proc sort data=table1;
by team tec;
run;
proc sort data=val;
by team tec;
run;
data want;
merge table1 (in=in1) val (in=in2);
by team tec;
if in2=0;
run;
I'm assuming the reason you aren't doing this is the size of TABLE1 and the time it would take to sort it. You can overcome that in a couple of ways. Hashing is one. Formats is another, as shown here:
data valid_vals;
set val;
start = catx(' ', team, tec);
label='VALID';
fmtname = '$found';
run;
proc format cntlin=valid_vals;
run;
This gives you a format that translates valid combinations of TEAM + TEC into the word "VALID". That's the hard part. Using the format is easier:
data want;
set table1;
where put(catx(' ', team, tec), $found.) ne 'VALID';
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.