Hello~
I need help finding specific records in a dataset:
I have one dataset called CRASH with contains auto crash records. The dataset has crashid, driverid, and atfaultYN. The crashid is unique for two records: the at fault driver and the not at fault driver. Each record has the driverid as well as the atfaultYN variable which = 0 for atfault and 1 for not at fault. Of course, one driver can have many crashes so driverid is repeated throughout the dataset but the crashID is unique to the two records of each crash. Because the analysis was previously on drivers with high frequency of not at fault crashes to determine fraud, we have another dataset REAREND which has crash records for the drivers who have high frequency of not at fault crashes.
So for example, I have:
CRASH
Crashid driverID AtfaultYN
110 2 0
110 91 1
118 6 1
118 210 0
122 91 1
122 13 0
256 87 0
256 123 1
REAREND
Crashid driverID AtfaultYN
110 91 1
118 6 1
122 91 1
Now our analysis is on the drivers at fault. I need to extract the crash record of the at fault drivers (atfaultYN=0) from CRASH for each crash for the not at fault drivers in REAREND.
So I need:
VICTIM
Crashid driverID atfaultYN
110 2 0
118 210 0
122 13 0
There are 3 million crash records and only 796k crash records that I need, so I don't want the entire dataset, just the at fault crash records for the driverid in REAREND.
Thanks in advance for your help!
-C
Use a SQL query to select records with the same crashID AND the atfaultYN is 0.
proc sql;
create table want as
select * from crash
where atfaultyn=0 and crashid in (Select crashId from rearend);
quit;
Use a SQL query to select records with the same crashID AND the atfaultYN is 0.
proc sql;
create table want as
select * from crash
where atfaultyn=0 and crashid in (Select crashId from rearend);
quit;
If these are SAS tables then you could also use a SAS hash table for this.
data CRASH;
input Crashid driverID AtfaultYN;
datalines;
110 2 0
110 91 1
118 6 1
118 210 0
122 91 1
122 13 0
256 87 0
256 123 1
;
run;
data REAREND;
input Crashid driverID AtfaultYN;
datalines;
110 91 1
118 6 1
122 91 1
;
run;
data want(drop=_:);
set crash(where=(atfaultYN=0));
if _n_=1 then do;
dcl hash rearend (dataset:'rearend(where=(AtfaultYN=1))');
_rc=rearend.defineKey('Crashid');
_rc=rearend.defineDone();
end;
if rearend.check()=0 then output;
run;
How about:
data want;
merge
CRASH (
in=a
where=(atfaultYN=0)
)
REAREND (
in=b
keep=crashid
)
;
by crashid;
if a and b;
run;
Thank you all very much!
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.