Hello everyone,
I am trying to match 2 data sets that I had previously split in two : Active and Inactive (companies).
I split them because I needed to check some info on both and now I want to merge them by matching their SICH code and their Fyear .(as you can see in the pictures)
I had previously used SQL to match data but can't figure out why my program is not working now.
Here is my code :
proc sql;
create table Matching as
select distinct b.*
from Active as a, Inactive as b
where a.SICH=b.SICH and ((((a.DLDTE)-3)<= b.fyear) and (b.fyear <= (a.DLDTE)));
quit;
My goal is to be able to match Inactive companies with Active companies that sahre the same SICH number and also a pretty close Fyear year so they are comparable to one another for research
Please post code or log results directly into the forum as text in a code box opened using the forum's {I} icon. Should be much less work than creating an image of text and trying to use that.
Text is much better because we can edit that easily or point out where the issues are.
Data is best represented as data step code. That way we can recreate your data and actually test code.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have.
We do not need a lot of records, just enough to cover the use cases you describe, and preferably only the variables that are needed for the exercise.
Since one of your pictures only shows missing values of DLDTE in ACTIVE then bits such as DLDTE-3 are pretty meaningless as the result would be missing and any value of Fyear is not less than missing. And why compare DLDTE when you say you want "a pretty close Fyear year"?
I might start with
proc sql; create table Matching as select distinct b.* from Active as a right join Inactive as b on a.SICH=b.SICH where b.fyear between (a.fyear-3) and a.fyear; quit;
I will keep in mind about the code/Log for the updates. I did a mistake copying my old code and changed it to Fyear afterwards when I realized but kept the wrong one here.
I will try your code and post an update as soon as the program runs it (or not)
Thanks for the help
So I tried your code but unsuccesfully.
Here is what the log told me:
309 ;;;;
310 proc sql;
311 create table Matching as
312 select distinct b.*
313 from Inactive as a
314 right join
315 Active as b
316 on a.SICH=b.SICH
317 where b.fyear between (a.fyear-3) and a.fyear;
NOTE: Table WORK.MATCHING created, with 0 rows and 16 columns.
318 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 9:52.22
cpu time 9:51.92
I remembered why I was using the DLDTE year (DLDTE means deletation year). Since in the inactive companies (from the inactive data set) have a deletation year. I want to match an Active company (from my active dataset) that was active at the DLDTE time (or as close as possible to it)
Even when modifying the code as follow:
proc sql; create table Matching as select distinct b.* from Inactive as a right join Active as b on a.SICH=b.SICH where b.fyear between (a.DLDTE-3) and a.DLDTE; quit;
The log screen still show`s me:
324 proc sql;
325 create table Matching as
326 select distinct b.*
327 from Inactive as a
328 right join
329 Active as b
330 on a.SICH=b.SICH
331 where b.fyear between (a.DLDTE-3) and a.DLDTE;
NOTE: Table WORK.MATCHING created, with 0 rows and 16 columns.
332 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 6:41.28
cpu time 6:40.93
The Join has a direction. If you use Right Join it selects records from the second set and matches them to first.
If you want the records from the Inactive selected and then match from Active try LEFT JOIN.
@Pabster wrote:
So I tried your code but unsuccesfully.
Here is what the log told me:309 ;;;;
310 proc sql;
311 create table Matching as
312 select distinct b.*
313 from Inactive as a
314 right join
315 Active as b
316 on a.SICH=b.SICH
317 where b.fyear between (a.fyear-3) and a.fyear;
NOTE: Table WORK.MATCHING created, with 0 rows and 16 columns.318 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 9:52.22
cpu time 9:51.92
I remembered why I was using the DLDTE year (DLDTE means deletation year). Since in the inactive companies (from the inactive data set) have a deletation year. I want to match an Active company (from my active dataset) that was active at the DLDTE time (or as close as possible to it)
Even when modifying the code as follow:proc sql; create table Matching as select distinct b.* from Inactive as a right join Active as b on a.SICH=b.SICH where b.fyear between (a.DLDTE-3) and a.DLDTE; quit;The log screen still show`s me:
324 proc sql;
325 create table Matching as
326 select distinct b.*
327 from Inactive as a
328 right join
329 Active as b
330 on a.SICH=b.SICH
331 where b.fyear between (a.DLDTE-3) and a.DLDTE;
NOTE: Table WORK.MATCHING created, with 0 rows and 16 columns.332 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 6:41.28
cpu time 6:40.93
However your ranges of values don't match. The example data you showed above has the DLDTE variable as a two digit value and the Fyear value as four digits.. So 78 is going to be way smaller than 1978.
So I suggest using DLDTE+1900 for some of those years. But if you have any 2000 or later needed that is going to complicate your code a bit more.
And yet again a reasons why 2-digit years are generally poor idea.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.