Hi Guys,
I used Proc SQL to create a data-set from three different data sets based on identification code and year and quarter. However, the final sample size does not add up. Is there a command that can help me to make sure the final data is ok and includes all three data sets(it's a big data)? Variable wise, it is right and I checked.
proc sql;
CREATE TABLE SASUSER.FINALBEH AS
select *
from SASUSER.feb20INF a, allanalyst b
where a.permno=b.permno
and a.YEAR=b.YEAR AND A.QUARTER=B.QUARTER;
quit;
Thanks!
It's hard to say without more information about your sampling methodology and data.
How big is big?
Is it worth creating a view of the data and then using proc surveyselect?
thank you for replying!
My first data set has 17931 observations, second data 5429(183 variables) observations and thrid one has 535000(117 variables) obs. in total I have 355 variables. All data sets share common identifier and dates. I need to match data set 2 and 3 with 1(the main data base). So basically, I'm matching observations from data set 2 and 3 to data set 1 based on unique identifier and date(year and quarter).
I acquire different results with left join or natural join.
I'm not familiar with proc surveyselect.
To find entries in feb20inf that are not matched in allanalyst, try
proc sql;
create table test as
select *
from
sasuser.feb20inf as a
where not exists (
select *
from allanalyst
where permno=a.permno and year=a.year and quarter=a.quarter )
;
quit;
If there are duplicated value for permno or YEAR or QUARTER , you will get more obs than expect , Try MERGE statement.
data want;
merge SASUSER.feb20INF allanalyst
by permno YEAR QUARTER;
run;
Hi @HannahPhD,
If I had to investigate this situation, I would create a temporary dataset from each of the three original tables by restricting them to the key variables.
Example:
data keyall;
set allanalyst(keep=permno year quarter);
run;
Using the resulting three comparably small datasets you can easily
The results of these operations will give you a clear picture of the situation and hence guide you how to proceed with your "big" datasets. You will then know for sure in advance how many observations to expect from your intended joins/merges.
You should also make sure that there are no common variables except the key variables or, if there are some, that they are handled properly.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.