Have embedded my COMMENTS in the codes, to get an understanding of my requirement.
/*8 observation from Shoes dataset*/
data shoes;
set sashelp.shoes;
if region='Canada' and Subsidiary='Montreal';
run ;
/*8 observations from org dataset since same data replicated*/
data org;
set sashelp.shoes;
if region='Canada' and Subsidiary='Montreal';
run ;
/*TASK: JOINING both the datasets to get the count of PRODUCT variable and storing the result in CNT variable using COUNT FUNCTION. Since there
are only 8 observation the CNT should also be 8,but because of CARTESIAN product CNT variable is showing 64 . Different results
with different usage of DISTINCT function*/
/*Scenario 1 Not using DISTINCT getting 64 observations in the dataset and CNT varaible
has 64 observations*/
proc sql noprint;
create table want1 as select org.region ,shoes.product ,shoes.Subsidiary,shoes.stores,
count(shoes.product) as cnt
from org as org left join
shoes as shoes on org.region = shoes.region
group by org.region
order by org.region;
quit;
/*Scenario 2 using DISTINCT getting 8 observations and CNT variable has 64 observations */
proc sql noprint;
create table want2 as select distinct org.region ,shoes.product ,shoes.Subsidiary,shoes.stores,
count(shoes.product) as cnt
from org as org left join
shoes as shoes on org.region = shoes.region
group by org.region
order by org.region;
quit;
/*Scenario 3: using DISTINCT at COUNT level getting 64 observations and CNT variable has 8 observations */
proc sql noprint;
create table want3 as select org.region ,shoes.product ,shoes.Subsidiary,shoes.stores,
count(distinct(shoes.product)) as cnt
from org as org left join
shoes as shoes on org.region = shoes.region
group by org.region
order by org.region;
quit;
/*Scenario 4: using DISTINCT at COUNT level getting 8 observations and CNT variable has 64 observations */
proc sql noprint;
create table want4 as select distinct org.region ,shoes.product ,shoes.Subsidiary,shoes.stores,
count(distinct(shoes.product)) as cnt
from org as org left join
shoes as shoes on org.region = shoes.region
group by org.region
order by org.region;
quit;
My requirement is NOT TO use DISTINCT at COUNT function but to get the COUNT(CNT) value as 8 since the actual count is
only 8 in the dataset. If we use distinct at COUNT FUNCTION we will get 8 . But i have customer requirements to use both
DISTINCT and without DISTINCT but my current issues is not to USE DISTINCT at COUNT function
to get 8 as CNT value though 64 observations are present. Any other alternative/solution for this.
Your join will always end up as a cartesian join, because that's how SQL works; when you have M tuples for a key in one dataset and N tuples in the other, the join will give you M*N tuples as result. Unless you prevent that with DISTINCT.
Depending on the actual form of the join, you should contemplate using a data step merge, or prepare your datasets in a way that you'l have a M to 1 join instead of a M to N.
Do you have to use SQL?
I don't think the fix is the count/distinct, it's in the joins most likely - you need to make it a distinct join which usually means adding a join condition. Can you post an example of your org table?
I think what you want to do is something like this:
proc sql noprint;
create table want as select distinct org.region ,shoes.product ,shoes.Subsidiary,shoes.stores,shoes.cnt
from org as org left join
(select *,count(*) as cnt from shoes) shoes
on org.region = shoes.region
group by org.region
order by org.region;
quit;
In other words: do the counting before the join by creating an inner query for shoes.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.