Dear all,
how can I keep a group of data if one of them satisfy the requirement?
for example
table 1
company_name, country
apple,US
apple,
apple,UK
tesco,UK
sony,Japan
I would like to keep all rows relevant with company 'apple' and 'Tesco' if a country recorded with them is UK
table 2
company_name, country
apple,US
apple,
apple,UK
tesco,UK
could you please give me some suggestion about this?
thanks in advance.
Assuming I understand your requirement-->
data have;
infile cards dsd truncover;
input company_name $ country $;
cards;
apple,US
apple,
apple,UK
tesco,UK
sony,Japan
;
proc sql;
create table want(drop=t) as
select *,max(country='UK' and company_name in ('apple','tesco')) as t
from have
having t and company_name in ('apple','tesco');
quit;
Dear novinosrin,
Thank you for your advice.
However, my sample covers thousands of companies.
besides, I would like to explain my requirements.
I expect to include all companies which are relevant with 'GB' country. and there are two types of revelations.
the first is that the 'company_name' is directly related to 'GB' country, such as
company_name, country
Tesco, GB
I would like to include this row in the new dataset.
the second is, the 'company_name' is related with at least two country records(i.e., other country code or blank), and one of them is 'GB' country, for example,
company_name, country
Tesco,GB
Tesco,US
Tesco,
I would like to include all these three rows in the new dataset because the company names in the second and third row are as same as the company name in the first row which recorded with 'GB' country.
Could you please give me suggestions based on this requirement?
many thanks in advance.
Hi @France Ok, try this
data have;
infile cards dsd truncover;
input company_name $ country $;
cards;
apple,US
apple,
apple,UK
tesco,UK
sony,Japan
;
proc sql;
create table want(drop=t) as
select *,count(distinct country)>1 and sum(country='UK')>=1 as t
from have
group by company_name
having t or country='UK';
quit;
Logic:
1. check whether each company whether is present n the UK and is present in any other country
2. If 1 is true,output that
3. If the company is only present in UK and not in any other country, output that too
If the above is correct, it's still piece of cake
Or Datastep
data have;
infile cards dsd truncover;
input company_name $ country $;
cards;
apple,US
apple,
apple,UK
tesco,UK
sony,Japan
;
proc sort data=have;
by company_name;
run;
data want;
do until(last.company_name);
set have;
by company_name ;
if country='UK' then f=1;
end;
do until(last.company_name);
set have;
by company_name ;
if f then output;
end;
drop f;
run;
Even simpler
data have;
infile cards dsd truncover;
input company_name $ country $;
cards;
apple,US
apple,
apple,UK
tesco,UK
sony,Japan
;
proc sort data=have;
by company_name;
run;
data want;
merge have(in=a where=(country='UK')) have(in=b);
by company_name;
if a and b ;
run;
or a join
proc sql;
create table want as
select a.*
from have a inner join have(where=(country='UK')) b
on a.company_name=b.company_name;
quit;
If your data are already sorted by company_name (but the UK entry appears randomly within the duplicate company_name group, then (note I've sorted the sample data):
data have;
infile cards dsd truncover;
input company_name $ country $;
cards;
apple,US
apple,
apple,UK
sony,Japan
tesco,UK
;
data want;
merge have (where=(country='UK') in=ingb) have;
by company_name;
if ingb;
run;
The MERGE does many-to-one matches. So the where=(country='UK') single entry will match all entry for the same company_name. And if there are multiple UK entries (say 2) for a company, then the last 2nd such entry will be matched against the 2nd and all subsequent entries for that company.
Also the merge statement will overwrite value in the left entry with values from the right entry. So the UK values will be overwritten by the values coming from the right-hand have records.
Now if the data are not sorted by company_name, and sorting is expensive, then (date here is unsorted):
data have;
infile cards dsd truncover;
input company_name $ country $;
cards;
apple,US
apple,
tesco,UK
sony,Japan
apple,UK
;
data want;
set have;
if _n_=1 then do;
declare hash h (dataset:"have (where=(country='UK'))");
h.definekey('company_name');
h.definedone();
end;
if h.check()=0;
run;
This program builds a hash object H, keyed on company_name (meaning you can do a table lookup by company_name). But the only company_name values in H are those with country='UK'. Note, the H object is retained across observations, so it is instantiated only once, namely when the first iteration of the data step is executed - at which time the entire dataset is processed once for building the hash object.
The h.check() method returns a zero only when successful, i.e. only when the incoming company_name is found in h.
Note this program effectively reads the data set HAVE twice - once to build H, and once via the normal data step processing sequence. Even so, it's cheaper than sorting then data, followed by another data step - and possibly by another sort to reproduce original order, if needed.
data have;
infile cards dsd truncover;
input company_name $ country $;
cards;
apple,US
apple,
apple,UK
sony,Japan
tesco,UK
;
proc sql;
select *
from have
group by company_name
having sum(country='UK') ne 0;
quit;
Would definitely go with a hash solution
data want;
if _n_=1 then do;
declare hash h (dataset:"have(where=(country='UK'))");
h.definekey('company_name');
h.definedone();
end;
set have;
if h.check()=0;
run;
Hi France
As I understand you, you want all records for a given company, if the company is associated with GB.
With respect for the suggestions already given, I would recommend a simple two-step approace: First find the relevant companies, and then get all records for these companies.
The following code wil give you that. If you want to include UK as well (if no GB record exists), just replace the where statements with the one in comments.
data have;
infile cards dsd truncover;
input company_name $ country $;
cards;
Tesco,GB
Tesco,US
Tesco,
apple,US
apple,
apple,UK
sony,Japan
tesco,UK
;
proc sql;
create table temp as
select distinct company_name
from have
where country = 'GB';
/* where country = 'GB' or country = 'UK';*/
create table want as
select *
from have
where company_name in (select company_name from temp);
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.