dear all,
how can I create dataset A and dataset B based on the original dataset (e.g., dataset C)?
for example,
for the original dataset (dataset C)
Company name | Country | Matched BvD ID | Matched company name |
02 MICRO | |||
02 MICRO | TW | ||
02 MICRO | US | ||
1... | GB04165791 | BH (CITY FORUM) LIMITED (Previous name: 1) | |
1... | GB | GB04165791 | BH (CITY FORUM) LIMITED (Previous name: 1) |
1... | US | ||
21(TWO-ONE) COMPANY | |||
21(TWO-ONE) COMPANY | JP | ||
21(TWO-ONE) COMPANY | US | ||
3-D MATRIX | JP4010001087940 | 3-D MATRIX,LTD. | |
3-D MATRIX | JP | JP4010001087940 | 3-D MATRIX,LTD. |
3-D MATRIX | KR | ||
3-D MATRIX | US | US138675448L | MATRIX 3D LLC |
I would like to have the dataset A like
Company name | Country | Matched BvD ID | Matched company name |
1... | GB04165791 | BH (CITY FORUM) LIMITED (Previous name: 1) | |
1... | GB | GB04165791 | BH (CITY FORUM) LIMITED (Previous name: 1) |
1... | US |
in the dataset A each group of Company_name has only one distinct Matched_company_name (which is BH (CITY FORUM) LIMITED (Previous name: 1)).
I would like to also create the dataset B like,
Company name | Country | Matched BvD ID | Matched company name |
3-D MATRIX | JP4010001087940 | 3-D MATRIX,LTD. | |
3-D MATRIX | JP | JP4010001087940 | 3-D MATRIX,LTD. |
3-D MATRIX | KR | ||
3-D MATRIX | US | US138675448L | MATRIX 3D LLC |
in dataset B, each group of Company_name has at least two distinct Matched_company_name (which are 3-D MATRIX,LTD. and MATRIX 3D LLC).
I would like to exclude observations which Company_name are '02 MICRO' and '21(TWO-ONE) COMPANY' as none of them have Matched_company_name variables.
could you please give me some suggestion about this?
As I see it, you plan to ignore any company for which the matchname is always blank. But otherwise blank matchname records are output to a dataset depending on the number of unique (non-blank) matchnames, right? If so:
data want1 want2;
do until (last.company_name);
set have;
by company_name matchname notsorted;
if last.matchname and matchname^=' ' then nmatches=sum(nmatches,1);
end;
do until (last.company_name);
set have;
by company_name ;
if nmatches=1 then output want1; else
if nmatches>1 then output want2;
end;
drop nmatches;
run;
Notes:
Dear mkeintz,
thank you for your suggestion.
thanks for your description, that is what I need. however, some company_name variables which recorded with unique (non-blank) Matched_company_name variable are also included in the dataset 'want2'.
I add a sample in the attachment (include 1000 observations) would you like to check?
thanks in advance.
I think you're the right person to check with your new sample. See if the program produces what you intend.
Creating new tables is seldom needed and even less often a good idea.
In your case, you can probably do a BY processing
proc XXX;
by Company_Name;
where Matched_Company_Name ne ' ';
run;
Would that work for you?
Why do some matched records have no matched value?
I think the OP wanted to distinguish companies with more than one non-blank matchname value. So a simple where statement would not likely capture it. Companies with nothing but blanks seem to ignored in the required sample output, but otherwise blank records go to the same destination as the non-blank records.
I suspect the OP has his/her own data set that is (fuzzy?) matched by name against company data from Bureau van Dijk (the BVD_ID column). Sometimes this yields multiple possibilities, and there likely needs to be a good deal of further "disambiguation", or some sort of data consolidation.
One of the problems with data from BvD, as I recall, was that (unlike many other vendors of corporate data) it did not provide tracking from year to year when there was a spin off or merger. So a user desiring a longer data history would have to try some sort of other ways (including historical name matching) to properly link different data "vintages". It's not a historic research friendly database.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.