BookmarkSubscribeRSS Feed
France
Quartz | Level 8

dear all,

 

how can I create dataset A and dataset B based on the original dataset (e.g., dataset C)?

 

for example,

for the original dataset (dataset C)

 

Company nameCountryMatched BvD IDMatched company name
02 MICRO   
02 MICROTW  
02 MICROUS  
1... GB04165791BH (CITY FORUM) LIMITED (Previous name: 1)
1...GBGB04165791BH (CITY FORUM) LIMITED (Previous name: 1)
1...US  
21(TWO-ONE) COMPANY   
21(TWO-ONE) COMPANYJP  
21(TWO-ONE) COMPANYUS  
3-D MATRIX JP40100010879403-D MATRIX,LTD.
3-D MATRIXJPJP40100010879403-D MATRIX,LTD.
3-D MATRIXKR  
3-D MATRIXUSUS138675448LMATRIX 3D LLC

 

I would like to have the dataset A like

Company nameCountryMatched BvD IDMatched company name
1... GB04165791BH (CITY FORUM) LIMITED (Previous name: 1)
1...GBGB04165791BH (CITY FORUM) LIMITED (Previous name: 1)
1...US  

in the dataset A each group of Company_name has only one distinct Matched_company_name (which is BH (CITY FORUM) LIMITED (Previous name: 1)).

 

I would like to also create the dataset B like,

Company nameCountryMatched BvD IDMatched company name
3-D MATRIX JP40100010879403-D MATRIX,LTD.
3-D MATRIXJPJP40100010879403-D MATRIX,LTD.
3-D MATRIXKR  
3-D MATRIXUSUS138675448LMATRIX 3D LLC

 in dataset B, each group of Company_name has at least two distinct Matched_company_name (which are 3-D MATRIX,LTD. and MATRIX 3D LLC).

 

I would like to exclude observations which Company_name are '02 MICRO' and '21(TWO-ONE) COMPANY' as none of them have Matched_company_name variables. 

 

could you please give me some suggestion about this?

 

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20
How do you determine a ‘match’? How similar should the strings be?
mkeintz
PROC Star

As I see it, you plan to ignore any company for which the matchname is always blank.   But otherwise blank matchname records are output to a dataset depending on the number of unique (non-blank) matchnames, right?  If so:

 

data want1 want2;
  do until (last.company_name);
    set have;
    by company_name matchname notsorted;
    if last.matchname and matchname^=' ' then nmatches=sum(nmatches,1);
  end;
  do until (last.company_name);
    set have;
    by company_name ;
    if nmatches=1 then output want1; else
    if nmatches>1 then output want2;
  end;
  drop nmatches;
run;

 

Notes:

  1. This assumes your dataset is sorted by company_name.
  2. Within each company_name group, the data are sub-grouped (but not necessarily in sorted order) by matchname.
  3. It also assumes that there is no blank matchname in the middle of a non-blank matchname group.  I.e. it doesn't synthetically generate more matchname groups that actually exist.
  4. Again, if matchname is always blank, then there is no output, per your example.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
France
Quartz | Level 8

Dear mkeintz,

 

thank you for your suggestion.

 

thanks for your description, that is what I need. however, some company_name variables which recorded with unique (non-blank) Matched_company_name variable are also included in the dataset 'want2'.

 

I add a sample in the attachment (include 1000 observations) would you like to check?

 

thanks in advance.

mkeintz
PROC Star

I think you're the right person to check with your new sample.  See if the program produces what you intend.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

Creating new tables is seldom needed and even less often a good idea.

In your case, you can probably do a BY processing

proc XXX;

  by Company_Name;

  where Matched_Company_Name ne ' ';

run;

Would that work for you?

Why do some matched records have no matched value?

mkeintz
PROC Star

@ChrisNZ

 

I think the OP wanted to distinguish companies with more than one non-blank matchname value.  So a simple where statement would not likely capture it.  Companies with nothing but blanks seem to ignored in the required sample output, but otherwise blank records go to the same destination as the non-blank records.

 

I suspect the OP has his/her own data set that is (fuzzy?) matched by name against company data from Bureau van Dijk (the BVD_ID column).  Sometimes this yields multiple possibilities, and there likely needs to be a good deal of further "disambiguation", or some sort of data consolidation.

 

One of the problems with data from BvD, as I recall, was that (unlike many other vendors of corporate data) it did not provide tracking from year to year when there was a spin off or merger.  So a user desiring a longer data history would have to try some sort of other ways (including historical name matching) to properly link different data "vintages".  It's not a historic research friendly database.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 838 views
  • 1 like
  • 4 in conversation