I have a data set that looks like the following.
ID | TVG_MSO1 | TVG_MSO2 | TVG_MSO3 | NTIA_MSO1 | NTIA_MSO2 | NTIA_MSO3 | NTIA_MSO4 |
391517108002054 | Comcast Cable Communications | Lamont Digital Systems, Inc. | Time Warner Cable | AT&T | Time Warner Cable | NULL | NULL |
191130017006002 | ImOn Communications, LLC | Mediacom, L. L. C. | NULL | CenturyLink | Dynamic Broadband | ImOn Communications, LLC | Mediacom, L. L. C. |
320030016123003 | CenturyLink | Commercial Satellite System | Cox Cable Communications Inc | CenturyTel TeleVideo, Inc. | Cox Cable Communications Inc | NULL | NULL |
360450613002014 | Time Warner Cable | NULL | NULL | Time Warner Cable | Verizon Service Corp | WesTel Systems | NULL |
220550020021029 | NULL | NULL | NULL | AT&T | Cox Cable Communications Inc | NULL | NULL |
480291203002025 | CS Wireless Systems, Inc. | Grande ClearSource Comm. | Time Warner Cable | AT&T | Grande ClearSource Comm. | Time Warner Cable | NULL |
132551612001003 | Comcast Cable Communications | DirecPath | NULL | AT&T | Comcast Cable Communications | Covad Communications | NULL |
260150108002035 | Wide Open West | NULL | NULL | Barry County Telephone Company | NULL | NULL | NULL |
210490202023015 | Time Warner Cable | NULL | NULL | AT&T | Time Warner Cable | NULL | NULL |
160219702005039 | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
My problem is that that I need find matches where TVG_MSO1 is equal to NTIA_MSO1 or NTIA_MSO1 or NTIA_MSO2 or NTIA_MSO3 or NTIA_MSO4. But the macth could come from any combination of the TVG_ and NTIA_ columns. So for example, in the first row TVG_MSO3 equals NTIA_MSO2, so I would like to have a final column that would say the name of the match "Time Warner Cable". If there is no match, that is fine. Similarly in row two the match is TVG_MSO2 and NTIA_MSO4 and the output column would be "Mediacom, L. L. C.".
If there is any one that could help me out with a solution, I would be forever grateful.
Your friend in SAS,
John Wynne
As this query is still marked 'Not answered' I thought you might be looking for code. Here is my suggestion. I've assumed the word NULL in your table is a token for a null value, not the literal value; however if that is not the case you can substitute <> 'NULL' for IS NOT NULL. This code is not tested. The results may not be in the same ID order as the input data.
* Assuming ID is unique ;
Proc Transpose
Data = Communications
Prefix = TVG_Values_
Name = Original_TVG_Column
Out = TVG_Data
;
By ID Notsorted ;
Var TVG_MSO: ;
* Selects all columns beginning with TVG_MSO ;
Run ;
Proc Transpose
Data = Communications
Prefix = NTIA_Values_
Name = Original_NTIA_Column
Out = NTIA_Data
;
By ID Notsorted ;
Var NTIA_MSO: ;
* Selects all columns beginning with NTIA_MSO ;
Run ;
Proc SQL ;
Create Table Comm_Matches As
Select
Distinct
TVG.ID
, TVG.TVG_Values_1 As Matching
, TVG.Original_TVG_Column
, NTI.Original_NTIA_Column
From TVG_Data TVG
, NTIA_Data NTI
Where NTI.ID = TVG.ID
And NTI.NTIA_Values_1 = TVG.TVG_Values_1
And NTI.NTIA_Values_1 IS NOT NULL
And TVG.TVG_Values_1 IS NOT NULL
;
Quit ;
Can you have multiple matches? And would you care which one matches?
Assuming you can't have multiple matches.
data want;
set have;
array tvg(*) tvg_mso1-tvg_mso3;
array ntia(*) ntia_mso1-ntia_mso4;
do i=1 to dim(tvg);
do j=1 to dim(ntia);
if whichc(tvg(i), of ntia(*))>0 then match=tvg(i);
if match ne '' then leave; *I can't remember if its leave or break to end the loop;
end;
end;
run;
I would suggest the following (assuming your data is already sorted) :
I do not at the moment have access to SAS to test and debug but I could sketch out some code if that would help.
Forgot to mention: if this is a very large table you can get by with one pass of proc transpose and join the resulting table to itself. The SQL gets a bit more messy, though you could use views to create two virtual tables and proceed as outlined in my first reply.
Thanks for your help
As this query is still marked 'Not answered' I thought you might be looking for code. Here is my suggestion. I've assumed the word NULL in your table is a token for a null value, not the literal value; however if that is not the case you can substitute <> 'NULL' for IS NOT NULL. This code is not tested. The results may not be in the same ID order as the input data.
* Assuming ID is unique ;
Proc Transpose
Data = Communications
Prefix = TVG_Values_
Name = Original_TVG_Column
Out = TVG_Data
;
By ID Notsorted ;
Var TVG_MSO: ;
* Selects all columns beginning with TVG_MSO ;
Run ;
Proc Transpose
Data = Communications
Prefix = NTIA_Values_
Name = Original_NTIA_Column
Out = NTIA_Data
;
By ID Notsorted ;
Var NTIA_MSO: ;
* Selects all columns beginning with NTIA_MSO ;
Run ;
Proc SQL ;
Create Table Comm_Matches As
Select
Distinct
TVG.ID
, TVG.TVG_Values_1 As Matching
, TVG.Original_TVG_Column
, NTI.Original_NTIA_Column
From TVG_Data TVG
, NTIA_Data NTI
Where NTI.ID = TVG.ID
And NTI.NTIA_Values_1 = TVG.TVG_Values_1
And NTI.NTIA_Values_1 IS NOT NULL
And TVG.TVG_Values_1 IS NOT NULL
;
Quit ;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.
Find more tutorials on the SAS Users YouTube channel.