- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I import a list of strings stored in Excel in column A to search:
TATA |
TCS |
Accenture |
I also import the below data set:
REPORT_CAT | TRANS_AMOUNT | EXPEND_COMMENT | VENDOR |
Sal Dist & Contractor | 5177.36 | AP: TATA CONSULTANCY SERVICES LTD : INV USNI219089615 : : MF 85217900 | TATA CONSULTANCY SERVICES LTD |
Sal Dist & Contractor | -10811 | FB:TATA:Oct-18:Accrual WO 16079 | non-AP |
Sal Dist & Contractor | 10811 | FB:TATA:Nov-18:Accrual WO 16079 | non-AP |
IT Hardware | 52000 | AP: ACCENTURE LLP : INV 1100457766 : : MF 79310800 | ACCENTURE LLP |
Sal Dist & Contractor | 1023.31 | Internal:Inventory Management:Dec-18:project distribution | non-AP |
Sal Dist & Contractor | -50000 | FB:Accenture:Aug-18:Accrual WO 16389 | non-AP |
I want to use the imported list as strings to search in column EXPEND_COMMENT and Vendor
When there is a match, I want to sum by string on the TRANS_AMOUNT column and have a breakout by string and REPORT_CAT.
The table below would provide the desired output. Since one of the rows doesn't contain any of the strings, it's grouped in Other:
STRING | REPORT_CAT | TRANS_AMOUNT |
TATA | Sal Dist & Contractor | 5177.36 |
Accenture | IT Hardware | 52000 |
Accenture | Sal Dist & Contractor | -50000 |
Other | Sal Dist & Contractor | 1023.31 |
Thank you a lot for any help you provide.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What if EXPEND_COMMENT contains both TATA and TCS?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What if EXPEND_COMMENT contains both TATA and TCS?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In the data set, there are only two companies that have multiple iterations:
Tata Consultancy Services is listed as Tata and TCS, the ideal scenario would be that these strings are grouped together under Tata. In a single row, Tata and uppercase (TCS) are never listed together and Tata and uppercase (TCS) are unique in the data such that they won't make up another word such as Tatanica or Logistcs.
Ernst and Young is listed as Ernst and Young and Ernst. Similar to above, ideally, these would be grouped under Ernst. And similar to the above, they are not found as part of a word.
The strings I will be looking for are unique. Any row that is not matched to a string will be grouped as Other.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
I am not going to try to dummy up data for something like this.
Your example data should be small enough you can manually indicate the result.
Not need whatsoever to "make a list in Excel" to get values into SAS:
data search; informat string $25.; input string; datalines; TATA TCS Accenture ; run;
Are your search for values in the "list" case sensitive? Do you want to consider TATA found if the value searched is "FB:Tata:Oct-18:Accrual WO 16079"
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the detailed response, and sorry I didn't know about the forum etiquette. I attached a sample data set of 50 observations.
In these particular observations, I only see one string I would want to search for: TATA, the rest would be grouped into other. The full data set is about 100,000 rows and grows by about 30,000 rows per month. There are about 500 strings I will want to search for and return String, REPORT_CAT, and Sum of (TRANS_AMOUNT), grouping by REPORT_CAT.
All the strings will not be case sensitive, except in the case of EY and TCS which will be case sensitive. Therefore, I will consider TATA to be found if Tata or tata or TAta. I will only consider EY to be found if EY, and TCS to be found if TCS, which are the only two case sensitive strings I will be looking for.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Looks like it can't be done. Thanks everyone for the help!