- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to create a column that will show a count of how many times a variable (deal_id) appears in a column.
Using the below example I would want deal_id's 100477 and 101797 to have a 2 and 102913 to have a one. The 2 items that broken out need to be that way due to LGD diving factors downstream.
I am using PROC sql to create my tables and have tried using count(deal_id) and count(distinct deal_id) but they both returned a 1 value for all rows.
This is just a portion of what the entire table consists of
reporting_date | source_system_code | deal_id | Notional_USD | LGD |
04/30/2017 | ACBS | 100477 | 251019.46 | 0.31684 |
04/30/2017 | ACBS | 100477 | 10000 | 0 |
04/30/2017 | ACBS | 101797 | 5506337.48 | 0.31684 |
04/30/2017 | ACBS | 101797 | 50000 | 0 |
04/30/2017 | ACBS | 102913 | 6142612.78 | 0.17466 |
04/30/2017 | ACBS | 103074 | 11416126.66 | 0.31684 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sql noprint; create table want as select *,count(deal_id) as count from have group by deal_id ; quit;
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is the desired output a data set or a report? If a data set are you merging the results back onto the original data so the same value appears on each record?
It really helps to show exactly what the actual output should be.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am creating a data set to be used for reporting
I am combining a few tables to be used for reporting ( source data is very large.) My problem arose when I tried to combine another table to pull in an interest column (which has deal_id and interest no duplicates) and saw that values were being duplicated due to the deal_id being on multiple rows.
I have been spinning my wheels on how to get this dat to pull once and my thought is to generate a count of deal_id and divide by that to remedy this.
porting_date | source_system_code | deal_id | Notional_USD | LGD | COUNT |
4/30/2017 | ACBS | 100477 | 251019.46 | 0.3168 | 2 |
4/30/2017 | ACBS | 100477 | 10000 | 0 | 2 |
4/30/2017 | ACBS | 101797 | 5506337.48 | 0.3168 | 2 |
4/30/2017 | ACBS | 101797 | 50000 | 0 | 2 |
4/30/2017 | ACBS | 102913 | 6142612.78 | 0.1747 | 1 |
4/30/2017 | ACBS | 103074 | 11416126.7 | 0.3168 | 1 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sql noprint; create table want as select *,count(deal_id) as count from have group by deal_id ; quit;
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What SQL code did you try?
I would try something like:
data have;
input deal_id Notional_USD;
cards;
100477 251019.46
100477 10000
101797 5506337.48
101797 50000
102913 6142612.78
103074 11416126.66
;
proc sql;
select * , count(*) as DealCount
from have
group by deal_id
;
quit;
Returns:
deal_id Notional_USD DealCount 100477 10000 2 100477 251019.5 2 101797 5506337 2 101797 50000 2 102913 6142613 1 103074 11416127 1
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think my issue arises from my data being much larger than my example. I have attached an example (actual details have over 20k rows) of what I am trying to accomplish along with the SQL I wrote.
I either would like to count the times a variable appears in my table and do some simple math, or pull data from the AIR table only once.
When I try adding Count(Deal_id) or Count (distinct deal_id) I get a value of 1 in return.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You're grouping by a large collection of variables rather than just the one(s) you're interested in. Also, I didn't find a file in your attachments that showed the raw data.
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That was it! I limited the grouping and now I'm getting the result I was looking for!
Thank you everyone!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Better would be to provide some example data and what the final report should look like. If you haven't explored Procs Report and Tabulate you might be surprised what is possible though all of the variables of interest do need to be in one data set for those.
Paste code into a code box using the forum {i} menu icon.