I have an Excel sheet with a few dozen columns. One of these columns is called "distribution options". In this column you find general keywords to indicate how it will be distributed. Below you will find all the available "distribution options".
Fast Track Heir
Fast Track Living Expenses
Fast Track Balance
Fast Track Surety
Manual Distribution Office
Distribution Notery
Distribution Standard
I wanted to read out the entire column and receive a table how many times each of these distribution options appear in the column.
The result in the end should look similar to this.
Distribution Options | Count |
Fast Track Heir | 102 |
Fast Track Living Expenses | 230 |
Fast Track Balance | 17 |
Fast Track Surety | 43 |
Manual Distribution Office | 2 |
Distribution Notery | 18 |
Distribution Standard | 43 |
I wrote a script, but for some reason it doesn't count the Fast Track Heir, Fast Track Surety and Manual Distribution Office. All the other values (the same as above) are correct.
I used this script, but perhaps there is an easier approach. I simply can't figure out why Heir, Surety and Manual return "0" values, whereas the rest returns the values shown above. There are no errors in the log either. This time I checked the data myself and used COUNTIF(range;"word") to count it in Excel. The values above are correct. The script somehow isn't.
PROC SORT IN=atropos.dossiers OUT=atropos.dos_opt;
BY dossier_manager;
RUN;
DATA atropos.dos_rap;
SET atropos.dos_opt;
BY dossier_manager;
RETAIN
Heir
Living_Expenses
Balance
Surety
Manual
Notary
Standard;
IF first.dossier_manager
THEN DO;
Heir=0;
Living_Expenses=0;
Balance=0;
Surety=0;
Manual=0;
Notary=0;
Standard=0;
END;
SELECT (distributionoption);
WHEN ('Fast Track 1 Heir') Heir +1;
WHEN ('Fast Track Living Expenses') Living_Expenses +1;
WHEN ('Fast Track Balance') Balance +1;
WHEN ('Fast Track Surety") Surety +1;
WHEN ('Manual Distributionassignment Office') Manual +1;
WHEN ('Distribution Notery') Notery +1;
WHEN ('Distribution Standard') Standard +1;
OTHERWISE;
END;
IF last.dossier_manager THEN OUTPUT;
KEEP dossier_manager Heir Living_Expenses Balance Surety Manual Notary Standard;
RUN;
Hello,
Are you sure the values are written the way you have added in the select clause ?
By the way select clause is useful when:
- you have a long series of mutually exclusive numeric conditions
- data values are uniformly distributed
In your case I believe an aprroach using if then else condition would be more appropriate.
Hello,
Are you sure the values are written the way you have added in the select clause ?
By the way select clause is useful when:
- you have a long series of mutually exclusive numeric conditions
- data values are uniformly distributed
In your case I believe an aprroach using if then else condition would be more appropriate.
Yes, the values are written correctly. I have copy pasted the values from the Excel sheet. I even tried to change
WHEN ('Fast Track 1 Heir') Heir +1;
to
WHEN ('%Heir%') Heir +1;
Even then it didn't work.
So you're suggestion would look like this?
DATA atropos.dos_rap;
SET atropos.dos_opt;
BY dossier_manager;
RETAIN
Heir
Living_Expenses
Balance
Surety
Manual
Notary
Standard;
IF first.dossier_manager
THEN DO;
Heir=0;
Living_Expenses=0;
Balance=0;
Surety=0;
Manual=0;
Notary=0;
Standard=0;
END;
SELECT (distributionoption);
IF distributionoption = ('Fast Track 1 Heir') THEN Heir +1;
ELSE IF distributionoption = ('Fast Track Living Expenses') THEN Living_Expenses +1;
ELSE IF distributionoption = ('Fast Track Balance') THEN Balance +1;
ELSE IF distributionoption = ('Fast Track Surety") THEN Surety +1;
ELSE IF distributionoption = 'Manual Distributionassignment Office') THEN Manual +1;
ELSE IF distributionoption = ('Distribution Notery') THEN Notery +1;
ELSE IF distributionoption = ('Distribution Standard') THEN Standard +1;
END;
IF last.dossier_manager THEN OUTPUT;
KEEP dossier_manager Heir Living_Expenses Balance Surety Manual Notary Standard;
RUN;
I tried a different approach.
I imported the file. Then I took the imported data and only selected the distribution options, while storing this new selection as a new table. I noticed that some of the distribution options were incomplete and thus the query didn't work.
For example: "Fast Track 1 Heir" became "Fast Track 1 He" as well as with the other two.
I did copy and paste the exact values from the Excel file, but that wasn't the table my query tried to work with. In fact it was an import in my SAS library. Instead of checking the raw data, I should also check my imported data more. Consider this a lesson learned, because after I adjusted the names in the first script, everything ran smoothly!
Thanks for helping me get there.
Hi,
Firstly, please code in consistent casing/indentaiton, makes it much easier to read, and posting example test data in the form of a datastep will help you get tested code.
Now your problem. I thinkk you are looking at it the wrong way round, you have list of what values you want out, and then trying to count from that. The other way of looking at it is to proc means the data you have, and then filtering the results for those you want to keep. This has the benefit of being able to see what counted data is present. For example (untested);
proc means data=have; by distribution_options; output out=results; run; /* Filter out the ones you want to keep */ proc sql; create table WANT as select * from DISTRIBUTION_OPTIONS in (select OPTIONS from LIST_OF_VALUES); quit;
Nice and simple way, though to be honest I don't think you need the second step, just drop records you don't want in your further processing, then its just one step.
Just to add, on your second post, a select clause can look like:
select(<test>);
when(<result>) <code>;
Or ir can look like:
select;
when (index(strip(upcase(<text>)),"HIKE")>0) <code>;
If this is really EXACTLY your code, look at the log, because you have non-matching quotes in
WHEN ('Fast Track Surety") Surety +1;
No it isn't.
The original code (in my SAS client) has variable names in Dutch. To make it more understandable, I manually changed all the names in here. Kinda screwed me over, but rest assured: no syntax errors. 🙂
Thanks!
Not sure I under the purpose of this. Seems to me as a hard coded recode of column values, count, and then a transpose...?
Similar to @RW9, I think there are smarter ways of accomplish this.
A format/lookup table can recode/rename your distribution options.
PROC TABULATE does cross tabulation out of the box - no need to transpose your data set.
Another way to do this is using proc sql:
data have;
input distribution_options $40.;
datalines;
Fast Track Heir
Fast Track Living Expenses
Fast Track Balance
Fast Track Surety
Manual Distribution Office
Distribution Notery
Distribution Standard
Fast Track Heir
Fast Track Living Expenses
Fast Track Balance
;
proc sql;
create table want as
select trim(distribution_options) as distribution_options , count(*) as Count
from have
group by trim(distribution_options);
quit;
Maybe I am missing something, but Proc Freq seems to be designed for this kind of job.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.