I have a SAS Dataset that looks like this (excerpt)
Organization County County_count
African Family Holistic Org Washington 1
Lane 2
Asian Health Service Ctr Baker 1
Linn 2
Gilliam 3
Consejo Hispano Clackamas 1
Curry 2
Lincoln 3
Linn 4
. . .
and so on....
What I need but is code that will produce an output dataset like
Organization County_all
African Family Holistic Org 2
Asian Health Service Ctr 3
Consejo Hispano 4
. . .
and so. In others words rather than an individual count of counties where the
Organizations serve, I need the total count of the number of counties served by the Organization. There are some Organizations that serve up to 12 or more counties.
What coding would produce the needed data set?
Thank you for your help.
wklierman
So you read in a report that had the values of the grouping variable filled in only once to make it easier for humans to review the report? And now you want to convert that to an actual dataset?
So essentially you want to do a last observation carried forward operation on that first/grouping variable.
So let's assume the dataset you created from the XLSX file is named HAVE and the variable with blanked out values is named ORGANIZATION. Here is a data step that will make a new dataset called WANT that fixes that by making a new variable whose value is RETAINed across iterations of the data step.
data want ;
set have;
if not missing(organization) then new_organization = organization;
else organization = new_organization ;
retain new_organization;
run;
Now you can run PROC FREQ on NEW_ORGANIZATION and see the proper counts.
If you don't have duplicates in the second column and assuming organization is repeated for each line not only on the first line as shown then a standard proc freq should be fine.
The PROC FREQ will create a displayed output and the OUT= option saves it to a data set called WANT.
proc freq data=have;
table organization / out=want;
run;
proc print data=want;
run;
If you do have duplicates then you need a count distinct, here's examples (that you can copy and fully run) that illustrate how to do a distinct count per group.
https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas
@wlierman wrote:
I have a SAS Dataset that looks like this (excerpt)
Organization County County_count
African Family Holistic Org Washington 1
Lane 2
Asian Health Service Ctr Baker 1
Linn 2
Gilliam 3
Consejo Hispano Clackamas 1
Curry 2
Lincoln 3
Linn 4
. . .
and so on....
What I need but is code that will produce an output dataset like
Organization County_all
African Family Holistic Org 2
Asian Health Service Ctr 3
Consejo Hispano 4
. . .
and so. In others words rather than an individual count of counties where the
Organizations serve, I need the total count of the number of counties served by the Organization. There are some Organizations that serve up to 12 or more counties.
What coding would produce the needed data set?
Thank you for your help.
wklierman
Thank you very much for your help.
The Organization is only repeated once as in the first example. So that is a little wrinkle in the mix. (Hopefully a small wrinkle.)
Organization County County_count
African Family Holistic Org Washington 1
Lane 2
I did think of going into the xlsx file and copying the name of the Organization, since there are fewer than 50 lines.
But then when the dataset is several thousand or more, that option would not be feasible.
wklierman
So you read in a report that had the values of the grouping variable filled in only once to make it easier for humans to review the report? And now you want to convert that to an actual dataset?
So essentially you want to do a last observation carried forward operation on that first/grouping variable.
So let's assume the dataset you created from the XLSX file is named HAVE and the variable with blanked out values is named ORGANIZATION. Here is a data step that will make a new dataset called WANT that fixes that by making a new variable whose value is RETAINed across iterations of the data step.
data want ;
set have;
if not missing(organization) then new_organization = organization;
else organization = new_organization ;
retain new_organization;
run;
Now you can run PROC FREQ on NEW_ORGANIZATION and see the proper counts.
Thank you & Reeza for your expertise.
wklierman
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.