BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I have a SAS Dataset that looks like this (excerpt)

Organization                                   County                         County_count

African Family Holistic Org           Washington                              1

                                                      Lane                                        2

Asian Health Service Ctr              Baker                                       1

                                                      Linn                                         2

                                                      Gilliam                                     3

Consejo Hispano                           Clackamas                              1

                                                       Curry                                      2

                                                       Lincoln                                   3

                                                       Linn                                       4

                                               . . .

and so on....

 

What I need but is code that will produce an output dataset like

 

Organization                                   County_all                        

 

African Family Holistic Org                    2

 

Asian Health Service Ctr                       3

 

Consejo Hispano                                   4

                                              . . .

 

and so.  In others words rather than an individual count of counties where the

Organizations serve, I need the total count of the number of counties served by the Organization.  There are some Organizations that serve up to 12 or more counties.

 

What coding would produce the needed data set?

 

Thank you for your help.

 

wklierman

  

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So you read in a report that had the values of the grouping variable filled in only once to make it easier for humans to review the report?  And now you want to convert that to an actual dataset?

 

So essentially you want to do a last observation carried forward operation on that first/grouping variable.

 

So let's assume the dataset you created from the XLSX file is named HAVE and the variable with blanked out values is named ORGANIZATION.  Here is a data step that will make a new dataset called WANT that fixes that by making a new variable whose value is RETAINed across iterations of the data step.

data want ;
  set have;
  if not missing(organization) then new_organization = organization;
  else organization = new_organization ;
  retain new_organization;
run;

Now you can run PROC FREQ on NEW_ORGANIZATION and see the proper counts.

View solution in original post

5 REPLIES 5
Reeza
Super User

If you don't have duplicates in the second column and assuming organization is repeated for each line not only on the first line as shown then a standard proc freq should be fine. 

 

The PROC FREQ will create a displayed output and the OUT= option saves it to a data set called WANT. 

 

proc freq data=have;
table organization / out=want;
run;

proc print data=want;
run;

If you do have duplicates then you need a count distinct, here's examples (that you can copy and fully run) that illustrate how to do a distinct count per group.

https://github.com/statgeek/SAS-Tutorials/blob/master/count_distinct_by_group.sas

 

 


@wlierman wrote:

I have a SAS Dataset that looks like this (excerpt)

Organization                                   County                         County_count

African Family Holistic Org           Washington                              1

                                                      Lane                                        2

Asian Health Service Ctr              Baker                                       1

                                                      Linn                                         2

                                                      Gilliam                                     3

Consejo Hispano                           Clackamas                              1

                                                       Curry                                      2

                                                       Lincoln                                   3

                                                       Linn                                       4

                                               . . .

and so on....

 

What I need but is code that will produce an output dataset like

 

Organization                                   County_all                        

 

African Family Holistic Org                    2

 

Asian Health Service Ctr                       3

 

Consejo Hispano                                   4

                                              . . .

 

and so.  In others words rather than an individual count of counties where the

Organizations serve, I need the total count of the number of counties served by the Organization.  There are some Organizations that serve up to 12 or more counties.

 

What coding would produce the needed data set?

 

Thank you for your help.

 

wklierman

  

 

 


 

wlierman
Lapis Lazuli | Level 10

Thank you very much for your help.

 

The Organization is only repeated once as in the first example.  So that is a little wrinkle in the mix. (Hopefully a small wrinkle.)

 

 

Organization                                     County                County_count

African Family Holistic Org          Washington                      1

                                                    Lane                                 2

wlierman
Lapis Lazuli | Level 10

I did think of going into the xlsx file and copying the name of the Organization, since there are fewer than 50 lines.

 

But then when the dataset is several thousand or more, that option would not be feasible.

 

wklierman

Tom
Super User Tom
Super User

So you read in a report that had the values of the grouping variable filled in only once to make it easier for humans to review the report?  And now you want to convert that to an actual dataset?

 

So essentially you want to do a last observation carried forward operation on that first/grouping variable.

 

So let's assume the dataset you created from the XLSX file is named HAVE and the variable with blanked out values is named ORGANIZATION.  Here is a data step that will make a new dataset called WANT that fixes that by making a new variable whose value is RETAINed across iterations of the data step.

data want ;
  set have;
  if not missing(organization) then new_organization = organization;
  else organization = new_organization ;
  retain new_organization;
run;

Now you can run PROC FREQ on NEW_ORGANIZATION and see the proper counts.

wlierman
Lapis Lazuli | Level 10

Thank you & Reeza for your expertise.

 

wklierman

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 664 views
  • 2 likes
  • 3 in conversation