Solved: Disaggregating frequency data

Riley_Bug · Posted 04-04-2019 04:47 PM

I have simple frequency data that I'd like to disaggregate in order to have a better look at the distribution. I have been messing around with this to no avail! Proc expand is for temporal data, but is is simple frequencies.

My data consists of the number of words used to describe symptoms in one field (Field A) in a report. The variable 'number of reports' is the total number of reports that had a specific word count in Field A. For example, the first line of the table can be described as the following: 941 reports had 1 word in Field A.

Thoughts/suggestions?

Example data:

Word Count	Number of Reports
1	3
2	7
3	5
4	6
5	10
6	8
7	7
8	2
9	3
10	1

What I want it to look like:

Word Count	Arbitrary Report ID
1	A
1	B
1	C
2	D
2	E
2	F
2	G
2	H
2	I
2	J

Thank you!

Riley_Bug · Posted 04-04-2019 05:18 PM

YO!!!!!!! I found that in a PDF and am messing with similar coding right now. I should have posted here first so that I didn't need to spend an hour googling!

You are so fabulous and thank you so much! The code that worked for me is:

data table5;
set dataset;
do i=1 to Number_of_Reports;
output;
end;
keep count ;
run;

The output drops the frequency variable (Number of reports) so that all I have are single rows each with a number that corresponds to a report's word count in Field A.

View solution in original post

PeterClemmensen · Posted 04-04-2019 04:55 PM

Hi and welcome to the SAS Community 🙂

I don't fully understand your request. Eg why does your desired data have three observations for Word Count=1?

The DATA to DATA Step Macro
Blog: SASnrd

Riley_Bug · Posted 04-04-2019 05:06 PM

What I want is 941 rows to be output with the value of 1 for wordcount. The bottom table is just an example of the output, as I didn't want to type 941 rows. Does that make sense?

***EDIT* I edited the tables to have smaller values. Thank you!

PeterClemmensen · Posted 04-04-2019 05:10 PM

It does. And what about your arbitrary report id. Is that in fact arbitrary or do you have 941 actual reports with actual names that should be retrieved from somewhere?

Id your problem is as simple as your data suggests, you can do something like this

data have;
input WordCount NumberofReports;
datalines;
1 3
2 7
3 5
4 6
5 10
6 8
7 7
8 2
9 3
10 1
;

data want(keep=WordCount Report_ID);
   set have;
   do i=1 to NumberofReports;
      Report_ID+1;
      output;
   end;

   format Report_ID z5.;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Riley_Bug · Posted 04-04-2019 05:18 PM

YO!!!!!!! I found that in a PDF and am messing with similar coding right now. I should have posted here first so that I didn't need to spend an hour googling!

You are so fabulous and thank you so much! The code that worked for me is:

data table5;
set dataset;
do i=1 to Number_of_Reports;
output;
end;
keep count ;
run;

The output drops the frequency variable (Number of reports) so that all I have are single rows each with a number that corresponds to a report's word count in Field A.

PeterClemmensen · Posted 04-04-2019 05:21 PM

Cool stuff 🙂 Glad you found your answer.

The DATA to DATA Step Macro
Blog: SASnrd

Reeza · Posted 04-04-2019 06:46 PM

@Riley_Bug wrote:

YO!!!!!!! I found that in a PDF and am messing with similar coding right now. I should have posted here first so that I didn't need to spend an hour googling!

When you're googling, restrict your results to two sites if you don't find what you need:

key words site:communities.sas.com

Or

key words site:lexjansen.com

90% of the time with those you'll find an answer.

ballardw · Posted 04-04-2019 05:13 PM

A bit more context of your data might help.

For instance you have

Word Count	Number of Reports
1	941

Does this mean that somewhere there are 941 reports that consist of a single word?

A specific word that has been encoded as "1" appears in 941 reports?

A report named 941 only has 1 instance of a specific word?

One of the issues with any summarized data is knowing what was summarized to begin with.

If the "disaggregation" is supposed to attempt to recover some original data (which I have seen requests for) then lots of information about the actual data is important.

Example:

The total number of shoppers at a store in a week is 1000. How many appeared on each day of the week? Might be able to get some model if we have an idea of things like ratio of purchasers to shoppers and number of sales per day. But without additional information getting a count per day of the week is pretty problematic. Though if the store is closed on a given day you can likely assume 0 shoppers.

Disaggregating frequency data

Re: Disaggregating frequency data

Re: Disaggregating frequency data

Re: Disaggregating frequency data

Re: Disaggregating frequency data

Re: Disaggregating frequency data

Re: Disaggregating frequency data

Re: Disaggregating frequency data

Re: Disaggregating frequency data

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away