I have simple frequency data that I'd like to disaggregate in order to have a better look at the distribution. I have been messing around with this to no avail! Proc expand is for temporal data, but is is simple frequencies.
My data consists of the number of words used to describe symptoms in one field (Field A) in a report. The variable 'number of reports' is the total number of reports that had a specific word count in Field A. For example, the first line of the table can be described as the following: 941 reports had 1 word in Field A.
Thoughts/suggestions?
Example data:
Word Count | Number of Reports |
1 | 3 |
2 | 7 |
3 | 5 |
4 | 6 |
5 | 10 |
6 | 8 |
7 | 7 |
8 | 2 |
9 | 3 |
10 | 1 |
What I want it to look like:
Word Count | Arbitrary Report ID |
1 | A |
1 | B |
1 | C |
2 | D |
2 | E |
2 | F |
2 | G |
2 | H |
2 | I |
2 | J |
Thank you!
YO!!!!!!! I found that in a PDF and am messing with similar coding right now. I should have posted here first so that I didn't need to spend an hour googling!
You are so fabulous and thank you so much! The code that worked for me is:
data table5;
set dataset;
do i=1 to Number_of_Reports;
output;
end;
keep count ;
run;
The output drops the frequency variable (Number of reports) so that all I have are single rows each with a number that corresponds to a report's word count in Field A.
Hi and welcome to the SAS Community 🙂
I don't fully understand your request. Eg why does your desired data have three observations for Word Count=1?
What I want is 941 rows to be output with the value of 1 for wordcount. The bottom table is just an example of the output, as I didn't want to type 941 rows. Does that make sense?
***EDIT* I edited the tables to have smaller values. Thank you!
It does. And what about your arbitrary report id. Is that in fact arbitrary or do you have 941 actual reports with actual names that should be retrieved from somewhere?
Id your problem is as simple as your data suggests, you can do something like this
data have;
input WordCount NumberofReports;
datalines;
1 3
2 7
3 5
4 6
5 10
6 8
7 7
8 2
9 3
10 1
;
data want(keep=WordCount Report_ID);
set have;
do i=1 to NumberofReports;
Report_ID+1;
output;
end;
format Report_ID z5.;
run;
YO!!!!!!! I found that in a PDF and am messing with similar coding right now. I should have posted here first so that I didn't need to spend an hour googling!
You are so fabulous and thank you so much! The code that worked for me is:
data table5;
set dataset;
do i=1 to Number_of_Reports;
output;
end;
keep count ;
run;
The output drops the frequency variable (Number of reports) so that all I have are single rows each with a number that corresponds to a report's word count in Field A.
Cool stuff 🙂 Glad you found your answer.
@Riley_Bug wrote:
YO!!!!!!! I found that in a PDF and am messing with similar coding right now. I should have posted here first so that I didn't need to spend an hour googling!
When you're googling, restrict your results to two sites if you don't find what you need:
key words site:communities.sas.com
Or
key words site:lexjansen.com
90% of the time with those you'll find an answer.
A bit more context of your data might help.
For instance you have
Word Count | Number of Reports |
1 | 941 |
Does this mean that somewhere there are 941 reports that consist of a single word?
A specific word that has been encoded as "1" appears in 941 reports?
A report named 941 only has 1 instance of a specific word?
One of the issues with any summarized data is knowing what was summarized to begin with.
If the "disaggregation" is supposed to attempt to recover some original data (which I have seen requests for) then lots of information about the actual data is important.
Example:
The total number of shoppers at a store in a week is 1000. How many appeared on each day of the week? Might be able to get some model if we have an idea of things like ratio of purchasers to shoppers and number of sales per day. But without additional information getting a count per day of the week is pretty problematic. Though if the store is closed on a given day you can likely assume 0 shoppers.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.