I have a question about dynamically generating a dataset, and also about dynamically generating a table/report from that dataset.
Say I have an ever-changing source file, maybe in Excel format, like this:
| Date | Field 1 | Field 2 | Field 3 |
| JUL01 | 100 | 101 | 88 |
| JUL02 | 331 | 72 | 111 |
| ... | ... | ... | ... |
But from time to time, the number of "Field X" might increase, like:
| Date | Field 1 | Field 2 | Field 3 | Field 4 |
| AUG01 | 20 | 61 | 98 | 172 |
| AUG02 | 341 | 27 | 121 | 12 |
| ... | ... | ... | ... | .... |
(Note: the total number of "Field N" columns could either increase or decrease.)
Is there a way I can get the information from the fields/columns dynamically, without hardcoding the names? If I use keep=field_1 field_2 ... field_n, then every time the source changes I'd need to update the code as well, which I'd like to avoid.
I've heard people say PROC TABULATE might help, but I don't see how. Could someone kindly point me in a direction to look into? Many thanks!
It depends on the SAS code you want to maintain and where it is maintained.
Once a SAS member has been generated, you can always reference DICTIONARY.COLUMNS with PROC SQL -- or the SASHELP.VCOLUMN view. Either of these facilities can be used to keep track of the columns and can be used in SAS application code where needed. Search the SAS.COM support website for references.
Or, SAS coding techniques can be implemented that rely on the variable name structure, such as using a SAS variable name prefix list like FIELD_: in a KEEP= dataset option or a KEEP statement.
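For example (the dataset name WORK.HAVE is an assumption here), the prefix list keeps every FIELD_ variable without naming any of them, and SASHELP.VCOLUMN shows what is currently in the table:

```sas
/* Keep the Date column plus every variable starting with FIELD_,
   however many there happen to be this month */
data want;
  set work.have(keep=date field_:);
run;

/* Inspect the current columns without hardcoding their names */
proc print data=sashelp.vcolumn;
  where libname='WORK' and memname='HAVE';
  var name type length;
run;
```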
For other, more specific reference / usage examples, you will want to share your SAS application code in a forum post reply.
As was mentioned, PROC IMPORT does not require foreknowledge of the number of incoming fields. In a DATA step, the following code places each field on its own observation, so any number of fields is acceptable (the number can even vary within an incoming table).
data a(keep=date num field);
  infile datalines missover;
  input date date9. @;
  format date date9.;
  do num = 1 by 1 until (field = .);
    input field @;
    if field ne . then output;
  end;
datalines;
23jul2010 1 2 3 4 5 6
24jul2010 11 22 33 44
;
run;

proc transpose data=a out=b prefix=field;
  by date;
  id num;
  var field;
run;

proc print data=b;
run;
As Scott says, PROC IMPORT would be most useful for reading in from Excel.
I'm not sure how PROC TABULATE will help with anything but a report, however. Your question (how to dynamically change the number of columns in a dataset) seems to imply that one month your dataset (let's call it PERM.WOMBAT) could have 3 FIELD: variables, and the next month, when you read in your SOURCE or EXCEL file, PERM.WOMBAT could have 5 FIELD: variables because a column was added to the Excel file.
If you built programs based on the 3 FIELD: variable version of the dataset, and if your dataset PERM.WOMBAT suddenly turns out to have 5 FIELD: variables in a new month, your existing PROGRAMS would have no visibility of the 2 new FIELD: variables. Sometimes folks use the DICTIONARY files to see what variables are in a file before they overwrite that file with new data (and possible new columns). And then sometimes, people will write macro programs to allow for a bit more flexibility when the number of columns or the names of columns change.
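As a rough illustration of that macro approach (the dataset name WORK.WOMBAT and the FIELD prefix are assumptions), a small macro can look up the FIELD variables currently present before any downstream code runs:

```sas
/* Collect the names of the FIELD variables currently in the dataset,
   so later steps can adapt to however many exist this month */
%macro get_fields(lib=WORK, mem=WOMBAT);
  %global fieldlist nfields;
  proc sql noprint;
    select name into :fieldlist separated by ' '
      from dictionary.columns
      where libname = %upcase("&lib")
        and memname = %upcase("&mem")
        and upcase(name) like 'FIELD%';
  quit;
  %let nfields = &sqlobs;   /* number of rows the query returned */
%mend get_fields;

%get_fields()
%put NOTE: &nfields FIELD variables found: &fieldlist;
```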
At any rate, just some more to think about. I'm not sure whether you want a dataset or a report, though, and I still don't understand how PROC TABULATE would help. You CAN use a colon modifier in TABULATE syntax to design your table, although for more complicated tables you may eventually need to use the specific variable names. In the meantime, the program below makes some fake data -- one file with 3 variables named FIELD1-FIELD3 and another with 5 variables named FIELD1-FIELD5 -- and when you use the colon modifier in the VAR and TABLE statements, the correct number of FIELD variables appears in either table.
data class1(keep=name sex field1 field2 field3)
     class2(keep=name sex field1-field5);
  set sashelp.class;
  field1 = age;
  field2 = weight;
  field3 = height;
  field4 = age * weight;
  field5 = age * height;
run;

ods html file='c:\temp\show_colon.html' style=sasweb;

proc tabulate data=class1;
  title 'should see 3 columns under min and 3 columns under mean';
  var field:;
  table (min mean)*field:;
run;

proc tabulate data=class2;
  title 'should see 5 columns under min and 5 columns under mean';
  var field:;
  table (min mean)*field:;
run;

ods html close;
I don't think the issue is so much how to handle the dynamic data structure when the number of columns is unknown, as it is deciding what to do with these extra columns!
How can you decide that an extra column headed 'New_Balance' contains the same information each time it appears? One day it might contain money; another time it might be a flag indicating a new account.
Columns can easily be loaded as text, and there are plenty of examples in the Forum archives already. What defeats my programming here is knowing what to do with "unknown" columns.
Please can you define your data a little more:
- What might the additional columns contain - numbers, messages, timestamps, boolean flags, filenames - or perhaps something simpler?
- What might you want to do with the extra columns?
I have given PROC IMPORT a try, and indeed I successfully retrieved the columns I need from my dummy .txt file. Many thanks to SBB and Cynthia for the suggestions.
As for the questions about the extra columns: they would contain just simple number values.
Besides, I think my task was misunderstood. Though knowing how to get data from source files does help me, I am also looking for a way to generate reports on WRS from datasets dynamically.
As in the story I have told, I will be dealing with datasets that vary in their number of columns from time to time, and it seems (with my current knowledge) that when generating the WRS reports, a change in the dataset structure means something has to be done manually to update the report.
And that's what I want to avoid.
I learnt that stored processes might be helpful, and I am currently studying how the whole thing works (from dataset to mapping and then to report, etc.).
I just started learning SAS a month ago; I hope my questions won't cause too much trouble, and thank you very much for all the kind support!
When you are dealing with Web Report Studio (WRS), things are a bit different. Reports built in WRS can only use SAS Information Maps as the data source; alternately, you can build a WRS report from a Stored Process. So some of the techniques we recommended (such as using PROC IMPORT and the colon modifier) will work inside a stored process, but will not work with an Information Map. In some circumstances it is possible to populate an Information Map from a Stored Process, but the number of columns in the Information Map really can't be dynamic and changeable.
At this point, I would recommend that you talk to your SAS Administrator or the person who has been building Information Maps for your company to discuss the best approach for your WRS report needs.
> As for questions about the extra columns, they would
> be containing just simple number values.
Since the information structure is so well defined, it seems the only variation would be the number of "extra", numeric columns.
I would recommend transposing this varying part into rows (each row also carrying the "common" part), perhaps supplemented with a counter indicating the column position that each row of data represents.
Then the structure is fixed, and the variation from input file to input file will only be the number of rows.
That rearrangement creates a structure that is far more "in tune" or "aligned" with the way SAS works. I think (hope) it might be a standard transform in the Information Map tooling.
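A minimal sketch of that rearrangement, assuming the imported dataset is WORK.HAVE, sorted by a DATE column, with a varying set of FIELD columns (all names here are placeholders):

```sas
/* Turn the varying FIELD columns into rows: one row per date per column.
   _NAME_ records which source column each value came from. */
proc transpose data=have out=long(rename=(col1=value));
  by date;          /* the fixed, "common" part of each row */
  var field:;       /* every column whose name starts with FIELD */
run;

/* Add a counter for the column position within each date */
data long;
  set long;
  by date;
  if first.date then colpos = 0;
  colpos + 1;
run;
```

Whatever the number of FIELD columns in a given month, LONG always has the same four variables (DATE, _NAME_, VALUE, COLPOS); only the row count changes.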
> Besides, I think I was mis-understood my task
> somehow, I am also looking a way to generate reports
> on WRS from datasets dynamically.