08-10-2016 05:08 PM
Hello, I work in an institutional research office where we submit an annual survey to CGS-GRE. We typically use a macro to create all of the necessary tables and then enter all of the data by hand into their website. CGS-GRE offers the option to submit the data as a txt file instead, so I'm attempting to figure out the best way to create this file.
The instructions for the file are here (I can't copy/paste from the file), bottom of page 8, and then page 18 for the specific instructions. Roughly, the format should be:
- plain text
- must not contain values other than blank, or numbers 0 - 9
- record length of 610
Our data are structured as one row per student, and our procedure so far is to create a flag variable of 1 for each category and then sum them in a proc tabulate, e.g.:
data students;
    /* ft_pt needs length 9: a plain $ would truncate "full_time" to 8 characters */
    input year gre_code sex $ level $ ft_pt :$9. citizen $ race $;
    if sex = "male" and level = "Masters" then male_masters = 1;
    else if sex = "female" and level = "Masters" then female_masters = 1;
    else if level = "Masters" then total_masters = 1;
    datalines;
2015 01 female Masters full_time US White
2015 02 male Doctoral part_time Non-US Asian
;
run;

proc tabulate data = students format = 6. missing;
    class gre_code;
    var male_masters female_masters total_masters;
    tables gre_code, male_masters female_masters total_masters / printmiss misstext = "0";
run;
All thoughts and ideas welcome! Please let me know if my explanation isn't clear.
08-10-2016 06:03 PM
You'll need to obtain your data in a dataset and then use a data step to export the data to a text file.
Proc tabulate won't work, but you can use proc means to summarize your data or proc sql.
Once you have the summarized data you need in a data set, we can help you format the output into a text file if needed.
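As a rough sketch of the summary step, assuming a data set named students with the gre_code, sex, and level variables from the original post, PROC SQL can count students per category directly, without the flag variables. The output data set name counts and its column names are made up for illustration.

```sas
/* Sketch only: assumes a data set STUDENTS with gre_code, sex, and level
   as in the original post. COUNTS and its column names are hypothetical. */
proc sql;
    create table counts as
    select gre_code,
           /* a true comparison evaluates to 1 and a false one to 0,
              so SUM() counts the matching rows */
           sum(sex = "male"   and level = "Masters") as male_masters,
           sum(sex = "female" and level = "Masters") as female_masters,
           sum(level = "Masters")                    as total_masters
    from students
    group by gre_code;
quit;
```

This gives one row per gre_code, with 0 rather than a missing value where a category has no students, which suits a file that may contain only blanks and digits.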
08-10-2016 07:09 PM
The real piece of information you need is Appendix B of that document. It has the actual layout.
From that, it looks like some work with PROC SUMMARY and a data step to prepare the data, followed by a good old-fashioned data _null_ step with file output, might work. The odd bit is going to be getting all of those variables on one line.
Here is a brief example with completely made up data for a small number of records and the first few variables to demonstrate one way.
data have;
    input GREinst discipline $ year FTMOM FTMOW FTMOT FTDM FTDW FTDT;
    datalines;
1111 00 2015 15 12 27 3 2 5
1111 01 2015 07 03 10 1 2 3
1111 02 2015 08 09 17 2 0 2
;
run;

data _null_;
    /* replace path with location in the file statement or use a fileref
    file "<your path>\FileToUpLoad.txt" lrecl=610;
    */
    set have;
    put GREinst f4. @5 discipline $2. @7 year f4. @11 (FTMOM FTMOW FTMOT FTDM FTDW FTDT)(F5.);
run;
The FILE statement determines where the output goes. The LRECL value should be at least the overall length of the line.
The PUT statement is the heart of this. The @ is a column pointer that moves to the given column number, taken from Appendix B. The formats say how many print positions to use. Numeric values are right-justified within those positions. The ()() syntax groups variables that share the same print instruction: the first set of parentheses lists the variables and the second gives the format applied to each. In this case all of the variables use 5 print positions, the Length in the appendix.
They actually did you a favor by using the same length for most of the variables.
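For completeness, here is a minimal sketch of the same step with the FILE statement switched on, so the records go to a file rather than the log. It assumes the made-up data set have from the example above. The fileref name upload is hypothetical and the path is still a placeholder. The PAD option is an assumption about what the upload expects: it fills each record with blanks out to the LRECL value, so every line is exactly 610 characters.

```sas
/* Sketch only: the path is a placeholder and UPLOAD is a hypothetical fileref. */
filename upload "<your path>\FileToUpLoad.txt";

data _null_;
    set have;
    /* LRECL=610 sets the record length; PAD fills short records with blanks */
    file upload lrecl=610 pad;
    put @1  GREinst    4.
        @5  discipline $2.
        @7  year       4.
        @11 (FTMOM FTMOW FTMOT FTDM FTDW FTDT) (5.);
run;
```

One thing to watch: a missing numeric value prints as a period by default, which would break the "blank or 0 - 9" rule. Running options missing = " "; before the step makes missing values print as blanks instead. Whether a blank or a 0 is the right entry for an empty cell is something to check against the survey instructions.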