Hi friends,
I have a question to ask. When one runs an array job, the array's dimension-size
value must be consecutive, like from 5-9 (5,6,7,8,9). I have a data set for schools with subgroups. which is not consecutive. Its value is 1,4,5,6,7,8,9,13,14,25, in long format.
In order for the array to run,I have to create another variable, called group, to re-assign value to be consecutive, like, 1,2,3,4,5,6,7,8,9,10, (when subgroup=1, group=1, subgroup=4, group=2, subgroup=5, group=3, etc. Otherwise my array won't run properly.
This program is to turn data file from long to wide, to control the output, to avoid using a transpose, which is done so in the below codes.
Problem: After my data set was processed, I have to rename my output columns back to subgroup values, which is additional work and tedious.
My question: is there some way that one can use original data' s value (nonconsecutive) to run an array and get the job done?
*Note I have a data step to create a new var (group) before I can run my data step with an array;
*Also the file is sorted by sch_id and group;
Here is my program that is tested and working:
DATA want_wide ; *create a new data file name to be in wide format;
SET have_long; *read in the existing data file in long format;
By sch_id ; *read in data by the order of sch_id-original data each school has 10 rows (or less) of data;
KEEP sch_id EM_ELA_statusCY1 - EM_ELA_statusCY10; *keep output data's these column only;
RETAIN sch_id EM_ELA_statusCY1 - EM_ELA_statusCY10 ; *keep output data's column in this order;
ARRAY emstatus (1:10) $ EM_ELA_statusCY1 - EM_ELA_statusCY10 ; *process data by group value, from 1 to 10;
*based on sh_id, do the data step by the order of group id value;
IF first.sch_id THEN DO;
DO i = 1 to 10 ;
emstatus ( i ) = "" ; *output data is character type;
*array's order is based on i's value;
*above defaulted array index output to be missing (" "), when it has data,then it will be replaced;
END;*finish array execution;
END;* finish the data round beginning with sch_id do loop;
emstatus ( group ) = EM_ELA_statusCY ; *this is the column to be transposed;
* this will assign output new column name (EM subject=ELA) combined with group value for each new column;
IF last.sch_id THEN OUTPUT ;*when reached the last sch_id, output data;
RUN; *It worked, got the output as desired-long data set become wide, one school per row with 10 new columns;
* After this I have to rename the 10 columns back to the subgroup value, say
EM_ELA_statusCY8=EM_ELA_statusCY13,
EM_ELA_statusCY9= EM_ELA_statusCY14
EM_ELA_statusCY10=EM_ELA_statusCY25,etc.;
*Since I have 40 such columns, renaming them can be time consuming, the example only show 10 of them, there are three other sets of data, each set will produce 10 columns;
Any idea to go around so I do not need to create a new group var, and after the job rename my output columns back to subgroups' value?
Thank you.
Curious Peter
I have program that run the transpose job. Now I am seeking alternative way to have it done differently.
@Petergao1 wrote:
I have program that run the transpose job. Now I am seeking alternative way to have it done differently.
PROC TRANSPOSE is the way to go for this. Don't waste brain cycles ín a futile attempt to reinvent the wheel. Use your time for more useful things.
Just compare this:
data have;
input sch_id $ subgroup EM_ELA_statusCY $;
datalines;
A 1 abc
A 13 def
A 14 ghi
A 22 xyz
;
proc transpose data=have out=want prefix=EM_ELA_statusCY;
by sch_id;
var EM_ELA_statusCY;
id subgroup;
run;
with your complicated array code, the need to convert the subgroups to a continuous series, and the re-translation from that. Which one of those codes will you want to maintain in the future?
I like @Reeza 's suggestion.
Alternatively:
If you have variables var1, var4, var5, var6, var7, var8, var9, var13, var14, var25, this should work (note the double-dash)
ARRAY emstatus $ var1--var25 ;
Also, I haven't gone through your code in detail, but usually converting long to wide just makes your problem worse. Usually long data sets are much easier to handle and analyze.
Thanks for the suggestion. My data is in long format and there is one column called subgroup with values of 1,4,5,6,7,8,9,13,14,25; another column is the Em ELA status. that is why and where I encounter problem if I use the subgroup values as my array's dimension.
@Petergao1 wrote:
Thanks for the suggestion. My data is in long format and there is one column called subgroup with values of 1,4,5,6,7,8,9,13,14,25; another column is the Em ELA status. that is why and where I encounter problem if I use the subgroup values as my array's dimension.
Everything you said points to not transposing the data, leaving it long instead of wide. But you'd have to tell us what analysis you are planning with this data to be sure.
Also, if you TRANSPOSE the data, there shouldn't be any problems using the "double-dash" method I showed.
So I'm not sure what your resistance is to either idea.
I have program that run the transpose job. Now I am seeking alternative way to have it done differently.
Given that this seems like more of an academic/learning exercise.
Quite often it is a very good idea to provide example starting data, the desired output and the rules involved.
Transpose is likely an option.
Some ideas:
You have code like this to set all elements of an array to missing:
DO i = 1 to 10 ;
emstatus ( i ) = "" ; *output data is character type;
*array's order is based on i's value;
*above defaulted array index output to be missing (" "), when it has data,then it will be replaced;
END;*finish array execution;
The do loop is unneeded you can accomplish the exact same thing with:
call missing( of emstatus(*));
Call missing sets the values of all variables, and yes a mix of character and numeric is allowed, in the parentheses to missing for the variable type. You can process all elements of an array with the "of arrayname(*) " syntax.
One option might be to consider a custom custom format that would map your desired group values (still not quite sure why the word "group" is used ). Then you could use index values like varname[ functname(indexvalue)] to get the "group" .
But likely the introduction of additional variables to create the needed names with the ID statement in Proc Transpose is going to much easier.
Thanks again for the suggestion. I don't mind the array will run a dimension from 1, 2,3,... till 25. And then I can drop those empty cases. But I do not know how to do that.
Here is the data structure (I took the first one case) and the first status column to process it into wide format. Th out put will be one row for one school with additional 10 columns, each columns is a group's status for a subject, like ELA.
| sch_id | groupid | group | EM_ELA_StatusCY | EM_Math_StatusCY | HS_ELA_StatusCY | HS_Math_StatusCY |
| 1 | 1 | 1 | GS:MP | GS:MP | - | - |
| 1 | 4 | 2 | - | - | - | - |
| 1 | 5 | 3 | - | - | - | - |
| 1 | 6 | 4 | - | - | - | - |
| 1 | 7 | 5 | - | - | - | - |
| 1 | 8 | 6 | - | - | - | - |
| 1 | 9 | 7 | GS:MP | GS:MP | - | - |
| 1 | 13 | 8 | - | - | - | - |
| 1 | 14 | 9 | GS:WAA | GS:WAA | - | - |
| 1 | 25 | 10 | - | - | - | - |
I wished I can do that. I am new to this group and like to see your previous example or answers. Thanks!
Learning how to search is worth the time.
Search terms used in the giant 'Find your SAS answers' above:
"delete variable missing"
@Petergao1 wrote:
I wished I can do that. I am new to this group and like to see your previous example or answers. Thanks!
@Petergao1 wrote:
Thanks again for the suggestion. I don't mind the array will run a dimension from 1, 2,3,... till 25. And then I can drop those empty cases. But I do not know how to do that.
Here is the data structure (I took the first one case) and the first status column to process it into wide format. Th out put will be one row for one school with additional 10 columns, each columns is a group's status for a subject, like ELA.
sch_id groupid group EM_ELA_StatusCY EM_Math_StatusCY HS_ELA_StatusCY HS_Math_StatusCY 1 1 1 GS:MP GS:MP - - 1 4 2 - - - - 1 5 3 - - - - 1 6 4 - - - - 1 7 5 - - - - 1 8 6 - - - - 1 9 7 GS:MP GS:MP - - 1 13 8 - - - - 1 14 9 GS:WAA GS:WAA - - 1 25 10 - - - -
Are all those dashes indicating missing data? I am not sure why you make things "wide" with missing data.
Maybe it is time to ask what this somewhat ugly data set is to be used for. And possibly where that particular structure originate?
With the column heading you show in that source this starts to sound like a REPORT may be easier to make then a very ugly data set if the data is not used for further analysis.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.