About Petergao1

Petergao1 · ‎07-16-2020

I wish I could do that. I am new to this group and would like to see your previous examples or answers. Thanks!

Petergao1 · ‎07-16-2020

The data set is a real-life file with just the column names modified for simplicity. Yesterday or earlier I discovered that I can run my job using the array dimension 1:25 and DO i = 1 TO 25 without any modification to my data structure. It runs even though the real subgroup values are not consecutive, and it creates all 25 columns for each array; for subgroup values that do not exist, the output columns are entirely missing. All I need to do is list, in the KEEP and RETAIN statements, only the columns that correspond to real subgroups and should have data (a column with some missing values still counts as having data). Previously my program would not run because I forgot to define the output columns as character.

DATA want_wide;
  SET have_long;
  BY sch_id;
  KEEP sch_id
       EM_ELA_status1 EM_ELA_status4-EM_ELA_status9 EM_ELA_status13-EM_ELA_status14 EM_ELA_status25
       EM_Math_status1 EM_Math_status4-EM_Math_status9 EM_Math_status13-EM_Math_status14 EM_Math_status25
       HS_ELA_status1 HS_ELA_status4-HS_ELA_status9 HS_ELA_status13-HS_ELA_status14 HS_ELA_status25
       HS_Math_status1 HS_Math_status4-HS_Math_status9 HS_Math_status13-HS_Math_status14 HS_Math_status25;
  RETAIN sch_id
       EM_ELA_status1 EM_ELA_status4-EM_ELA_status9 EM_ELA_status13-EM_ELA_status14 EM_ELA_status25
       EM_Math_status1 EM_Math_status4-EM_Math_status9 EM_Math_status13-EM_Math_status14 EM_Math_status25
       HS_ELA_status1 HS_ELA_status4-HS_ELA_status9 HS_ELA_status13-HS_ELA_status14 HS_ELA_status25
       HS_Math_status1 HS_Math_status4-HS_Math_status9 HS_Math_status13-HS_Math_status14 HS_Math_status25;
  ARRAY emela  (1:25) $ EM_ELA_status1 - EM_ELA_status25;
  ARRAY emmath (1:25) $ EM_Math_status1 - EM_Math_status25;
  ARRAY hsela  (1:25) $ HS_ELA_status1 - HS_ELA_status25;
  ARRAY hsmath (1:25) $ HS_Math_status1 - HS_Math_status25;
  IF first.sch_id THEN DO;
    DO i = 1 to 25;
      emela(i) = "";
      emmath(i) = "";
      hsela(i) = "";
      hsmath(i) = "";
    END;
  END;
  emela(groupid) = EM_ELA_status;
  emmath(groupid) = EM_Math_status;
  hsela(groupid) = HS_ELA_status;
  hsmath(groupid) = HS_Math_status;
  IF last.sch_id THEN OUTPUT;
RUN;
*WORKS WELL;
*Advantage: less code and fewer programs than the transpose approach;
*Shortcoming: the output column names end with the subgroup's numeric value instead of the subgroup name, which business convention requires. Therefore I have to do further work to change the 40 column names ending with a number (1,4,5,6,7,8,9,13,14,25) into group names such as All_students, SWD, Native_Amer, etc., placed after the main part of the name (for example EM_ELA_status);
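
One possible way to handle the renaming described in the shortcoming above, as a rough sketch only. The macro name `rename_groups` is made up for illustration, and the pairing of numbers with names (1 = All_students, 4 = SWD, 5 = Native_Amer) is an assumption, since the post gives only three names and does not say which number each belongs to. The remaining seven groups would be appended to both lists in the same way.

```sas
/* Sketch under assumptions: the number-to-name pairing below is NOT from
   the post and must be replaced with the real business mapping. */
%macro rename_groups(ds=want_wide);
  %local prefixes nums names p i prefix;
  %let prefixes = EM_ELA EM_Math HS_ELA HS_Math;   /* the four column sets  */
  %let nums     = 1 4 5;                           /* subgroup values       */
  %let names    = All_students SWD Native_Amer;    /* assumed, same order   */

  proc datasets library=work nolist;
    modify &ds;
    rename
    %do p = 1 %to 4;
      %let prefix = %scan(&prefixes, &p);
      %do i = 1 %to 3;
        /* e.g. EM_ELA_status1 = EM_ELA_status_All_students */
        &prefix._status%scan(&nums, &i) = &prefix._status_%scan(&names, &i)
      %end;
    %end;
    ;
  quit;
%mend rename_groups;

%rename_groups(ds=want_wide)
```

PROC DATASETS changes only the column names in the data set header, so the data itself is not rewritten.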

Petergao1 · ‎07-14-2020

Thanks again for the suggestion. I don't mind if the array runs over a dimension of 1, 2, 3, ... up to 25, and then I drop the empty columns. But I do not know how to do that. Here is the data structure (I took the first school) and the first status column to be processed into wide format. The output will be one row per school with 10 additional columns; each column is one group's status for a subject, such as ELA.

sch_id  groupid  group  EM_ELA_StatusCY  EM_Math_StatusCY  HS_ELA_StatusCY  HS_Math_StatusCY
1       1        1      GS:MP            GS:MP             -                -
1       4        2      -                -                 -                -
1       5        3      -                -                 -                -
1       6        4      -                -                 -                -
1       7        5      -                -                 -                -
1       8        6      -                -                 -                -
1       9        7      GS:MP            GS:MP             -                -
1       13       8      -                -                 -                -
1       14       9      GS:WAA           GS:WAA            -                -
1       25       10     -                -                 -                -

Petergao1 · ‎07-14-2020

Thanks for the suggestion. My data is in long format. One column, called subgroup, has the values 1, 4, 5, 6, 7, 8, 9, 13, 14, 25; another column is the EM ELA status. That is why and where I run into a problem if I use the subgroup values as my array's dimension.

Petergao1 · ‎07-14-2020

I have a program that does the job with a transpose. Now I am looking for an alternative way to get it done.

Petergao1 · ‎07-14-2020

Hi friends, I have a question to ask. When one runs an array job, the array's dimension values must be consecutive, like 5-9 (5,6,7,8,9). I have a data set for schools with subgroups, and the subgroup value is not consecutive: it is 1,4,5,6,7,8,9,13,14,25, in long format. For the array to run, I have to create another variable, called group, that re-assigns the values to be consecutive, 1,2,3,4,5,6,7,8,9,10 (when subgroup=1, group=1; when subgroup=4, group=2; when subgroup=5, group=3; etc.). Otherwise my array won't run properly. The program turns the data file from long to wide, controls the output, and avoids using a transpose, as shown in the code below.

Problem: After my data set is processed, I have to rename my output columns back to the subgroup values, which is additional, tedious work.

My question: is there some way to use the original data's (nonconsecutive) values to run an array and get the job done?

*Note I have a data step that creates a new var (group) before I can run my data step with an array;
*Also the file is sorted by sch_id and group;

Here is my program, which is tested and working:

DATA want_wide;       *create a new data set in wide format;
  SET have_long;      *read in the existing data set in long format;
  BY sch_id;          *process the data in sch_id order (in the original data each school has 10 rows or fewer);
  KEEP sch_id EM_ELA_statusCY1 - EM_ELA_statusCY10;     *keep only these columns in the output data;
  RETAIN sch_id EM_ELA_statusCY1 - EM_ELA_statusCY10;   *retain the values across the rows of a school and keep the output columns in this order;
  ARRAY emstatus (1:10) $ EM_ELA_statusCY1 - EM_ELA_statusCY10;   *process data by group value, from 1 to 10;
  *within each sch_id, the data step runs in the order of the group value;
  IF first.sch_id THEN DO;
    DO i = 1 to 10;
      emstatus(i) = "";   *output data is character type and the array is indexed by i;
      *this resets every array element to missing (" ") and when a row has data the element is replaced;
    END;   *finish the array loop;
  END;     *finish the first.sch_id DO block;
  emstatus(group) = EM_ELA_statusCY;   *this is the column to be transposed;
  *the value goes to the new column whose name combines the subject (EM ELA) with the group value;
  IF last.sch_id THEN OUTPUT;   *when the last row of a sch_id is reached, output the data;
RUN;
*It worked and I got the output as desired: the long data set becomes wide, one school per row with 10 new columns;
*After this I have to rename the 10 columns back to the subgroup values, say EM_ELA_statusCY8=EM_ELA_statusCY13, EM_ELA_statusCY9=EM_ELA_statusCY14, EM_ELA_statusCY10=EM_ELA_statusCY25, etc.;
*Since I have 40 such columns, renaming them can be time consuming. The example shows only 10 of them. There are three other sets of data and each set will produce 10 columns;

Any idea how to get around this, so that I do not need to create a new group var and then rename my output columns back to the subgroups' values after the job? Thank you. Curious Peter
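
One way this could be approached, offered as a minimal sketch rather than a tested answer: list the ten target columns explicitly in the ARRAY statement (the names do not have to form one numbered range) and use the WHICHN function to turn the raw subgroup value into a position from 1 to 10. It assumes have_long is sorted by sch_id, that the subgroup column is numeric and named `subgroup`, and that the status column is `EM_ELA_statusCY`; `idx` is a helper variable introduced here.

```sas
DATA want_wide;
  SET have_long;
  BY sch_id;

  /* Ten elements, named after the real subgroup values. */
  ARRAY emstatus (10) $ EM_ELA_statusCY1
                        EM_ELA_statusCY4-EM_ELA_statusCY9
                        EM_ELA_statusCY13 EM_ELA_statusCY14
                        EM_ELA_statusCY25;
  RETAIN emstatus;   /* keep values across the rows of one school */

  /* Reset all ten columns at the start of each school. */
  IF first.sch_id THEN CALL MISSING(OF emstatus(*));

  /* WHICHN returns the position of subgroup in the list, or 0 if absent. */
  idx = WHICHN(subgroup, 1, 4, 5, 6, 7, 8, 9, 13, 14, 25);
  IF idx > 0 THEN emstatus(idx) = EM_ELA_statusCY;

  IF last.sch_id THEN OUTPUT;

  KEEP sch_id EM_ELA_statusCY1 EM_ELA_statusCY4-EM_ELA_statusCY9
       EM_ELA_statusCY13 EM_ELA_statusCY14 EM_ELA_statusCY25;
RUN;
```

With this layout neither the extra group variable nor the later rename should be needed, because the array position and the column name are decoupled. If subgroup were stored as character, WHICHC would be the corresponding function.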

Petergao1 · ‎06-29-2020

Dear mklangley, Thanks for your continued attention and help. Today I pasted your code into my SAS EG and it worked! I got an output of four lines, each row a unique sch_id. The new columns are the group and color combined, holding the sum. This is exactly what I am looking for. Thank you so much. Next I will modify my code to match my real-world data structure and apply it to my dataset to see if I get the desired output. Thank you to everyone who read this and helped in many ways. Best wishes, see you around. Peter

Petergao1 · ‎06-29-2020

Hi mklangley, Thank you for taking the time to read my question and write a test program. I ran your program and it worked. However, it still produces just one line, for the sch_id of the very last record, instead of three lines for the three sch_id values. (Previously I did some trials and always got an output of one line, for the last record in the input data set.) I suspect it has to do with the sch_id column; somehow I cannot get it done properly. Thank you. I will keep trying to find a way out...

Petergao1 · ‎06-29-2020

Hi Ballardw, The purpose of this request is to improve efficiency. I have datasets similar to this structure, that is, sch_id, subgroup (10 rows per school) and annual performance data (say PI1, PI2, PI3). I am looking for a solution that quickly processes the data so that each school has its group_year_PI columns on one row per school ID. The macro should save me a lot of other data steps. Hope this helps.

Petergao1 · ‎06-26-2020

Dear SAS users, Below are a sample dataset and a program that produces a one-line output containing each team's name and color combined, with the total of that team's game results. It works well.

My question: suppose there are multiple entities (identified by a new id column, such as a school id) and each entity has the same set of groups. This is normal, because schools may have the same set of groups that compete in some activities and get various results in different rounds of contests. The new dataset (call it team_new; the new column is sch_id) will look like this:

sch_id  color   group       game1  game2  game3
sch1    Green   Crickets    10     7      8
sch1    Blue    Sea Otters  10     6      7
sch1    Yellow  Stingers    9      10     9
sch1    Red     Hot Ants    8      9      9
sch1    Purple  Cats        9      9      9
sch2    Green   Crickets    10     9      7
sch2    Blue    Sea Otters  8      7      9
sch2    Yellow  Stingers    7      8      10
sch2    Red     Hot Ants    9      8      10
sch2    Purple  Cats        8      6      9
sch3    Green   Crickets    5      7      9
sch3    Blue    Sea Otters  6      8      10
sch3    Yellow  Stingers    7      9      8
sch3    Red     Hot Ants    6      9      10
sch3    Purple  Cats        8      10     9
;

The desired output will look like this:

sch_id  greencricketstotal  blueseaotterstotal  yellowstingerstotal  redhotantstotal  purplecatstotal
sch1    25                  23                  28                   26               27
sch2    xx                  xx                  xx                   xx               xx
sch3    xx                  xx                  xx                   xx               xx

*Note: xx above will be the sum of the three competitions' results, not run yet.

data teams; *This is the original sample dataset's name;
  input color $15. @16 team_name $15. @32 game1 game2 game3;
  datalines;
Green          Crickets        10 7 8
Blue           Sea Otters      10 6 7
Yellow         Stingers        9 10 9
Red            Hot Ants        8 9 9
Purple         Cats            9 9 9
;

%macro newvars(dsn);
  data _null_;
    set &dsn end=end;
    count+1;
    call symputx('macvar'||left(count),compress(color)||compress(team_name)||"Total");
    if end then call symputx('max',count);
  run;

  data teamscores;
    set &dsn end=end;
    %do i = 1 %to &max;
      if _n_=&i then do;
        &&macvar&i=sum(of game1-game3);
        retain &&macvar&i;
        keep &&macvar&i;
      end;
    %end;
    if end then output;
%mend newvars;

%newvars(teams)

proc print noobs;
  title "League Team Game Totals";
run;

The above program runs well. However, when I created the new dataset team_new with the additional column sch_id and tried to modify the above program, I could get it to run, but it produces only one line of output, which is the last color+group record. I know I need more code, but I just cannot get the output the way I need. Any idea what I should do? By the way, I would like to keep using the macro, with some modification, and do not want to add a transpose step.

Thank you.
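
A possible adaptation of the macro for team_new, as a hedged sketch. The original tests `_n_=&i` and `if end then output`, which only ever writes one row; the variant below uses BY-group processing instead, resetting the totals at first.sch_id and writing a row at last.sch_id. It assumes team_new is sorted by sch_id, has the columns sch_id, color, group and game1-game3 shown above, and that every school has the same color/group combinations. The macro name `newvars_by` is made up for illustration.

```sas
%macro newvars_by(dsn);
  %local i max;

  /* Build the total-column names from the first school's rows only. */
  data _null_;
    set &dsn;
    by sch_id;
    if _n_ > 1 and first.sch_id then stop;   /* second school reached */
    count + 1;
    call symputx(cats('macvar', count),
                 cats(compress(color), compress(group), 'Total'));
    call symputx('max', count);
  run;

  data teamscores;
    set &dsn;
    by sch_id;
    %do i = 1 %to &max;
      retain &&macvar&i;
      /* start every school with empty totals */
      if first.sch_id then &&macvar&i = .;
      /* match the row to its column by name, not by row number */
      if cats(compress(color), compress(group), 'Total') = "&&macvar&i"
        then &&macvar&i = sum(of game1-game3);
    %end;
    if last.sch_id then output;   /* one row per school */
    keep sch_id
      %do i = 1 %to &max;
        &&macvar&i
      %end;
    ;
  run;
%mend newvars_by;

%newvars_by(team_new)
```

Matching rows to columns by the generated name, rather than by `_n_`, means the order of the groups inside a school does not matter.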

Date Last Visited	‎07-20-2020 05:42 PM

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Re: Run SAS array when array value is not consecutive

Run SAS array when array value is not consecutive

Re: How to produce a dataset with addtional id

Re: How to produce a dataset with addtional id

Re: How to produce a dataset with addtional id

How to produce a dataset with addtional id
