Posted 05-25-2011 05:03 PM
Hi, everyone:
I recently moved from R to SAS.
In R, the apply function lets you sort the data across either the rows or the columns. In SAS, do we have to transpose the data in order to sort it that way?
I would also like to replicate each observation several times. Is there an efficient way to do that? Alternatively, I could repeat the whole data set k times, concatenate the copies, and sort the result by ID, using append repeatedly. Would that be efficient, given that my original data has on the order of a million observations?
Thank you in advance.
Sunrain
4 REPLIES
Gday sunrain,
A PROC SORT should do the trick for rows.
For columns, I would like to know if there's a function/proc which will do this too.
My current strategy would probably be:
* PROC CONTENTS - to get a listing of the variables
* sorting that list
* creating a macro variable with the contents of the above list
* in a new data step, using the ATTRIB statement with the macro variable above
e.g.
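[pre]
/* Get the variable names of the data set */
proc contents data = sashelp.adomsg out = a_contents noprint;
run;

/* Sort the names alphabetically */
proc sort data = a_contents (keep = name) out = a_sort;
   by name;
run;

/* Build a space-separated list of the sorted names in a macro variable */
data _null_;
   set a_sort;
   by name;
   length sort_vars $100;
   retain sort_vars;
   if _n_ = 1 then
      sort_vars = name;
   else
      sort_vars = catx(' ', sort_vars, name);
   call symput('sort_vars', sort_vars);
run;
%put &sort_vars.;

/* Variables named before SET keep that order in the output data set */
data a_column_sorted;
   attrib &sort_vars. label = '';
   set sashelp.adomsg;
run;
[/pre]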
I don't think PROC TRANSPOSE necessarily sorts the columns; I think it will probably list the variables in the order in which it comes across them.
As for efficiency, I guess that really depends on what the system can handle. If you want to replicate observations, you could have multiple OUTPUT statements in your code.
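[pre]
/* Option 1: write each observation twice, so the copies are adjacent */
data a_replicated;
   set sashelp.adomsg;
   output;
   output;
run;

/* Option 2: read the data set twice, so the second copy follows the first */
data a_replicated2;
   set sashelp.adomsg
       sashelp.adomsg;
run;
[/pre]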
The second one is concatenating the data with itself.
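For more than a couple of copies, the repeated OUTPUT statements can be wrapped in a DO loop. This is only a minimal sketch: the macro variable k, the helper variable copy_no, and the output name a_replicated_k are hypothetical and not from the posts above. Each row is written k times in a single pass, so the copies stay next to each other and no appending or re-sorting by ID should be needed.

[pre]
%let k = 3;   /* hypothetical: number of copies of each observation */

data a_replicated_k;
   set sashelp.adomsg;
   /* write the current observation k times; copy_no numbers the copies */
   do copy_no = 1 to &k.;
      output;
   end;
run;
[/pre]

If the copy counter is not wanted in the result, adding (drop = copy_no) to the output data set name should remove it.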
Hi, John:
I tried your code, but I did not get the results I expected. I think I can transpose the data, sort it, and transpose it back. I don't care about the column names.
Thanks.
Hi, John:
Maybe I did not state my question clearly. My goal is to sort the data row-wise, that is, to sort the values within each row independently of the other rows. That is what the R sort function can do; by default, R sort works column-wise.
That is why transposing, sorting, and transposing back does not work.
Sunrain.
Hi, everyone:
The CALL SORTN routine works for this, with one restriction:
the variable names should have the form x1-x10.
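For anyone who finds this later, here is a minimal sketch of the row-wise sort. The data set have and the variables id, a, b, and c are made up for illustration. As far as I know, CALL SORTN also accepts an array reference written as of v[*], so arbitrary numeric variable names can be listed in an ARRAY statement instead of being renamed to x1-x10; please treat that as an assumption to check against your SAS version.

[pre]
/* hypothetical sample data: three numeric columns with arbitrary names */
data have;
   input id a b c;
   datalines;
1 3 1 2
2 9 7 8
;
run;

data row_sorted;
   set have;
   array v[*] a b c;      /* the numeric variables to sort within each row */
   call sortn(of v[*]);   /* sorts the values into ascending order, row by row */
run;
[/pre]

Each row is sorted independently of the others, so the first row should end up with a=1, b=2, c=3. The id variable is not in the array and is left untouched.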
Sunrain