Hello,
I have a dataset which as char and numeric values. I need to sort the data by all of those values and then assign the top most or the last most with a flag.
Sample data:
acc(char) | time | money | id(char) |
1067988 | 5 | 2702.77 | 7101000002817 |
47339 | 35 | 17704.06 | 7101000002601 |
90380 | 35 | 11164.95 | 7101002302817 |
112740 | 5 | 8162.46 | 7101000002026 |
223245 | 35 | 20578.68 | 7101002302817 |
6431468 | 5 | 3418.59 | 7101000002601 |
I need to sort this data first by id, then by descending time, then by descending money and then by descending acc. Then I need to assign the top and bottom acc for each id group by a flag.
This is my code, and it's not sorting correctly. Some of these variables are char and some are num as you can see. Appreciate any help I can get
PROC SORT DATA = abc SORTSEQ =LINGUISTIC (NUMERIC_COLLATION=ON);
BY id descending time descending money DESCENDING acc;
RUN;
proc sql;
alter table abc
add flag char(4);
update abc
set flag = 'YES' where id=first.id;/*I'm pretty sure this is incorrect*/
quit;
Expected output: As you can see we first sort by id, for each id we then sort by desc time, then for each time we sort by desc money and for each money we sort by desc acc value. And the one that is the highest for each id group, we mark them with a flag
acc(char) | time | money | id(char) | flag |
112740 | 5 | 8162.46 | 7101000002026 | YES |
47339 | 35 | 17704.06 | 7101000002601 | YES |
6431468 | 5 | 3418.59 | 7101000002601 | |
1067988 | 5 | 2702.77 | 7101000002817 | YES |
223245 | 35 | 20578.68 | 7101002302817 | YES |
90380 | 35 | 11164.95 | 7101002302817 |
Instead of proc sql:
data abc_flagged;
set abc;
by id;
flag = (first.id or last.id);
run;
EDIT: This "solution" was posted before @AJ_Brien added the expected results to the original post. So the data step is somewhat useless.
So what does your desired result look like given the posted example data?
Ok. Then do
data abc;
input acc $ time money id :$20.;
datalines;
1067988 5 2702.77 7101000002817
47339 35 17704.06 7101000002601
90380 35 11164.95 7101002302817
112740 5 8162.46 7101000002026
223245 35 20578.68 7101002302817
6431468 5 3418.59 7101000002601
;
proc sort data = abc sortseq=linguistic (numeric_collation=on);
by id descending time descending money descending acc;
run;
data want;
set abc;
by id;
if first.id then flag='YES';
run;
It sure does. However with 6 obs there is not much to sort. But it does sort your data by id, by descending time and so on
Instead of proc sql:
data abc_flagged;
set abc;
by id;
flag = (first.id or last.id);
run;
EDIT: This "solution" was posted before @AJ_Brien added the expected results to the original post. So the data step is somewhat useless.
You're talking about numeric and character variables. However, in your sample output ACC, TIME, and MONEY are all left-justified. Whatever SAS interface you're using to view the data shown here, it's a sure sign that these variables are stored as the character type. Thus in your sort, for example, 47339>1067988, 5>35, and so forth. Yet it surely appears as though you're interested in comparing them using the variables according to the numeric values represented by their digits. Thus, either convert ACC, TIME, and MONEY to the numeric data type using the INPUT function or right-justify them using the RIGHT function before the sorting.
Alternatively, revise what you know about proc SORT incredibly numerous options. I seem to recall that there's an option:
SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON)
that forces a character string to be ordered by the numeric values of integers present in there instead of their character value. An excellent reference to all these nifty proc SORT features are excellently aggregated in the excellent SGF 2018 paper by Derek Morgan:
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2773-2018.pdf
Kind regards
Paul D.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.