- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Everyone
i have a dataset having variables ID Name new
input like this
ID Name New
101 Org
101 Mineel
101 Ram
102 23
102 44
102 33
103
103 40
103 42
104
104 clinic
104 Dale
105 12
105 4
105 6
106
106 Y
106 N
107
108
So I need to get numeric values from Name varible and sort the numerical values and store that sorting order in 'New' varible..and if ID value does not contain any value in Name it should not taken as numeric(ID 107,108)..output should be like
ID Name New
101 Org
101 Mineel
101 Ram
102 23 1
102 44 3
102 33 2
103 1
103 40 2
103 42 3
104
104 clinic
104 Dale
105 12 3
105 4 1
105 6 2
106
106 Y
106 N
107
108
Could anyone please help me for how to do this job..
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sasuser123123 wrote:
Yes it works..but for few observations it's not working I have dataset with 350 observations out of that 330 observations got expected sorting number but for remaining 20 observations the sorting order is not coming expectedly. I don't know why.
and 20 observations'S NAME have values like
10456788
10425786
887446
It gives sorting order 1,2,3... etc per ID
When you mix numerals in character variables the sort order is very likely not what you expect and you may need to provide which you expect for a sort order.
"102456788" comes before "887446" because character sort order defaults to comparing the first CHARACTER value to the first character of another value and "1" comes before "8".
I think that for your purpose you need to 1) identify those "name" values that are all digits; 2) subset the data on that; 3) use Proc Sort on the subset of names with SORTSEQ=linguistic(numeric_collation=on) option to sort in "numeric" order; 4) add the order and 5) merge back to the original data
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try next code:
data temp;
set have;
name_num = input(name, ?? best8.); /* adapt to max length of NAME */
run;
proc sort data=temp; by ID name_num; run;
data want;
set temp;
by ID;
retain new_var;
if first.ID then new_var=.;
if name_num > 0 then naew_var+1;
drop name_num;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Im not getting desired output, in Id 103 first value is missing it should take it as 1 in new variable but it takes from next value of NAME variable of 103
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sasuser123123 just made a correction to my code below. This gives you the exact result you want.
Let me know if this works for you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sasuser123123 wrote:
Im not getting desired output, in Id 103 first value is missing it should take it as 1 in new variable but it takes from next value of NAME variable of 103
You did not define the exact rules when to count the rows per ID and when to leave it blank.
According to your output shown data I'm trying to define those rules:
1) If none of the names per ID is a number then leave the new variable blank.
2) If all names per ID are numbers or part of then are empty then count into the new variable.
3) If all names per ID are empty then leave the new variable blank.
4) Question - Can it be that some names are alphabetic and others are numeric in the same ID?
If positive - what should be the rule ?
The code name_num = input(name,?? best 8.) will result in:
- for empty name or alphabetic (or alphanumeric) it will result into a missing value
- for a number it will result into a numeric variable with the same value
Thus by computing the max value got per ID it is possible to make a decision :
- if max value is missing value then leave it blank
- otherwise count starting with 1 and save it in the new variable after sort by ID NAME.
Next is the code to do it:
data temp;
set have;
name_num = input(name, ?? best8.); /* adapt to max length of NAME */
run;
proc sort data=temp; by ID name_num; run;
proc sql;
create table tmp2 as
select ID,
max(name_num) as flag
from temp
group by ID
order by ID;
quit;
options missing= ' ';
data want;
merge temp tmp2;
by ID;
retain new_var;
if first.ID then do;
if missing(flag) then new_var = flag;
else new_var = 1;
end; else if flag then new_var+1;
drop flag;
run;
If you find IDs with wrong counting please post:
1) The ID and NAMEs of that ID
2) Explain the logic why it is wrong and what rule it shall follow.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How about
data have;
input ID $ Name $;
infile datalines missover;
datalines;
101 Org
101 Mineel
101 Ram
102 23
102 44
102 33
103
103 40
103 42
104
104 clinic
104 Dale
105 12
105 4
105 6
106
106 Y
106 N
107
108
;
data want(drop = n i rc flag);
dcl hash h(ordered : "Y", multidata : "Y");
h.definekey("n");
h.definedone();
dcl hiter hi("h");
do until (last.ID);
set have;
by id;
n = input(Name, ?? best8.);
h.add();
if n then flag = 1;
end;
do until (last.ID);
set have;
by ID;
if flag then do i = 1 by 1 while (hi.next() = 0);
if input(Name, ?? best8.) = n then do;
new = i;
rc = hi.last();
rc = hi.next();
leave;
end;
end;
output;
end;
h.clear();
run;
Result:
ID Name new 101 Org . 101 Mineel . 101 Ram . 102 23 1 102 44 3 102 33 2 103 1 103 40 2 103 42 3 104 . 104 clinic . 104 Dale . 105 12 3 105 4 1 105 6 2 106 . 106 Y . 106 N . 107 . 108 .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yeah its working . i applied the same code to another dataset which contains start values like
107413601
107415618
87896111
so the code will give the output to this values is
1
2
3
I dont know why its coming like
and also I have 0 in Id variable the code not giving the sorting order for 0
could you please help me out
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In above comment its not start its ID(sorry for mistake)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ID Name
112 Org
112 Mineel
112 Ram
113 23
113 44
113 33
114 10741368
114 10741568
114 8789611
115 0
115 0
116 12
116 4
116 6
so Im geting wrong sorting order for id 114 and didi not get any number for 115
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please post data that is representable of you actual data. Makes it easier to help you 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Seem to me that the result for 114 is correct?
Regarding 115, just made a correction. See if this helps.
data have;
input ID $ Name $;
infile datalines missover;
datalines;
112 Org
112 Mineel
112 Ram
113 23
113 44
113 33
114 10741368
114 10741568
114 8789611
115 0
115 0
116 12
116 4
116 6
;
data want(drop = n flag i rc);
dcl hash h(ordered : "Y", multidata : "Y");
h.definekey("n");
h.definedone();
dcl hiter hi("h");
do until (last.ID);
set have;
by id;
n = input(Name, ?? best8.);
h.add();
if n > . then flag = 1;
end;
do until (last.ID);
set have;
by ID;
if flag then do i = 1 by 1 while (hi.next() = 0);
if input(Name, ?? best8.) = n then do;
new = i;
rc = hi.last();
rc = hi.next();
leave;
end;
end;
output;
end;
h.clear();
run;
Result:
ID Name new 112 Org . 112 Mineel . 112 Ram . 113 23 1 113 44 3 113 33 2 114 10741368 2 114 10741568 3 114 8789611 1 115 0 1 115 0 1 116 12 3 116 4 1 116 6 2
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
could you please help me out
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Does my code provide the result you want?
The logic in your problem is not trivial, so the code probably will not be either 🙂 I'll help you understand the code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
and 20 observations'S NAME have values like
10456788
10425786
887446
It gives sorting order 1,2,3... etc per ID
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sasuser123123 wrote:
Yes it works..but for few observations it's not working I have dataset with 350 observations out of that 330 observations got expected sorting number but for remaining 20 observations the sorting order is not coming expectedly. I don't know why.
and 20 observations'S NAME have values like
10456788
10425786
887446
It gives sorting order 1,2,3... etc per ID
When you mix numerals in character variables the sort order is very likely not what you expect and you may need to provide which you expect for a sort order.
"102456788" comes before "887446" because character sort order defaults to comparing the first CHARACTER value to the first character of another value and "1" comes before "8".
I think that for your purpose you need to 1) identify those "name" values that are all digits; 2) subset the data on that; 3) use Proc Sort on the subset of names with SORTSEQ=linguistic(numeric_collation=on) option to sort in "numeric" order; 4) add the order and 5) merge back to the original data