SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi Everyone 

i have a dataset having variables ID Name new

input like this
 ID   Name      New
101  Org
101  Mineel
101  Ram
102  23
102  44
102  33
103   
103  40
103  42

104   
104  clinic
104  Dale
105  12
105   4
105   6
106
106   Y
106   N
107

108

So I need to get numeric values from Name varible and sort the numerical values and store that sorting order in 'New' varible..and if ID value does not contain any value in Name it should not taken as numeric(ID 107,108)..output should be like 

 

 ID   Name      New
101  Org
101  Mineel
101  Ram
102  23            1
102  44            3
102  33            2
103                  1
103  40            2
103  42            3

104   
104  clinic
104  Dale
105  12           3
105   4            1
105   6            2
106
106   Y
106   N
107

108

Could anyone please help me for how to do this job..

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@sasuser123123 wrote:
Yes it works..but for few observations it's not working I have dataset with 350 observations out of that 330 observations got expected sorting number but for remaining 20 observations the sorting order is not coming expectedly. I don't know why.
and 20 observations'S NAME have values like
10456788
10425786
887446

It gives sorting order 1,2,3... etc per ID

When you mix numerals in character variables the sort order is very likely not what you expect and you may need to provide which you expect for a sort order.

"102456788" comes before "887446" because character sort order defaults to comparing the first CHARACTER value to the first character of another value and "1" comes before "8".

 

I think that for your purpose you need to 1) identify those "name" values that are all digits; 2) subset the data on that; 3) use Proc Sort on the subset of names with SORTSEQ=linguistic(numeric_collation=on) option to sort in "numeric" order; 4) add the order and 5) merge back to the original data

View solution in original post

14 REPLIES 14
Shmuel
Garnet | Level 18

Try next code:

data temp;
 set have;
      name_num = input(name, ?? best8.); /* adapt to max length of NAME */
run;
proc sort data=temp; by ID name_num; run;
data want;
 set temp;
   by ID;
       retain new_var;
        if first.ID then new_var=.;
        if name_num > 0 then naew_var+1;
        drop name_num;
run;
sasuser123123
Quartz | Level 8

Im not getting desired output, in Id 103 first value is missing it should take it as 1 in new variable but it takes from next value of NAME variable of 103

 

 

PeterClemmensen
Tourmaline | Level 20

@sasuser123123 just made a correction to my code below. This gives you the exact result you want.

 

Let me know if this works for you.

Shmuel
Garnet | Level 18

@sasuser123123 wrote:

Im not getting desired output, in Id 103 first value is missing it should take it as 1 in new variable but it takes from next value of NAME variable of 103

 

 


You did not define the exact rules when to count the rows per ID and when to leave it blank.

According to your output shown data I'm trying to define those rules:

1) If none of the names per ID is a number then leave the new variable blank.

2) If all names per ID are numbers or part of then are empty then count into the new variable.

3) If all names per ID are empty then leave the new variable blank.

4) Question - Can it be that some names are alphabetic and others are numeric in the same ID?

     If positive - what should be the rule ?

 

The code name_num = input(name,?? best 8.) will result in:

- for empty name or alphabetic (or alphanumeric) it will result into a missing value

- for a number it will result into a numeric variable with the same value

Thus by computing the max value got per ID it is possible to make a decision :

- if max value is missing value then leave it blank

- otherwise count starting with 1 and save it in the new variable after sort by ID NAME.

 

Next is the code to do it:

data temp;
 set have;
      name_num = input(name, ?? best8.); /* adapt to max length of NAME */
run;
proc sort data=temp; by ID name_num; run;

proc sql;
  create table tmp2 as
  select ID,
         max(name_num) as flag
  from temp
  group by ID
  order by ID;
quit;

options missing= ' ';
data want; 
 merge temp tmp2;
  by ID;
     retain new_var;
     if first.ID then do;
        if missing(flag) then new_var = flag;
		else new_var = 1;
     end; else if flag then new_var+1;
     drop flag;
run;

If you find IDs with wrong counting please post:

1) The ID and NAMEs of that ID

2) Explain the logic why it is wrong and what rule it shall follow.

 

PeterClemmensen
Tourmaline | Level 20

How about

 

data have;
input ID $ Name $;
infile datalines missover;
datalines;
101 Org    
101 Mineel 
101 Ram    
102 23     
102 44     
102 33     
103        
103 40     
103 42     
104        
104 clinic 
104 Dale   
105 12     
105 4      
105 6      
106        
106 Y      
106 N      
107        
108        
;

data want(drop = n i rc flag);

   dcl hash h(ordered : "Y", multidata : "Y");
   h.definekey("n");
   h.definedone();
   dcl hiter hi("h");

   do until (last.ID);
      set have;
      by id;
      n = input(Name, ?? best8.);
      h.add();
      if n then flag = 1;
   end;

   do until (last.ID);
      set have;
      by ID;
      if flag then do i = 1 by 1 while (hi.next() = 0);
         if input(Name, ?? best8.) = n then do;
            new = i;
            rc = hi.last();
            rc = hi.next();
            leave;
         end;
      end;

      output;
   end;

   h.clear();

run;

 

Result:

 

ID   Name   new 
101  Org    . 
101  Mineel . 
101  Ram    . 
102  23     1 
102  44     3 
102  33     2 
103         1 
103  40     2 
103  42     3 
104         . 
104  clinic . 
104  Dale   . 
105  12     3 
105  4      1 
105  6      2 
106         . 
106  Y      . 
106  N      . 
107         . 
108         .    

 

sasuser123123
Quartz | Level 8

Yeah its working . i applied the same code to another dataset which contains start values like

107413601

107415618

87896111

so the code will give the output to this values is 

1

2

I dont know why its coming like

 

and also I have 0 in Id variable the code not giving the sorting order for 0

 

could you please help me out

sasuser123123
Quartz | Level 8

In above comment its not start its ID(sorry for mistake)

sasuser123123
Quartz | Level 8


ID Name 
112 Org   

112 Mineel
112 Ram
113 23 
113 44 
113 33 
114 10741368
114 10741568 
114 8789611

115  0
115  0
116 12 
116 4  
116 6 
so  Im geting wrong sorting order for id 114 and didi not get any number for 115

PeterClemmensen
Tourmaline | Level 20

Please post data that is representable of you actual data. Makes it easier to help you 🙂

PeterClemmensen
Tourmaline | Level 20

Seem to me that the result for 114 is correct?

 

Regarding 115, just made a correction. See if this helps.

 

data have;
input ID $ Name $;
infile datalines missover;
datalines; 
112  Org      
112  Mineel   
112  Ram      
113  23       
113  44       
113  33       
114  10741368 
114  10741568 
114  8789611  
115  0        
115  0        
116  12       
116  4        
116  6        
;

data want(drop = n flag i rc);

   dcl hash h(ordered : "Y", multidata : "Y");
   h.definekey("n");
   h.definedone();
   dcl hiter hi("h");

   do until (last.ID);
      set have;
      by id;
      n = input(Name, ?? best8.);
      h.add();
      if n > . then flag = 1;
   end;

   do until (last.ID);
      set have;
      by ID;
      if flag then do i = 1 by 1 while (hi.next() = 0);
         if input(Name, ?? best8.) = n then do;
            new = i;
            rc = hi.last();
            rc = hi.next();
            leave;
         end;
      end;

      output;
   end;

   h.clear();

run;

 

Result:

 

ID   Name      new 
112  Org       . 
112  Mineel    . 
112  Ram       . 
113  23        1 
113  44        3 
113  33        2 
114  10741368  2 
114  10741568  3 
114  8789611   1 
115  0         1 
115  0         1 
116  12        3 
116  4         1 
116  6         2 

 

sasuser123123
Quartz | Level 8
Im unable to understand this code and also not getting any other methods..
could you please help me out
PeterClemmensen
Tourmaline | Level 20

Does my code provide the result you want? 

 

The logic in your problem is not trivial, so the code probably will not be either 🙂 I'll help you understand the code. 

sasuser123123
Quartz | Level 8
Yes it works..but for few observations it's not working I have dataset with 350 observations out of that 330 observations got expected sorting number but for remaining 20 observations the sorting order is not coming expectedly. I don't know why.
and 20 observations'S NAME have values like
10456788
10425786
887446

It gives sorting order 1,2,3... etc per ID
ballardw
Super User

@sasuser123123 wrote:
Yes it works..but for few observations it's not working I have dataset with 350 observations out of that 330 observations got expected sorting number but for remaining 20 observations the sorting order is not coming expectedly. I don't know why.
and 20 observations'S NAME have values like
10456788
10425786
887446

It gives sorting order 1,2,3... etc per ID

When you mix numerals in character variables the sort order is very likely not what you expect and you may need to provide which you expect for a sort order.

"102456788" comes before "887446" because character sort order defaults to comparing the first CHARACTER value to the first character of another value and "1" comes before "8".

 

I think that for your purpose you need to 1) identify those "name" values that are all digits; 2) subset the data on that; 3) use Proc Sort on the subset of names with SORTSEQ=linguistic(numeric_collation=on) option to sort in "numeric" order; 4) add the order and 5) merge back to the original data

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3375 views
  • 0 likes
  • 4 in conversation