BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Abhi2017
Calcite | Level 5

Hi ,

 

I want to create a column C in my SAS  data set based on column A  and B values as below.

 

Please suggest.

 

Column AColumn BColumn C
XXXA1
 B2
 C3
 D4
 E5
XXXB2
 C3
XXXA1
 E5
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

3000 values should easily fit into memory with a format, so my solution should work..

You can even create reverse format in the same step that maps the number to the original value. That allows you to drop B, reducing space consumption and speeding up your computations.

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

@Abhi2017 wrote:

Hi ,

 

I want to create a column C in my SAS  data set based on column A  and B values as below.

 

Please suggest.

 

Column A Column B Column C
XXX A 1
  B 2
  C 3
  D 4
  E 5
XXX B 2
  C 3
XXX A 1
  E 5

Can you explain further? it's not clear what the desired result is. It's not clear what is the use of Column A.

--
Paige Miller
Abhi2017
Calcite | Level 5

I have Column A and B in my input data set,I want to create Column C based on A and B

 

Say i have  the data as below ,I want to create a sequences  as column C which should have unique value for each column A and B.

ANd the Column C value will repeat when identifies value in Column B.

Column AColumn BColumn C
100A1
 B2
 C3
 D4
 E5
101B2
 C3
102A1
 E5
PaigeMiller
Diamond | Level 26

@Abhi2017 wrote:

I have Column A and B in my input data set,I want to create Column C based on A and B

 

Say i have  the data as below ,I want to create a sequences  as column C which should have unique value for each column A and B.

ANd the Column C value will repeat when identifies value in Column B.

Column A Column B Column C
100 A 1
  B 2
  C 3
  D 4
  E 5
101 B 2
  C 3
102 A 1
  E 5

I'm sorry, I still don't understand. It sounds like the statement that you want "as column C which should have unique value for each column A and B" is contradicted by the next statement in which you say "ANd the Column C value will repeat when identifies value in Column B" meaning that column C does not depend on A.

--
Paige Miller
Abhi2017
Calcite | Level 5

Thanks for your response.Let me try to reframe .
In my input data set, I have column A and Column B.My requirement is to create a new variable i.e. Column C based on Column B.Eample:100(A) has 5 (B)values 1 to 5.I want to create 5 unique sequences for each 5 rows in Column C.If the data in B is repeated in subsequent rows it will hold the initial value not new value.Now say I have assigned a value for column B ,value B as 2 and column B ,value C as 3, then the program to hold the value 2 and 3.

Now for Column B value B and Column B value c we already assigned 2 and 3 each ,so when 101(Column A) is validated it should automatically output 2 and 3 like wise 102 (Column A) it should validate 1 and 5 each rather than assigning new number.

Column C is dependent on value at column B.Hope this helps to explain my query.

Kurt_Bremser
Super User

Don't think so complicated. Since C would be directly dependent on only B, C is unnecessary. You can use B for conditions just as you would use C. C = f(B), so if C would become if f(B).

 

The only use I can see for a new (numeric) variable would be to have a reference for creating an order that can't be achieved with using B alone.

 

In any case, my suggested solution does this.

Abhi2017
Calcite | Level 5

Thanks for your response Kurt,however my requirement is to create a new variable C .Because in input data set there are around 3000+ records for Column B and length for the field is greater than 200.Validation and review for  numeric value  column C will help  much better compared to character field column B.Let me know your suggestion.

Kurt_Bremser
Super User

3000 values should easily fit into memory with a format, so my solution should work..

You can even create reverse format in the same step that maps the number to the original value. That allows you to drop B, reducing space consumption and speeding up your computations.

Abhi2017
Calcite | Level 5

Thanks for the suggestion Kurt.Worked great!

Kurt_Bremser
Super User
/* using  a custom format */

proc sort
  data=have (keep=column_b)
  out=lookup (rename=(column_b=start))
  nodupkey
;
by column_b;
run;

data lookup;
set lookup;
fmtname = 'column_b';
type = 'C';
label = put(_n_,best.);
run;

proc format lib=work cntlin=lookup;
run;

data want;
set have;
column_c = put(column_b,$column_b.);
run;

proc print data=want noobs;
run;

Result:

column_a    column_b    column_c

  XXX          A           1    
               B           2    
               C           3    
               D           4    
               E           5    
  XXX          B           2    
               C           3    
  XXX          A           1    
               E           5    

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1049 views
  • 1 like
  • 3 in conversation