BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Siddharth123
Obsidian | Level 7

Hello All,

 

I have a SAS Data manipulation question.

 

Please see attached excel.

I am trying to change the column 'Total_Record_Number_ALL' from column F to column G. 

The SAS Code should look at each Segment, then keep the value of column 'Total_Record_Number_ALL'  same for the first ID, but 

change the value of the subsequent IDs. for example, for Segment 'SEG_00070', the value for Segment 'QA_00012' should stay as 10712. But for ID QA_00013, the value should exclude the Failure records (columns C & E) corresponding to the first ID QA_00012 thus the value will change from 10712 to 10712 - 0 (column C) - 4210 (column E) = 6502.

 

 

Please help.

 

Best

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Don't post spreadsheets. Just post the data. Also use shortner names, at least in your examples, nobody wants to type a paragraph just to reference a varaible by name.  Also if you know what value you want to get then give that in your example so we can test whether our suggestions work or not.

So it looks like you have this data.

ID,Segment,Fail_NTB,Total_NTB,Fail_ALL,Total_ALL,Want
QA_00012,SEG_00070,0,35,4210,10712,10712
QA_00012,SEG_00071,0,30,299,6193,6193
QA_00012,SEG_00072,0,99,642,6570,6570
QA_00013,SEG_00070,0,35,191,10712,6502
QA_00013,SEG_00071,0,30,235,6193,5894
QA_00013,SEG_00072,0,99,163,6570,5928
QA_00001,SEG_00142,0,19,3,565,565

So it looks like you have an two id variables. How do they relate to each other?  You seem to have sorted the data by ID and SEGMENT but your explanation of your algorithm makes it look like you really want to sort by SEGMENT and then ID .

So let's turn your data into an actual SAS dataset.

data have ;
  infile cards dsd truncover ;
  length id segment $10 ;
  input ID $ Segment $ Fail_NTB Total_NTB Fail_ALL Total_ALL Want;
cards;
QA_00012,SEG_00070,0,35,4210,10712,10712
QA_00012,SEG_00071,0,30,299,6193,6193
QA_00012,SEG_00072,0,99,642,6570,6570
QA_00013,SEG_00070,0,35,191,10712,6502
QA_00013,SEG_00071,0,30,235,6193,5894
QA_00013,SEG_00072,0,99,163,6570,5928
QA_00001,SEG_00142,0,19,3,565,565
;

Next sort it properly so we can process the recrods in the right order.

proc sort;
  by segment id ;
run;

We can calculate the number of non-failures by subtraction, but it looks like you want to actually keep the value from the previous calculation instead of the calculation on the current record.  So we could use LAG() to get the value from the previous value, but we need to execute the LAG() function on every observation otherwise if will messup how LAG() operates. So calculate if first and then overwrite the value when you are on the first record of the group.

data want ;
  set have ;
  by segment  ;
  non_fail = total_all - fail_ntb - fail_all ;
  new = lag(non_fail);
  if first.segment then new=total_all ;
run;
                                        Total_              Total_
Obs     id        segment   Fail_NTB    NTB   Fail_ALL    ALL    Want  non_fail   new

 1   QA_00012    SEG_00070      0       35      4210     10712  10712     6502   10712
 2   QA_00013    SEG_00070      0       35       191     10712   6502    10521    6502
 3   QA_00012    SEG_00071      0       30       299      6193   6193     5894    6193
 4   QA_00013    SEG_00071      0       30       235      6193   5894     5958    5894
 5   QA_00012    SEG_00072      0       99       642      6570   6570     5928    6570
 6   QA_00013    SEG_00072      0       99       163      6570   5928     6407    5928
 7   QA_00001    SEG_00142      0       19         3       565    565      562     565

View solution in original post

7 REPLIES 7
LinusH
Tourmaline | Level 20
A) Please use your imagination. How does your subject distinguish your question from others?
B) Don't attach Excel files.
C) This is not a programming service. What have you tried so far?
Data never sleeps
MarkWik
Quartz | Level 8

The subject line as SAS base warrants to be completely ridiculed by the moderator. Also like @LinusH  mentioned, OP should extend courtesy to responders when asking for help by explaining the situation, ideally a business problem or a real world technical solution with some self attempt in the first place. Very annoying indeed!!!!!!!!!

 

Should super users like @Reeza who gave me likes to a similar comment of mine for another ridiculous question have privileges to mark such questions as invalid, they should mark without any hesitation. It can't be like one can sit at work, expecting to throw their work onto people in the forum while they can enjoy their coffee. 

 

novinosrin
Tourmaline | Level 20

I see some strong comments.hmm. I do concur with their thoughts. OK However., I saw your sample and I'm afraid you should have framed the subject line as some kind of look up across columns/column values and stuff and not SAS base lol. 

Alright, so there you go,

1. now you know that your problem requires a look up solution. 

2. What you should be doing next? Search for SAS look up concepts. For examples, joins, merge, arrays, hashes etc

3. Then try to figure which of the above can work for your data and requirement

4. You may require pre processing your data to get into PDV for SAS to execute.

5. Break your solution into pieces so it is easy to understand and debug until you gain solid expertise

 

In SAS programming, what I learned in my 4+ years of exp, the code should come to your mind as soon as you look at the data or while your stakeholders/boss at work assign you the task. This is when you'll know you have arrived .

 

Try to give it your best shot take the above 5 points and if you still don't get it right, that's perfectly fine. We the community and I will give you the solution as long as there is a solid effort. I promise. Either way, if somebody does give you what you want without you having to go through the pain of my suggested points. God bless! All the best!

Reeza
Super User

@MarkWik SuperUsers can't mark questions invalid, but we can edit questions. I usually will for subject lines if I catch it early enough and understand the question. Although I agree in general, I try not to get too aggressive with people, though I have been on several sides of this rant, people who feel I'm 'rude or arrogant' and unhelpful.  One part of learning programming is learning how to ask questions, basically you don't even know what you don't know. At the end of the day, the only behaviour we can control is ours, and if a users continuously doesn't show effort, we can just ignore the question. There are a few users who I won't answer on principal anymore. 

 

@Siddharth123 although some of the points here may come across as rude, please realize the goal is really to answer your question as efficiently and quickly as possible. Taking time to ask for clarifications, figure out the problem, download/open attachments all reduce the amount of time to answer your question or others.  These suggestions are designed to help you get answers faster and they really, really do. 

Siddharth123
Obsidian | Level 7

I apologise and would take special care when posting question on the forum.

 

kind regards

Tom
Super User Tom
Super User

Don't post spreadsheets. Just post the data. Also use shortner names, at least in your examples, nobody wants to type a paragraph just to reference a varaible by name.  Also if you know what value you want to get then give that in your example so we can test whether our suggestions work or not.

So it looks like you have this data.

ID,Segment,Fail_NTB,Total_NTB,Fail_ALL,Total_ALL,Want
QA_00012,SEG_00070,0,35,4210,10712,10712
QA_00012,SEG_00071,0,30,299,6193,6193
QA_00012,SEG_00072,0,99,642,6570,6570
QA_00013,SEG_00070,0,35,191,10712,6502
QA_00013,SEG_00071,0,30,235,6193,5894
QA_00013,SEG_00072,0,99,163,6570,5928
QA_00001,SEG_00142,0,19,3,565,565

So it looks like you have an two id variables. How do they relate to each other?  You seem to have sorted the data by ID and SEGMENT but your explanation of your algorithm makes it look like you really want to sort by SEGMENT and then ID .

So let's turn your data into an actual SAS dataset.

data have ;
  infile cards dsd truncover ;
  length id segment $10 ;
  input ID $ Segment $ Fail_NTB Total_NTB Fail_ALL Total_ALL Want;
cards;
QA_00012,SEG_00070,0,35,4210,10712,10712
QA_00012,SEG_00071,0,30,299,6193,6193
QA_00012,SEG_00072,0,99,642,6570,6570
QA_00013,SEG_00070,0,35,191,10712,6502
QA_00013,SEG_00071,0,30,235,6193,5894
QA_00013,SEG_00072,0,99,163,6570,5928
QA_00001,SEG_00142,0,19,3,565,565
;

Next sort it properly so we can process the recrods in the right order.

proc sort;
  by segment id ;
run;

We can calculate the number of non-failures by subtraction, but it looks like you want to actually keep the value from the previous calculation instead of the calculation on the current record.  So we could use LAG() to get the value from the previous value, but we need to execute the LAG() function on every observation otherwise if will messup how LAG() operates. So calculate if first and then overwrite the value when you are on the first record of the group.

data want ;
  set have ;
  by segment  ;
  non_fail = total_all - fail_ntb - fail_all ;
  new = lag(non_fail);
  if first.segment then new=total_all ;
run;
                                        Total_              Total_
Obs     id        segment   Fail_NTB    NTB   Fail_ALL    ALL    Want  non_fail   new

 1   QA_00012    SEG_00070      0       35      4210     10712  10712     6502   10712
 2   QA_00013    SEG_00070      0       35       191     10712   6502    10521    6502
 3   QA_00012    SEG_00071      0       30       299      6193   6193     5894    6193
 4   QA_00013    SEG_00071      0       30       235      6193   5894     5958    5894
 5   QA_00012    SEG_00072      0       99       642      6570   6570     5928    6570
 6   QA_00013    SEG_00072      0       99       163      6570   5928     6407    5928
 7   QA_00001    SEG_00142      0       19         3       565    565      562     565
novinosrin
Tourmaline | Level 20

data have;

input ID $ Segment : $15.  Fail_Record_Number_NTB     Total_Record_Number_NTB     Fail_Record_Number_ALL     Total_Record_Number_ALL;

datalines;

QA_00012   SEG_00070  0    35   4210 10712

QA_00012   SEG_00071  0    30   299  6193

QA_00012   SEG_00072  0    99   642  6570

QA_00013   SEG_00070  0    35   191  10712

QA_00013   SEG_00071  0    30   235  6193

QA_00013   SEG_00072  0    99   163  6570

QA_00001   SEG_00142  0    19   3    565

QA_00001   SEG_00143  0    59   0    491

QA_00001   SEG_00144  0    30   0    201

QA_00002   SEG_00142  0    19   0    565

QA_00002   SEG_00143  0    59   0    491

QA_00002   SEG_00144  0    30   0    201

QA_00003   SEG_00142  0    19   3    565

QA_00003   SEG_00143  0    59   0    491

QA_00003   SEG_00144  0    30   0    201

;

 

data want;

if _N_ = 1 then do;

    if 0 then set have;

      declare hash myhash( );

      myhash.defineKey('segment');

      myhash.defineData('_Fail_Record_Number_ALL');

      myhash.defineDone( );

       call missing(_Fail_Record_Number_ALL);

   end;

   set have;

   _Fail_Record_Number_ALL=Fail_Record_Number_ALL;

   if myhash.check() ne 0 then myhash.add();

   else do;

   myhash.find();

   Total_Record_Number_ALL=Total_Record_Number_ALL-_Fail_Record_Number_ALL;

   end;

   drop _:;

run;

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1630 views
  • 2 likes
  • 6 in conversation