Hello All,
I have a SAS Data manipulation question.
Please see attached excel.
I am trying to change the column 'Total_Record_Number_ALL' from column F to column G.
The SAS Code should look at each Segment, then keep the value of column 'Total_Record_Number_ALL' same for the first ID, but
change the value of the subsequent IDs. for example, for Segment 'SEG_00070', the value for Segment 'QA_00012' should stay as 10712. But for ID QA_00013, the value should exclude the Failure records (columns C & E) corresponding to the first ID QA_00012 thus the value will change from 10712 to 10712 - 0 (column C) - 4210 (column E) = 6502.
Please help.
Best
Don't post spreadsheets. Just post the data. Also use shortner names, at least in your examples, nobody wants to type a paragraph just to reference a varaible by name. Also if you know what value you want to get then give that in your example so we can test whether our suggestions work or not.
So it looks like you have this data.
ID,Segment,Fail_NTB,Total_NTB,Fail_ALL,Total_ALL,Want QA_00012,SEG_00070,0,35,4210,10712,10712
QA_00012,SEG_00071,0,30,299,6193,6193
QA_00012,SEG_00072,0,99,642,6570,6570
QA_00013,SEG_00070,0,35,191,10712,6502
QA_00013,SEG_00071,0,30,235,6193,5894
QA_00013,SEG_00072,0,99,163,6570,5928
QA_00001,SEG_00142,0,19,3,565,565
So it looks like you have an two id variables. How do they relate to each other? You seem to have sorted the data by ID and SEGMENT but your explanation of your algorithm makes it look like you really want to sort by SEGMENT and then ID .
So let's turn your data into an actual SAS dataset.
data have ;
infile cards dsd truncover ;
length id segment $10 ;
input ID $ Segment $ Fail_NTB Total_NTB Fail_ALL Total_ALL Want;
cards;
QA_00012,SEG_00070,0,35,4210,10712,10712
QA_00012,SEG_00071,0,30,299,6193,6193
QA_00012,SEG_00072,0,99,642,6570,6570
QA_00013,SEG_00070,0,35,191,10712,6502
QA_00013,SEG_00071,0,30,235,6193,5894
QA_00013,SEG_00072,0,99,163,6570,5928
QA_00001,SEG_00142,0,19,3,565,565
;
Next sort it properly so we can process the recrods in the right order.
proc sort;
by segment id ;
run;
We can calculate the number of non-failures by subtraction, but it looks like you want to actually keep the value from the previous calculation instead of the calculation on the current record. So we could use LAG() to get the value from the previous value, but we need to execute the LAG() function on every observation otherwise if will messup how LAG() operates. So calculate if first and then overwrite the value when you are on the first record of the group.
data want ;
set have ;
by segment ;
non_fail = total_all - fail_ntb - fail_all ;
new = lag(non_fail);
if first.segment then new=total_all ;
run;
Total_ Total_ Obs id segment Fail_NTB NTB Fail_ALL ALL Want non_fail new 1 QA_00012 SEG_00070 0 35 4210 10712 10712 6502 10712 2 QA_00013 SEG_00070 0 35 191 10712 6502 10521 6502 3 QA_00012 SEG_00071 0 30 299 6193 6193 5894 6193 4 QA_00013 SEG_00071 0 30 235 6193 5894 5958 5894 5 QA_00012 SEG_00072 0 99 642 6570 6570 5928 6570 6 QA_00013 SEG_00072 0 99 163 6570 5928 6407 5928 7 QA_00001 SEG_00142 0 19 3 565 565 562 565
The subject line as SAS base warrants to be completely ridiculed by the moderator. Also like @LinusH mentioned, OP should extend courtesy to responders when asking for help by explaining the situation, ideally a business problem or a real world technical solution with some self attempt in the first place. Very annoying indeed!!!!!!!!!
Should super users like @Reeza who gave me likes to a similar comment of mine for another ridiculous question have privileges to mark such questions as invalid, they should mark without any hesitation. It can't be like one can sit at work, expecting to throw their work onto people in the forum while they can enjoy their coffee.
I see some strong comments.hmm. I do concur with their thoughts. OK However., I saw your sample and I'm afraid you should have framed the subject line as some kind of look up across columns/column values and stuff and not SAS base lol.
Alright, so there you go,
1. now you know that your problem requires a look up solution.
2. What you should be doing next? Search for SAS look up concepts. For examples, joins, merge, arrays, hashes etc
3. Then try to figure which of the above can work for your data and requirement
4. You may require pre processing your data to get into PDV for SAS to execute.
5. Break your solution into pieces so it is easy to understand and debug until you gain solid expertise
In SAS programming, what I learned in my 4+ years of exp, the code should come to your mind as soon as you look at the data or while your stakeholders/boss at work assign you the task. This is when you'll know you have arrived .
Try to give it your best shot take the above 5 points and if you still don't get it right, that's perfectly fine. We the community and I will give you the solution as long as there is a solid effort. I promise. Either way, if somebody does give you what you want without you having to go through the pain of my suggested points. God bless! All the best!
@MarkWik SuperUsers can't mark questions invalid, but we can edit questions. I usually will for subject lines if I catch it early enough and understand the question. Although I agree in general, I try not to get too aggressive with people, though I have been on several sides of this rant, people who feel I'm 'rude or arrogant' and unhelpful. One part of learning programming is learning how to ask questions, basically you don't even know what you don't know. At the end of the day, the only behaviour we can control is ours, and if a users continuously doesn't show effort, we can just ignore the question. There are a few users who I won't answer on principal anymore.
@Siddharth123 although some of the points here may come across as rude, please realize the goal is really to answer your question as efficiently and quickly as possible. Taking time to ask for clarifications, figure out the problem, download/open attachments all reduce the amount of time to answer your question or others. These suggestions are designed to help you get answers faster and they really, really do.
I apologise and would take special care when posting question on the forum.
kind regards
Don't post spreadsheets. Just post the data. Also use shortner names, at least in your examples, nobody wants to type a paragraph just to reference a varaible by name. Also if you know what value you want to get then give that in your example so we can test whether our suggestions work or not.
So it looks like you have this data.
ID,Segment,Fail_NTB,Total_NTB,Fail_ALL,Total_ALL,Want QA_00012,SEG_00070,0,35,4210,10712,10712
QA_00012,SEG_00071,0,30,299,6193,6193
QA_00012,SEG_00072,0,99,642,6570,6570
QA_00013,SEG_00070,0,35,191,10712,6502
QA_00013,SEG_00071,0,30,235,6193,5894
QA_00013,SEG_00072,0,99,163,6570,5928
QA_00001,SEG_00142,0,19,3,565,565
So it looks like you have an two id variables. How do they relate to each other? You seem to have sorted the data by ID and SEGMENT but your explanation of your algorithm makes it look like you really want to sort by SEGMENT and then ID .
So let's turn your data into an actual SAS dataset.
data have ;
infile cards dsd truncover ;
length id segment $10 ;
input ID $ Segment $ Fail_NTB Total_NTB Fail_ALL Total_ALL Want;
cards;
QA_00012,SEG_00070,0,35,4210,10712,10712
QA_00012,SEG_00071,0,30,299,6193,6193
QA_00012,SEG_00072,0,99,642,6570,6570
QA_00013,SEG_00070,0,35,191,10712,6502
QA_00013,SEG_00071,0,30,235,6193,5894
QA_00013,SEG_00072,0,99,163,6570,5928
QA_00001,SEG_00142,0,19,3,565,565
;
Next sort it properly so we can process the recrods in the right order.
proc sort;
by segment id ;
run;
We can calculate the number of non-failures by subtraction, but it looks like you want to actually keep the value from the previous calculation instead of the calculation on the current record. So we could use LAG() to get the value from the previous value, but we need to execute the LAG() function on every observation otherwise if will messup how LAG() operates. So calculate if first and then overwrite the value when you are on the first record of the group.
data want ;
set have ;
by segment ;
non_fail = total_all - fail_ntb - fail_all ;
new = lag(non_fail);
if first.segment then new=total_all ;
run;
Total_ Total_ Obs id segment Fail_NTB NTB Fail_ALL ALL Want non_fail new 1 QA_00012 SEG_00070 0 35 4210 10712 10712 6502 10712 2 QA_00013 SEG_00070 0 35 191 10712 6502 10521 6502 3 QA_00012 SEG_00071 0 30 299 6193 6193 5894 6193 4 QA_00013 SEG_00071 0 30 235 6193 5894 5958 5894 5 QA_00012 SEG_00072 0 99 642 6570 6570 5928 6570 6 QA_00013 SEG_00072 0 99 163 6570 5928 6407 5928 7 QA_00001 SEG_00142 0 19 3 565 565 562 565
data have;
input ID $ Segment : $15. Fail_Record_Number_NTB Total_Record_Number_NTB Fail_Record_Number_ALL Total_Record_Number_ALL;
datalines;
QA_00012 SEG_00070 0 35 4210 10712
QA_00012 SEG_00071 0 30 299 6193
QA_00012 SEG_00072 0 99 642 6570
QA_00013 SEG_00070 0 35 191 10712
QA_00013 SEG_00071 0 30 235 6193
QA_00013 SEG_00072 0 99 163 6570
QA_00001 SEG_00142 0 19 3 565
QA_00001 SEG_00143 0 59 0 491
QA_00001 SEG_00144 0 30 0 201
QA_00002 SEG_00142 0 19 0 565
QA_00002 SEG_00143 0 59 0 491
QA_00002 SEG_00144 0 30 0 201
QA_00003 SEG_00142 0 19 3 565
QA_00003 SEG_00143 0 59 0 491
QA_00003 SEG_00144 0 30 0 201
;
data want;
if _N_ = 1 then do;
if 0 then set have;
declare hash myhash( );
myhash.defineKey('segment');
myhash.defineData('_Fail_Record_Number_ALL');
myhash.defineDone( );
call missing(_Fail_Record_Number_ALL);
end;
set have;
_Fail_Record_Number_ALL=Fail_Record_Number_ALL;
if myhash.check() ne 0 then myhash.add();
else do;
myhash.find();
Total_Record_Number_ALL=Total_Record_Number_ALL-_Fail_Record_Number_ALL;
end;
drop _:;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.