Hi,
My question is referencing the below thread regarding how to skip a variable based on the value and go to the next variable.
https://communities.sas.com/t5/SAS-Programming/how-to-skip-a-variable-based-on-the-value-and-go-to-t...
I got a similar situation.
I tried the solution in the above thread. It seems working for me by performing some of the eye ball checking.
I need to conduct a data quality check.
Can somebody suggest an efficient approach (SAS code) to conduct a data quality check to ensure it has done the job appropriately?
Thank you very much in advance.
@BeatriceWang The requirement in your other question here was: " If the grid_rate =D than i dont want that value and go on the the next grid rate". You didn't mention missings and that you want to skip them as well.
If you want to exclude multiple values then just change the condition from:
if rates[_i] ne 'D' then
to a list like:
if rates[_i] not in ('D',' ') then
Here implemented for one of the options posted in your other thread.
data have;
infile datalines truncover dsd;
input (contract_id rate1 rate2 rate3 rate4 want_rate1 want_rate2 want_rate3 want_rate4) ($);
datalines;
XEA123,D,PRIME,TD49,TD58,PRIME,TD49,TD58
DEW343,,D,TD58,PRIME,TD49,TD58,PRIME
JRQ912,PRIME,TD49,TD58,,PRIME,TD49,TD58
;
data want(drop=_:);
set have;
array rates {*} rate1 - rate4;
do _i=1 to dim(rates);
if rates[_i] not in ('D',' ') then
do;
_ind=sum(_ind,1);
rates[_ind]=rates[_i];
end;
end;
_ind=sum(_ind,1);
do _i=_ind to dim(rates);
call missing(rates[_i]);
end;
run;
@BeatriceWang - Your link doesn't appear to work.
The link works well for me. I tried directly clicking on it.
@BeatriceWang - Yep, must have been a website problem. Working now.
Exactly what kind of check do you want?
I want to check the newvars consecutively stored the vars without blank fields on all records of my file(big).
I did some eye ball checking. It looks OK. But I could not eye check all of the records.
Please advise if you can.
@BeatriceWang wrote:
I want to check the newvars consecutively stored the vars without blank fields on all records of my file(big).
I did some eye ball checking. It looks OK. But I could not eye check all of the records.
Please advise if you can.
What exactly does "consecutively stored the vars without blank fields" actually mean? We don't have any of your data, before or after so we have no idea which vars, and while it may be obvious to you, "consecutively stored" is not intuitively obvious. You can use the NMISS function if the variables are numeric to tell how many are missing (or blank).
@BeatriceWang The requirement in your other question here was: " If the grid_rate =D than i dont want that value and go on the the next grid rate". You didn't mention missings and that you want to skip them as well.
If you want to exclude multiple values then just change the condition from:
if rates[_i] ne 'D' then
to a list like:
if rates[_i] not in ('D',' ') then
Here implemented for one of the options posted in your other thread.
data have;
infile datalines truncover dsd;
input (contract_id rate1 rate2 rate3 rate4 want_rate1 want_rate2 want_rate3 want_rate4) ($);
datalines;
XEA123,D,PRIME,TD49,TD58,PRIME,TD49,TD58
DEW343,,D,TD58,PRIME,TD49,TD58,PRIME
JRQ912,PRIME,TD49,TD58,,PRIME,TD49,TD58
;
data want(drop=_:);
set have;
array rates {*} rate1 - rate4;
do _i=1 to dim(rates);
if rates[_i] not in ('D',' ') then
do;
_ind=sum(_ind,1);
rates[_ind]=rates[_i];
end;
end;
_ind=sum(_ind,1);
do _i=_ind to dim(rates);
call missing(rates[_i]);
end;
run;
Thank you for all the responses. You guys are all very helpful.
My data looks like this:
var1 | var2 | var3 | var4 | var5 | var6 | newvar1 | newvar2 | newvar3 | newvar4 | newvar5 | newvar6 |
mi | ab | cd | mi | ab | cd | ||||||
ef | hi | xy | ef | hi | xy | ||||||
wx | yz | jk | lm | no | wx | yz | jk | lm | no |
The code you posted works for me.
Like I said earlier I need to do data quality check.
I am looking for efficient sas code to check if the job is appropriately done on all of the records in my file (big).
Please advise if you can. Thanks in advance.
You suffer from inefficient data design. Maxim 19: Long Beats Wide.
So, first transpose:
data have;
infile datalines dsd dlm=" " truncover;
input ID $ (var1 var2 var3 var4 var5 var6) ($);
datalines;
1 mi ab cd
2 ef hi xy
3 wx yz jk lm no
;
proc transpose
data=have
out=long (
rename=(_name_=var col1=value)
where=(value ne " ")
)
;
var var:;
by id;
run;
(I guess you have some identifying column like the ID I use here)
Then renumber:
data want;
set long;
by id;
if first.id
then i = 1;
else i + 1;
var = cats("var",i);
drop i;
run;
If you want, you can now re-transpose to wide:
proc transpose
data=want
out=want_wide (drop=_name_)
;
by id;
id var;
var value;
run;
Not that, throughout, the code needs not make any assumption about the number of entries for a single ID. And no space is wasted for missing values, after the initial transpose.
If you want to test if the code logic does what you need then create test cases. In your case create sample data that represent what you see in your data prior to running the code, then run the code against it and verify that the outcome meets expectations.
Writing "DQ" code you're asking for would require us to know how your data looks like - and if you know that then you can also create the test cases as explained above. That's how you test code.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.