BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BeatriceWang
Obsidian | Level 7

Hi,

My question is referencing the below thread regarding how to skip a variable based on the value and go to the next variable.
https://communities.sas.com/t5/SAS-Programming/how-to-skip-a-variable-based-on-the-value-and-go-to-t... 

I got a similar situation.

I tried the solution in the above thread. It seems working for me by performing some of the eye ball checking.

I need to conduct a data quality check.

Can somebody suggest an efficient approach (SAS code) to conduct a data quality check to ensure it has done the job appropriately?

Thank you very much in advance.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@BeatriceWang The requirement in your other question here was: " If the grid_rate =D than i dont want that value and go on the the next grid rate". You didn't mention missings and that you want to skip them as well.

 

If you want to exclude multiple values then just change the condition from:

if rates[_i] ne 'D' then

to a list like:

if rates[_i] not in ('D',' ') then

Here implemented for one of the options posted in your other thread.

data have;
  infile datalines truncover dsd;
  input (contract_id rate1 rate2 rate3 rate4 want_rate1 want_rate2 want_rate3 want_rate4) ($);
  datalines;
XEA123,D,PRIME,TD49,TD58,PRIME,TD49,TD58
DEW343,,D,TD58,PRIME,TD49,TD58,PRIME
JRQ912,PRIME,TD49,TD58,,PRIME,TD49,TD58
;

data want(drop=_:);
  set have;
  array rates {*} rate1 - rate4;
  do _i=1 to dim(rates);
    if rates[_i] not in ('D',' ') then
      do;
        _ind=sum(_ind,1);
        rates[_ind]=rates[_i];
      end;
  end;
  _ind=sum(_ind,1);
  do _i=_ind to dim(rates);
    call missing(rates[_i]);
  end;
run;

View solution in original post

12 REPLIES 12
SASKiwi
PROC Star

@BeatriceWang  - Your link doesn't appear to work.

BeatriceWang
Obsidian | Level 7

The link works well for me. I tried directly clicking on it.

Reeza
Super User
I fixed the link for you, it didn't work previously. I assume that means I found the right link as well 🙂
SASKiwi
PROC Star

@BeatriceWang  - Yep, must have been a website problem. Working now.

ballardw
Super User

Exactly what kind of check do you want?

BeatriceWang
Obsidian | Level 7

I want to check the newvars consecutively stored the vars without blank fields on all records of my file(big).

I did some eye ball checking. It looks OK. But I could not eye check all of the records.

Please advise if you can.

ballardw
Super User

@BeatriceWang wrote:

I want to check the newvars consecutively stored the vars without blank fields on all records of my file(big).

I did some eye ball checking. It looks OK. But I could not eye check all of the records.

Please advise if you can.


What exactly does "consecutively stored the vars without blank fields" actually mean? We don't have any of your data, before or after so we have no idea which vars, and while it may be obvious to you, "consecutively stored" is not intuitively obvious. You can use the NMISS function if the variables are numeric to tell how many are missing (or blank).

 

 

Patrick
Opal | Level 21

@BeatriceWang The requirement in your other question here was: " If the grid_rate =D than i dont want that value and go on the the next grid rate". You didn't mention missings and that you want to skip them as well.

 

If you want to exclude multiple values then just change the condition from:

if rates[_i] ne 'D' then

to a list like:

if rates[_i] not in ('D',' ') then

Here implemented for one of the options posted in your other thread.

data have;
  infile datalines truncover dsd;
  input (contract_id rate1 rate2 rate3 rate4 want_rate1 want_rate2 want_rate3 want_rate4) ($);
  datalines;
XEA123,D,PRIME,TD49,TD58,PRIME,TD49,TD58
DEW343,,D,TD58,PRIME,TD49,TD58,PRIME
JRQ912,PRIME,TD49,TD58,,PRIME,TD49,TD58
;

data want(drop=_:);
  set have;
  array rates {*} rate1 - rate4;
  do _i=1 to dim(rates);
    if rates[_i] not in ('D',' ') then
      do;
        _ind=sum(_ind,1);
        rates[_ind]=rates[_i];
      end;
  end;
  _ind=sum(_ind,1);
  do _i=_ind to dim(rates);
    call missing(rates[_i]);
  end;
run;
BeatriceWang
Obsidian | Level 7

Thank you for all the responses. You guys are all very helpful.

My data looks like this:

var1var2var3var4var5var6newvar1newvar2newvar3newvar4newvar5newvar6
miab cd  miabcd   
ef  hixy efhixy   
wxyzjklmno wxyzjklmno 

 

The code you posted works for me.

Like I said earlier I need to do data quality check.

I am looking for efficient sas code to check if the job is appropriately done on all of the records in my file (big).

Please advise if you can. Thanks in advance.

Kurt_Bremser
Super User

You suffer from inefficient data design. Maxim 19: Long Beats Wide.

So, first transpose:

data have;
infile datalines dsd dlm=" " truncover;
input ID $ (var1 var2  var3  var4  var5  var6) ($);
datalines;
1 mi ab  cd
2 ef   hi xy 
3 wx yz jk lm no
;

proc transpose
  data=have
  out=long (
    rename=(_name_=var col1=value)
    where=(value ne " ")
  )
;
var var:;
by id;
run;

(I guess you have some identifying column like the ID I use here)

Then renumber:

data want;
set long;
by id;
if first.id
then i = 1;
else i + 1;
var = cats("var",i);
drop i;
run;

If you want, you can now re-transpose to wide:

proc transpose
  data=want
  out=want_wide (drop=_name_)
;
by id;
id var;
var value;
run;

Not that, throughout, the code needs not make any assumption about the number of entries for a single ID. And no space is wasted for missing values, after the initial transpose.

 

Patrick
Opal | Level 21

@BeatriceWang 

If you want to test if the code logic does what you need then create test cases. In your case create sample data that represent what you see in your data prior to running the code, then run the code against it and verify that the outcome meets expectations.

Writing "DQ" code you're asking for would require us to know how your data looks like - and if you know that then you can also create the test cases as explained above. That's how you test code.

Reeza
Super User
This seems like a different question than your original question. In that case, it may be better to post it as a new question with an more appropriate subject line eg "Timing a data step process"

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2979 views
  • 3 likes
  • 6 in conversation