DATA Step, Macro, Functions and more

Getting rid of blanks

Accepted Solution Solved
Reply
Contributor
Posts: 65
Accepted Solution

Getting rid of blanks

[ Edited ]

Hi, so I have a merged dataset I created in SAS that looks like this in the output results

 

ID   Alteration    Mutation

1                         ART

1                         COX

2      GBR           LIR

2      ART            EGF

3      LOP            

3      ART             NIH

3                         EGF

 

My problem is, I want to remove all the observations that do not have both a value for Alteration and Mutation for each ID. But unfortunately, I have an excel file with duplicate ID numbers for each mutation. So for example, I'd like to find a way to remove the first two rows, without losing the last row (ID:3) since ID:3 does have some rows with both alteration and mutation results.

Thank you!


Accepted Solutions
Solution
‎10-24-2017 06:54 PM
Super User
Posts: 8,213

Re: Getting rid of blanks

Posted in reply to kmardinian

No, you won't lose them. In fact, you don't even have to initialize a and m between ID values, as they are automatically initialized. E.g., try:

data have;
  input ID   Alteration $    Mutation $;
  cards;
1      .         ART
1      .         COX
2      GBR       LIR
2      ART       EGF
3      LOP        .
3      ART       NIH
3      .         EGF
4      .         ART
4      .         COX
;

data want (drop=a m);
  do until (last.id);
    set have;
    by id;
    if not missing(alteration) then a=1;
    if not missing(mutation) then m=1;
  end;
  do until (last.id);
    set have;
    by id;
    if a and m then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Respected Advisor
Posts: 3,247

Re: Getting rid of blanks

Posted in reply to kmardinian

Your explanation makes it sound like removing the first two rows will remove anything with ID=3, and that's not making sense to me. Can you explain further?

 

Or do you want something simple like this:

 

if missing(alteration) or missing(mutation) then delete;
--
Paige Miller
Contributor
Posts: 65

Re: Getting rid of blanks

Posted in reply to PaigeMiller

I have two tests that were conducted for people. One called Mutation and the other one, Alteration. Not everyone got both tests, which is why 1 is missing values for Alteration. But 3 did get both tests, but there are blanks from how the two datasets were merged.

 

I only want the data from people who took both tests and I am trying to get rid of ID numbers that only took one of the tests

 

If I used the code that you provided me with, would I not lose my last row since it's missing an observation in Alteration?

 

if missing(alteration) or missing(mutation) then delete;

 

Super User
Posts: 8,213

Re: Getting rid of blanks

Posted in reply to kmardinian

If I correctly understand what you want to do, you need to make two passes thru your data. A DOW loop is good for that. e.g.:

data have;
  input ID   Alteration $    Mutation $;
  cards;
1      .         ART
1      .         COX
2      GBR       LIR
2      ART       EGF
3      LOP        .
3      ART       NIH
3      .         EGF
;

data want;
  do until (last.id);
    set have;
    by id;
    if first.id then do;
      a=0;
      m=0;
    end;
    if not missing(alteration) then a=1;
    if not missing(mutation) then m=1;
  end;
  do until (last.id);
    set have;
    by id;
    if a and m then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 65

Re: Getting rid of blanks

Hi art297, 

 

Do you mind walking me through what the first do statement is doing exactly?

Thank you!

Super User
Posts: 8,213

Re: Getting rid of blanks

Posted in reply to kmardinian

Sure. With a DOW loop you're taking control of how SAS processes the dataset. All of the variable values you create are retained.

 

Thus, in the first do loop, separately for each id, a and m are only set to 1 if both values are present.

 

The second do loop starts as soon as the first loop reads the last record for a given id. Thus, if both a and m are present, it outputs the records. Then it goes back to the first do loop to process the next id.

 

Art, CEO, AnalystFinder.com

 

You can find more technical descriptions if you search the web for "DOW loop". More than likely the authors will be Whitlock, Dorfman or Henderson as they are the ones how came up with the concept.

 

Contributor
Posts: 65

Re: Getting rid of blanks

Posted in reply to kmardinian
Thanks art, I think that might work! And I won't lose the LOP and EGF observations for ID 3 since they have missing values for alteration and mutation? Thank you so much!
Solution
‎10-24-2017 06:54 PM
Super User
Posts: 8,213

Re: Getting rid of blanks

Posted in reply to kmardinian

No, you won't lose them. In fact, you don't even have to initialize a and m between ID values, as they are automatically initialized. E.g., try:

data have;
  input ID   Alteration $    Mutation $;
  cards;
1      .         ART
1      .         COX
2      GBR       LIR
2      ART       EGF
3      LOP        .
3      ART       NIH
3      .         EGF
4      .         ART
4      .         COX
;

data want (drop=a m);
  do until (last.id);
    set have;
    by id;
    if not missing(alteration) then a=1;
    if not missing(mutation) then m=1;
  end;
  do until (last.id);
    set have;
    by id;
    if a and m then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 65

Re: Getting rid of blanks

Hi art, so I just tried running the code and unfortunately it won't run. I keep getting this note over and over again

NOTE: Invalid numeric data, Alteration='MYST3' , at line 72 column 4.

 

and then this one:  last.ID=1 ID=0463 mutation= Protocol=predict Alteration=SETD2 FIRST.MRN=0 _ERROR_=1 _N_=5

 

Do you know what this issue might be? Thank you!

Contributor
Posts: 65

Re: Getting rid of blanks

Posted in reply to kmardinian
Thank you for all the help!
Super User
Posts: 8,213

Re: Getting rid of blanks

Posted in reply to kmardinian

Post the full set of code that you're running along with the log.

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 65

Re: Getting rid of blanks

Here is my code so far:

PROC IMPORT DATAFILE="xlsx"

   OUT=Guardant

   DBMS=XLSX

   REPLACE;

RUN;

 

 

PROC IMPORT DATAFILE="xlsx"

   OUT=Patient

   DBMS=XLSX

   REPLACE;

RUN;

 

 

PROC IMPORT DATAFILE=".xlsx"

   OUT=Foundation

   DBMS=XLSX

   REPLACE;

RUN;

 

proc sort data= G  out=G1;

 by ID;

 run;

 

proc sort data= P  out=P1;

 by ID;

 run;

 

proc sort data=F out=F1;

by ID;

run;

 

 

Data CombinedData;

 Merge G1 P1 F1;

 By  ID;

 Run;

 

Data MCC1; 

Set CombinedData (keep= Mutation ID Alteration protocol);

If protocol= "No" then delete;

if missing (mutation) and missing(alteration) then delete;

run;

 

Data Final (drop=alteration mutation);

do until (last.ID);

set MCC1;

By ID;

if not missing (alteration) then alteration=1;

if not missing(mutation) then mutation=1;

end;

do until (last.ID);

set MCC1;

by ID;

if alteration and mutation then output;

end;

Run;

 

 

Here is my LOG: 

 

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

 61        

 62         Data MCC1;

 63         Set CombinedData (keep= Mutation ID Alteration protocol);

 64         If protocol= "No" then delete;

 65         if missing (Mutation) and missing(alteration) then delete;

 66         run;

 

 NOTE: There were 38390 observations read from the data set WORK.COMBINEDDATA.

 NOTE: The data set WORK.MCC1 has 30871 observations and 4 variables.

 NOTE: DATA statement used (Total process time):

       real time           0.04 seconds

       cpu time            0.03 seconds

      

 

 67        

 68         Data Final (drop=alteration Mutation);

 69         do until (last.ID);

 70         set MCC1;

 71         By ID;

 72         if not missing (alteration) then alteration='1';

 73         if not missing(Mutation) then Mutation='1';

 74         end;

 75         do until (last.ID);

 76         set MCC1;

 77         by ID;

 78         if alteration and Mutation then output;

 79         end;

 80         Run;

 

 NOTE: Character values have been converted to numeric values at the places given by: (Line)Smiley SadColumn).

       78:4    78:19  

 NOTE: Invalid numeric data, Alteration='NA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='LRP1B' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='TSC1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='FANCA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MLL3' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='SNCAIP' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MLL' , at line 78 column 4.

 last.ID=1 ID=03697703 Mutation=  Protocol=predict Alteration=MLL FIRST.ID=0 _ERROR_=1 _N_=1

 NOTE: Invalid numeric data, Alteration='NTRK1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='TP53' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='CUL3' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='TP53' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='ARID1A' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MAP3K1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MLL' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MYST3' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NTRK1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='TET2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='CDKN2B' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='EPHB1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='PIK3CB' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='TERC' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='PRKCI' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='KRAS' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='BCL2L2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NFKBIA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NKX2-1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='PIK3C2B' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MDM4' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='IKBKE' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='PIK3CA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='SOX2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='EGFR' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MYC' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='ZNF703' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='FGFR1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='CDKN2A/B' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='SMAD4' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='SDHC' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='DDR2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='GSK3B' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='ATR' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='KLHL6' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='BCL6' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='GPR124' , at line 78 column 4.

 last.ID=1 ID=03907441 Mutation=  Protocol=PREDICT Alteration=GPR124 FIRST.ID=0 _ERROR_=1 _N_=2

 NOTE: Invalid numeric data, Alteration='PTPN11' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NF1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='NF1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='TSC1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='CDKN2C' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='DOT1L' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='ESR1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='IL7R' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MSH6' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='PIK3CG' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='SPTA1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='STK11' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='PDGFRA' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='KIT' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='CDKN2A/B' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='TET2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='FBXW7' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='IRF2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='FAT1' , at line 78 column 4.

 last.ID=1 ID=03921269 Mutation=  Protocol=PREDICT Alteration=FAT1 FIRST.ID=0 _ERROR_=1 _N_=3

 NOTE: Invalid numeric data, Alteration='TP53' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='ARID1B' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='ATM' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='BRCA2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='BRCA2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='FOXP1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='FRS2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='HSP90AA1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MEN1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MLL2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='RB1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='TP53' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='WISP3' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='ZNF217' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='HSP90AA1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='KMT2C' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='GATA6' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MCL1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='ZNF217' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MYCN' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='FH' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='FUBP1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='RANBP2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='INPP4B' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='APC' , at line 78 column 4.

 last.ID=1 ID=04090148 Mutation=  Protocol=PREDICT Alteration=APC FIRST.ID=0 _ERROR_=1 _N_=4

 NOTE: Invalid numeric data, Alteration='BRAF' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='PREX2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='EP300' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='SETD2' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='BCORL1' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='CREBBP' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MLL' , at line 78 column 4.

 NOTE: Invalid numeric data, Alteration='MYST3' , at line 78 column 4.

 NOTE: Over 100 NOTES, additional NOTES suppressed.

 NOTE: Invalid numeric data, Alteration='SETD2' , at line 78 column 4.

 last.ID=1 ID=04463 Mutation=  Protocol=predict Alteration=SETD2 FIRST.ID=0 _ERROR_=1 _N_=5

 last.ID=1 ID=04700 Mutation=  Protocol=predict Alteration=SNCAIP FIRST.ID=0 _ERROR_=1 _N_=6

 last.ID=1 ID=04390 Mutation=  Protocol=PREDICT Alteration=SPTA1 FIRST.ID=0 _ERROR_=1 _N_=7

 last.ID=1 ID=04482 Mutation=  Protocol=PREDICT Alteration=TAF1 FIRST.ID=0 _ERROR_=1 _N_=10

 last.ID=1 ID=04100 Mutation=  Protocol=PREDICT Alteration=PAK3 FIRST.ID=0 _ERROR_=1 _N_=11

 last.ID=1 ID=56035 Mutation=  Protocol=PREDICT Alteration=NA FIRST.ID=0 _ERROR_=1 _N_=12

 last.ID=1 ID=04757 Mutation=  Protocol=PREDICT Alteration=KLHL6 FIRST.ID=0 _ERROR_=1 _N_=13

 last.ID=1 ID=04962 Mutation=  Protocol=predict Alteration=RUNX1T1 FIRST.ID=0 _ERROR_=1 _N_=15

 last.ID=1 ID=04984 Mutation=  Protocol=predict Alteration=PIK3R2 FIRST.ID=0 _ERROR_=1 _N_=16

 last.ID=1 ID=05027 Mutation=  Protocol=predict Alteration=FANCL FIRST.ID=0 _ERROR_=1 _N_=17

 last.ID=1 ID=05502 Mutation=  Protocol=PREDICT Alteration=SPTA1 FIRST.ID=0 _ERROR_=1 _N_=20

 last.ID=1 ID=04312 Mutation=  Protocol=predict Alteration=KMT2A FIRST.ID=0 _ERROR_=1 _N_=22

 last.ID=1 ID=05476 Mutation=  Protocol=  Alteration=RARA FIRST.ID=0 _ERROR_=1 _N_=24

 last.ID=1 ID=00902 Mutation=  Protocol=predict Alteration=NA FIRST.ID=0 _ERROR_=1 _N_=25

 last.ID=1 ID=05856 Mutation=  Protocol=PREDICT Alteration=VHL FIRST.ID=0 _ERROR_=1 _N_=27

 WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.

 last.ID=1 ID=05474 Mutation=  Protocol=predict Alteration=SETD2 FIRST.ID=0 _ERROR_=1 _N_=28

 NOTE: There were 30871 observations read from the data set WORK.MCC1.

 NOTE: There were 30871 observations read from the data set WORK.MCC1.

 NOTE: The data set WORK.FINAL has 0 observations and 2 variables.

 NOTE: DATA statement used (Total process time):

       real time           0.03 seconds

       cpu time            0.02 seconds

      

 

 81        

 82        

 83        

 84         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

 97        

 

Contributor
Posts: 65

Re: Getting rid of blanks

Posted in reply to kmardinian

I'm definitely a SAS beginner, so I apologize for the organization of my code. 

 

I did try to change the alteration=1 to alteration='1' in the do loop, since I thought it was a numeric vs. character issue. But that didn't solve the problem

Super User
Posts: 8,213

Re: Getting rid of blanks

Posted in reply to kmardinian

You didn't follow my suggested code. I created two new variables, namely a and m, and then dropped them as part of the data statement.

 

You tried to replace existing variables which risks bringing up the types of problems for which you got errors. You never mentioned anything about wanting to replace existing variable values with something else.

 

I would simply use the code the way I suggested.

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 65

Re: Getting rid of blanks

Yep, that was it. I'm sorry, for some reason I saw those as abbreviations! That worked though, thank you very much Art!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 325 views
  • 0 likes
  • 4 in conversation