Desktop productivity for business analysts and programmers

missing variable.. why?

Reply
Occasional Contributor
Posts: 7

missing variable.. why?

Hello,

 

Here is my problem.  In my ctclaims (6 variables) all of my data columns are perfectly populated.  However, once I run my second proc sql code all of the 6 variables remain populated EXCEPT once called eventdate.  All of those are now missing.  What am I doing wrong?

 

Thanks!!

 

 

 

proc sql;

 

create table mart_tmp.ctclaims as

select a.*, b.*

from mart_tmp.controlclaims (keep=patid eventdate medcode) a, mart_tmp.elig_denomC(keep=patid elig_start elig_end yob) b

where a.patid=b.patid

order by patid;

quit;

 

 

proc sql;

 

create table mart_tmp.allpregC as

select *

from mart_tmp.ctclaims

where medcode in (select medcode from mart.codelistP);

*** extract outcome events during the study period;

create table mart_tmp.pregoutcomeC as

select *

from mart_tmp.ctclaims

where medcode in (select medcode from mart.codelistO);

*** extract exclusion events during the study period;

create table mart_tmp.exclusionsC as

select *

from mart_tmp.ctclaims

where medcode in (select medcode from mart.codelistE);

quit;

 

Super User
Posts: 13,000

Re: missing variable.. why?

As you have 3 output tables, which one has missing values for the variable?

And do you have any notes or errors in the log?

 

 

Occasional Contributor
Posts: 7

Re: missing variable.. why?

All three of the tables created from the 2nd proc sql have all values of eventdate as . 

 

I took a look at the log.  There are no errors and all the tables have a similar statement beneath:

NOTE: Table MART_TMP.PREGOUTCOMEC created, with 6361 rows and 6 columns.

Super User
Posts: 13,000

Re: missing variable.. why?

[ Edited ]

Dsquared wrote:

All three of the tables created from the 2nd proc sql have all values of eventdate as . 

 

I took a look at the log.  There are no errors and all the tables have a similar statement beneath:

NOTE: Table MART_TMP.PREGOUTCOMEC created, with 6361 rows and 6 columns.


Time to provide some small example data sets of mart_tmp.controlclaims and mart.codelistP, 4 or 5 records for each, that will duplicate the behavior. You only need to provide enough variables to satisfy the code. Provide them as a data step so we can recreate the problem with the data. The data step could look like:

data work.junk1;
   input a b;
datalines;
1 1
2 2
3 3
3 1
;
run;

I say this because with dummy sets I build I can't recreate the problem making me suspect this is somehow data related.

 

 

Make sure you get the same behavior with your example data sets.

Super User
Posts: 9,550

Re: missing variable.. why?

This:

create table mart_tmp.ctclaims as
select a.*, b.*
............
where a.patid=b.patid

is a sign of sloppy programming. You tolerate the warning for the already existing patid, and that will mask other WARNINGs for variables present in both input tables.

Instead of asterisks, always write a comprehensive list of variables to be included.

 

Never accept code to run with WARNINGs or extraneous NOTEs, let alone ERRORs.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 7

Re: missing variable.. why?

Posted in reply to KurtBremser

Thanks.  I re-ran adjusting the a*, b* but am getting the same result.  Smiley Sad

Super User
Posts: 9,550

Re: missing variable.. why?

Then we will need example data that illustrates the issue against which we can test-run code.

Please post example data as data step code.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 7

Re: missing variable.. why?

Posted in reply to KurtBremser

Here they are.  It is longitudinal data. 

 CtClaims

data work.ctclaims;
   input patid eventdate medcode yob elig_start elig_end;
datalines;
1111 3/24/99  12      1930   9/11/99   10/31/99

1111 1/20/01  34       1930   9/11/99   10/31/99

1001 5/20/06  57      1980   5/11/09   5/3/09
;
run;

 

CodelistP

data work.codelistP;
   input   disease  medcode;
datalines;
pregnancy     34

pregnancy   127

pregnancy   294

;
run;

 

Thanks so much!!

Super User
Posts: 13,000

Re: missing variable.. why?


Dsquared wrote:

Here they are.  It is longitudinal data. 

 CtClaims

data work.ctclaims;
   input patid eventdate medcode yob elig_start elig_end;
datalines;
1111 3/24/99  12      1930   9/11/99   10/31/99

1111 1/20/01  34       1930   9/11/99   10/31/99

1001 5/20/06  57      1980   5/11/09   5/3/09
;
run;

 

CodelistP

data work.codelistP;
   input   disease  medcode;
datalines;
pregnancy     34

pregnancy   127

pregnancy   294

;
run;

 

Thanks so much!!


I guess you did this before I corrected my post. We need an example of the controlclaims so that we have the INPUT data to the proc SQL to see if we get the same output.

 

Note that your ctlaims data step does not run correctly because you haven't specified an informat that will read dates correctly for numeric variables.

Super User
Posts: 9,550

Re: missing variable.. why?


Dsquared wrote:

Here they are.  It is longitudinal data. 

 CtClaims

data work.ctclaims;
   input patid eventdate medcode yob elig_start elig_end;
datalines;
1111 3/24/99  12      1930   9/11/99   10/31/99

1111 1/20/01  34       1930   9/11/99   10/31/99

1001 5/20/06  57      1980   5/11/09   5/3/09
;
run;

 

CodelistP

data work.codelistP;
   input   disease  medcode;
datalines;
pregnancy     34

pregnancy   127

pregnancy   294

;
run;

 

Thanks so much!!


Well, it's the data:

data work.ctclaims;
input patid eventdate medcode yob elig_start elig_end;
datalines;
1111 3/24/99  12      1930   9/11/99   10/31/99
1111 1/20/01  34       1930   9/11/99   10/31/99
1001 5/20/06  57      1980   5/11/09   5/3/09
;
run;

proc print data=work.ctclaims noobs;
run;

Result:

                                         elig_
patid    eventdate    medcode     yob    start    elig_end

 1111        .           12      1930      .          .   
 1111        .           34      1930      .          .   
 1001        .           57      1980      .          .   

If this is just a problem with your datastep/datalines code, use the macro from https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to convert your dataset to a datastep as it really is.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Super User
Posts: 7,844

Re: missing variable.. why?

[ Edited ]

Nothing in that code is changing the value of EVENTDATE.

If you have missing values then they were in the data already.

Your code:

create table mart_tmp.ctclaims as
select a.*, b.*
from mart_tmp.controlclaims (keep=patid eventdate medcode) a
, mart_tmp.elig_denomC(keep=patid elig_start elig_end yob) b
where a.patid=b.patid
order by patid
;

It is reading EVENTDATE from the dataset MART_TMP.CONTROLCLAIMS.  Since you are essentially doing an inner join it should not be adding any extra observations that only come from the other dataset.

 

So look in MART_TMP.CONTROLCLAIMS for missing values.

 

 

Ask a Question
Discussion stats
  • 10 replies
  • 276 views
  • 2 likes
  • 4 in conversation