BookmarkSubscribeRSS Feed
Dsquared
Fluorite | Level 6

Hello,

 

Here is my problem.  In my ctclaims (6 variables) all of my data columns are perfectly populated.  However, once I run my second proc sql code all of the 6 variables remain populated EXCEPT once called eventdate.  All of those are now missing.  What am I doing wrong?

 

Thanks!!

 

 

 

proc sql;

 

create table mart_tmp.ctclaims as

select a.*, b.*

from mart_tmp.controlclaims (keep=patid eventdate medcode) a, mart_tmp.elig_denomC(keep=patid elig_start elig_end yob) b

where a.patid=b.patid

order by patid;

quit;

 

 

proc sql;

 

create table mart_tmp.allpregC as

select *

from mart_tmp.ctclaims

where medcode in (select medcode from mart.codelistP);

*** extract outcome events during the study period;

create table mart_tmp.pregoutcomeC as

select *

from mart_tmp.ctclaims

where medcode in (select medcode from mart.codelistO);

*** extract exclusion events during the study period;

create table mart_tmp.exclusionsC as

select *

from mart_tmp.ctclaims

where medcode in (select medcode from mart.codelistE);

quit;

 

10 REPLIES 10
ballardw
Super User

As you have 3 output tables, which one has missing values for the variable?

And do you have any notes or errors in the log?

 

 

Dsquared
Fluorite | Level 6

All three of the tables created from the 2nd proc sql have all values of eventdate as . 

 

I took a look at the log.  There are no errors and all the tables have a similar statement beneath:

NOTE: Table MART_TMP.PREGOUTCOMEC created, with 6361 rows and 6 columns.

ballardw
Super User

@Dsquared wrote:

All three of the tables created from the 2nd proc sql have all values of eventdate as . 

 

I took a look at the log.  There are no errors and all the tables have a similar statement beneath:

NOTE: Table MART_TMP.PREGOUTCOMEC created, with 6361 rows and 6 columns.


Time to provide some small example data sets of mart_tmp.controlclaims and mart.codelistP, 4 or 5 records for each, that will duplicate the behavior. You only need to provide enough variables to satisfy the code. Provide them as a data step so we can recreate the problem with the data. The data step could look like:

data work.junk1;
   input a b;
datalines;
1 1
2 2
3 3
3 1
;
run;

I say this because with dummy sets I build I can't recreate the problem making me suspect this is somehow data related.

 

 

Make sure you get the same behavior with your example data sets.

Kurt_Bremser
Super User

This:

create table mart_tmp.ctclaims as
select a.*, b.*
............
where a.patid=b.patid

is a sign of sloppy programming. You tolerate the warning for the already existing patid, and that will mask other WARNINGs for variables present in both input tables.

Instead of asterisks, always write a comprehensive list of variables to be included.

 

Never accept code to run with WARNINGs or extraneous NOTEs, let alone ERRORs.

Dsquared
Fluorite | Level 6

Thanks.  I re-ran adjusting the a*, b* but am getting the same result.  😞

Kurt_Bremser
Super User

Then we will need example data that illustrates the issue against which we can test-run code.

Please post example data as data step code.

Dsquared
Fluorite | Level 6

Here they are.  It is longitudinal data. 

 CtClaims

data work.ctclaims;
   input patid eventdate medcode yob elig_start elig_end;
datalines;
1111 3/24/99  12      1930   9/11/99   10/31/99

1111 1/20/01  34       1930   9/11/99   10/31/99

1001 5/20/06  57      1980   5/11/09   5/3/09
;
run;

 

CodelistP

data work.codelistP;
   input   disease  medcode;
datalines;
pregnancy     34

pregnancy   127

pregnancy   294

;
run;

 

Thanks so much!!

ballardw
Super User

@Dsquared wrote:

Here they are.  It is longitudinal data. 

 CtClaims

data work.ctclaims;
   input patid eventdate medcode yob elig_start elig_end;
datalines;
1111 3/24/99  12      1930   9/11/99   10/31/99

1111 1/20/01  34       1930   9/11/99   10/31/99

1001 5/20/06  57      1980   5/11/09   5/3/09
;
run;

 

CodelistP

data work.codelistP;
   input   disease  medcode;
datalines;
pregnancy     34

pregnancy   127

pregnancy   294

;
run;

 

Thanks so much!!


I guess you did this before I corrected my post. We need an example of the controlclaims so that we have the INPUT data to the proc SQL to see if we get the same output.

 

Note that your ctlaims data step does not run correctly because you haven't specified an informat that will read dates correctly for numeric variables.

Kurt_Bremser
Super User

@Dsquared wrote:

Here they are.  It is longitudinal data. 

 CtClaims

data work.ctclaims;
   input patid eventdate medcode yob elig_start elig_end;
datalines;
1111 3/24/99  12      1930   9/11/99   10/31/99

1111 1/20/01  34       1930   9/11/99   10/31/99

1001 5/20/06  57      1980   5/11/09   5/3/09
;
run;

 

CodelistP

data work.codelistP;
   input   disease  medcode;
datalines;
pregnancy     34

pregnancy   127

pregnancy   294

;
run;

 

Thanks so much!!


Well, it's the data:

data work.ctclaims;
input patid eventdate medcode yob elig_start elig_end;
datalines;
1111 3/24/99  12      1930   9/11/99   10/31/99
1111 1/20/01  34       1930   9/11/99   10/31/99
1001 5/20/06  57      1980   5/11/09   5/3/09
;
run;

proc print data=work.ctclaims noobs;
run;

Result:

                                         elig_
patid    eventdate    medcode     yob    start    elig_end

 1111        .           12      1930      .          .   
 1111        .           34      1930      .          .   
 1001        .           57      1980      .          .   

If this is just a problem with your datastep/datalines code, use the macro from https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to convert your dataset to a datastep as it really is.

Tom
Super User Tom
Super User

Nothing in that code is changing the value of EVENTDATE.

If you have missing values then they were in the data already.

Your code:

create table mart_tmp.ctclaims as
select a.*, b.*
from mart_tmp.controlclaims (keep=patid eventdate medcode) a
, mart_tmp.elig_denomC(keep=patid elig_start elig_end yob) b
where a.patid=b.patid
order by patid
;

It is reading EVENTDATE from the dataset MART_TMP.CONTROLCLAIMS.  Since you are essentially doing an inner join it should not be adding any extra observations that only come from the other dataset.

 

So look in MART_TMP.CONTROLCLAIMS for missing values.

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2904 views
  • 2 likes
  • 4 in conversation