BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
shami
Obsidian | Level 7

I've got two datasets that are similar. I've run frequency counts for the same variable, however I noticed that one dataset has two less observations. What's the best way/approach to identify what those two observations that are missing from the other data set are?

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

The in= option lets you capture which source table contributes data to the target table.

data raw3;
  merge raw1(in=in1) raw2(in=in2);
  by patient_local_id;
  if in1=0 or in2=0 then
    do;
      in_raw1=in1;
      in_raw2=in2;
      output;
    end;
run;

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

If there is a common ID variable (such as customer ID or patient ID or similar), you can merge the two data sets and see which observations don't line up.

 

Example:

 

proc sql;
    create table combined as select
        coalesce(a.id,b.id) as id
        ,a.variable as variable_in_a
        ,b.variable as variable_in_b
    from dataset_a as a full join dataset_b as b
    on a.id=b.id;
quit;

 

Then it is easy to search this data set (or use a WHERE clause in the above SQL) for observations that are missing values in either dataset_a or dataset_b.

 

You would be wise to provide sample data along with your explanation; we really can't write code for your situation without sample data. Sample data should be provided as working SAS data step code (examples and instructions).

--
Paige Miller
shami
Obsidian | Level 7

I merged the dataset with the following code by the patient ID:

 

data raw3;
merge raw1 raw2;
by patient_local_id;

run;

 

I'm not sure how I'd code after this to get the missing observations...

PaigeMiller
Diamond | Level 26

You can just look at data set RAW3 in a viewer (such as viewtable) and scroll up and down to find missing observations. Or you can do a PROC PRINT with a WHERE statement to print only observations that are missing some values.

 

Also

 


@shami wrote:

 

data raw3;
merge raw1 raw2;
by patient_local_id;

run;


Unless the variable names are different in RAW1 and RAW2, or you force them to be different, this won't work.

--
Paige Miller
Patrick
Opal | Level 21

The in= option lets you capture which source table contributes data to the target table.

data raw3;
  merge raw1(in=in1) raw2(in=in2);
  by patient_local_id;
  if in1=0 or in2=0 then
    do;
      in_raw1=in1;
      in_raw2=in2;
      output;
    end;
run;
shami
Obsidian | Level 7
Thank you this is helpful

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 792 views
  • 2 likes
  • 3 in conversation