BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
shami
Obsidian | Level 7

I've got two datasets that are similar. I've run frequency counts for the same variable, however I noticed that one dataset has two less observations. What's the best way/approach to identify what those two observations that are missing from the other data set are?

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

The in= option lets you capture which source table contributes data to the target table.

data raw3;
  merge raw1(in=in1) raw2(in=in2);
  by patient_local_id;
  if in1=0 or in2=0 then
    do;
      in_raw1=in1;
      in_raw2=in2;
      output;
    end;
run;

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

If there is a common ID variable (such as customer ID or patient ID or similar), you can merge the two data sets and see which observations don't line up.

 

Example:

 

proc sql;
    create table combined as select
        coalesce(a.id,b.id) as id
        ,a.variable as variable_in_a
        ,b.variable as variable_in_b
    from dataset_a as a full join dataset_b as b
    on a.id=b.id;
quit;

 

Then it is easy to search this data set (or use a WHERE clause in the above SQL) for observations that are missing values in either dataset_a or dataset_b.

 

You would be wise to provide sample data along with your explanation; we really can't write code for your situation without sample data. Sample data should be provided as working SAS data step code (examples and instructions).

--
Paige Miller
shami
Obsidian | Level 7

I merged the dataset with the following code by the patient ID:

 

data raw3;
merge raw1 raw2;
by patient_local_id;

run;

 

I'm not sure how I'd code after this to get the missing observations...

PaigeMiller
Diamond | Level 26

You can just look at data set RAW3 in a viewer (such as viewtable) and scroll up and down to find missing observations. Or you can do a PROC PRINT with a WHERE statement to print only observations that are missing some values.

 

Also

 


@shami wrote:

 

data raw3;
merge raw1 raw2;
by patient_local_id;

run;


Unless the variable names are different in RAW1 and RAW2, or you force them to be different, this won't work.

--
Paige Miller
Patrick
Opal | Level 21

The in= option lets you capture which source table contributes data to the target table.

data raw3;
  merge raw1(in=in1) raw2(in=in2);
  by patient_local_id;
  if in1=0 or in2=0 then
    do;
      in_raw1=in1;
      in_raw2=in2;
      output;
    end;
run;
shami
Obsidian | Level 7
Thank you this is helpful