<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Missing data in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224825#M5370</link>
    <description>&lt;P&gt;Ohhhhh that makes so much sense now. Is there a way to get rid of that problem? Maybe if I used a MERGE statement rather than a SET statement?&lt;/P&gt;</description>
    <pubDate>Wed, 09 Sep 2015 19:43:22 GMT</pubDate>
    <dc:creator>mmraja</dc:creator>
    <dc:date>2015-09-09T19:43:22Z</dc:date>
    <item>
      <title>Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224794#M5364</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;I'm having some trouble creating some new datasets.&lt;/P&gt;&lt;P&gt;I have survey data from 2 separate cohorts, stored in two separate datasets. I've pooled them into one dataset (called pooled) and combined the variables of interest into new pooled variables.&lt;/P&gt;&lt;P&gt;The problem is that the same variable has a different number of missing values in the cohort1 and pooled datasets.&lt;/P&gt;&lt;P&gt;Example: the cohort1 dataset has a variable called netcst1 with ~2000 missing values (out of ~600,000, so not really a large percentage). The same variable in the pooled dataset has 500,000+ missing values, and I can't figure out why.&lt;/P&gt;&lt;P&gt;Syntax-wise, this is what I've done:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA pooled;
SET cohort1 cohort2;

....
/*I've created the new pooled variables I need here*/
...
RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;In cohort1 (and therefore in pooled), there is a variable called netcst1 (which I've verified through a PROC CONTENTS), yet these two steps give me two different answers:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* (I) */
proc means data = cohort1 nmiss;
    var netcst1;
run;

/* (II) */
proc means data = pooled nmiss;
    var netcst1;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Shouldn't I be getting the same number of missing values for both? What have I done wrong for (I) to give me about 2000 missing values and for (II) to give me more than 500,000 missing values?&lt;/P&gt;&lt;P&gt;Thanks for your help&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2015 17:29:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224794#M5364</guid>
      <dc:creator>mmraja</dc:creator>
      <dc:date>2015-09-09T17:29:46Z</dc:date>
    </item>
    <item>
      <title>Re: Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224795#M5365</link>
      <description>Is netcst1 in cohort2? If not, how is netcst1 calculated for observations coming from cohort2?</description>
      <pubDate>Wed, 09 Sep 2015 17:44:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224795#M5365</guid>
      <dc:creator>ChrisWard</dc:creator>
      <dc:date>2015-09-09T17:44:41Z</dc:date>
    </item>
    <item>
      <title>Re: Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224808#M5366</link>
      <description>&lt;P&gt;So the variable netcst1 exists in cohort1 only and then a different variable, call it netcst1a, exists only in cohort2. In the pooled data set, the variable poolednetcst brings them together.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2015 18:22:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224808#M5366</guid>
      <dc:creator>mmraja</dc:creator>
      <dc:date>2015-09-09T18:22:20Z</dc:date>
    </item>
    <item>
      <title>Re: Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224810#M5367</link>
      <description>&lt;P&gt;How many values are missing from cohort2?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc means data = cohort2 nmiss;
    var netcst1a;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 09 Sep 2015 18:49:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224810#M5367</guid>
      <dc:creator>ChrisWard</dc:creator>
      <dc:date>2015-09-09T18:49:28Z</dc:date>
    </item>
    <item>
      <title>Re: Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224815#M5368</link>
      <description>I believe 2400 or so values are missing from cohort2, which is completely reasonable considering the size of the dataset.</description>
      <pubDate>Wed, 09 Sep 2015 19:15:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224815#M5368</guid>
      <dc:creator>mmraja</dc:creator>
      <dc:date>2015-09-09T19:15:19Z</dc:date>
    </item>
    <item>
      <title>Re: Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224822#M5369</link>
      <description>&lt;P&gt;Because SET stacks the two data sets vertically, PROC MEANS on the combined data set counts the observations from both data sets. A variable that exists in only one input data set is missing on every observation read from the other, so all of those observations are counted as missing.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* VarA exists only in data set a */
data a;
input num VarA ;
datalines;
1 7
2 8
3 .
;
proc print data=a;
run;
/* VarB exists only in data set b */
data b;
input num VarB ;
datalines;
4 6
5 .
6 3
;
proc print data=b;
run;
proc means data = a n nmiss;
  var VarA;
run;
proc means data = b n nmiss;
  var VarB;
run;
/* Stack a and b: VarA is missing on rows from b, VarB on rows from a */
data joined;
 set a b;
run;
proc means data = joined n nmiss;
  var _numeric_;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The result:&lt;/P&gt;
&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;Obs&lt;/TH&gt;&lt;TH&gt;num&lt;/TH&gt;&lt;TH&gt;VarA&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;.&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;HR /&gt;
&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;Obs&lt;/TH&gt;&lt;TH&gt;num&lt;/TH&gt;&lt;TH&gt;VarB&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;HR /&gt;
&lt;DIV class="proc_title_group"&gt;&lt;P class="c proctitle"&gt;The MEANS Procedure&lt;/P&gt;&lt;/DIV&gt;&lt;P&gt;Analysis Variable : VarA&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;N&lt;/TH&gt;&lt;TH&gt;N Miss&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;HR /&gt;
&lt;DIV class="proc_title_group"&gt;&lt;P class="c proctitle"&gt;The MEANS Procedure&lt;/P&gt;&lt;/DIV&gt;&lt;P&gt;Analysis Variable : VarB&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;N&lt;/TH&gt;&lt;TH&gt;N Miss&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;HR /&gt;
&lt;DIV class="proc_title_group"&gt;&lt;P class="c proctitle"&gt;The MEANS Procedure&lt;/P&gt;&lt;/DIV&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH&gt;Variable&lt;/TH&gt;&lt;TH&gt;N&lt;/TH&gt;&lt;TH&gt;N Miss&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;num&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;VarA&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;VarB&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P&gt;In the joined data set, VarA has 4 missing values: the 1 that was already missing in a, plus the 3 observations read from b, where VarA does not exist.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2015 19:27:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224822#M5369</guid>
      <dc:creator>stevyfargose</dc:creator>
      <dc:date>2015-09-09T19:27:50Z</dc:date>
    </item>
    <item>
      <title>Re: Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224825#M5370</link>
      <description>&lt;P&gt;Ohhhhh that makes so much sense now. Is there a way to get rid of that problem? Maybe if I used a MERGE statement rather than a SET statement?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2015 19:43:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224825#M5370</guid>
      <dc:creator>mmraja</dc:creator>
      <dc:date>2015-09-09T19:43:22Z</dc:date>
    </item>
    <item>
      <title>Re: Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224830#M5371</link>
      <description>&lt;P&gt;You can use a MERGE statement if both data sets share a common variable to merge BY, but then you will have to sort both data sets by that variable first.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2015 20:00:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/224830#M5371</guid>
      <dc:creator>stevyfargose</dc:creator>
      <dc:date>2015-09-09T20:00:35Z</dc:date>
    </item>
    <item>
      <title>Re: Missing data</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/225264#M5403</link>
      <description>&lt;P&gt;You need to add a CLASS statement to your PROC MEANS so the cohorts are treated as separate groups.&lt;/P&gt;&lt;P&gt;If you do not already have a variable that defines the cohort you could create one during the step that combines the data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data pooled ;
   length cohort indsname $50 ;
   set cohort1 cohort2 indsname=indsname ;
   cohort=indsname ;
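   /* Sketch only, not taken from the original code: one way the pooled
      variable might be built. Assumes netcst1 (cohort1 only) and netcst1a
      (the placeholder name used in this thread for the cohort2 variable)
      are both numeric. COALESCE returns its first non-missing argument,
      so poolednetcst takes the value from whichever cohort supplied
      the observation. */
   poolednetcst = coalesce(netcst1, netcst1a) ;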
   ...
run;

proc means ;
   class cohort ;
...&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 12 Sep 2015 11:56:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Missing-data/m-p/225264#M5403</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2015-09-12T11:56:26Z</dc:date>
    </item>
  </channel>
</rss>

