<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Combining multiple datasets into one but choosing values based on a condition in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667871#M200087</link>
    <description>&lt;BR /&gt;"If you start with 3 observations for the same doctor, and there is conflicting data among various sources for those other 101 variables, what should be done?"&lt;BR /&gt;&lt;BR /&gt;These observations with conflicting data would write to another observation.&lt;BR /&gt;&lt;BR /&gt;So for example, if Dr. Smith sees patients at two different locations then I want both locations in my dataset as two observations.&lt;BR /&gt;&lt;BR /&gt;Example:&lt;BR /&gt;Dataset AC4 says:&lt;BR /&gt;Name | OfficeAddress | EyeColor | NPI&lt;BR /&gt;John Smith | 123 Main St. | Blue |1234567890&lt;BR /&gt;&lt;BR /&gt;Dataset CD4 says:&lt;BR /&gt;Name | OfficeAddress | EyeColor | NPI&lt;BR /&gt;John Smith | 789 Any St. | Brown |1234567890&lt;BR /&gt;&lt;BR /&gt;I want the final dataset to say:&lt;BR /&gt;Name | OfficeAddress | EyeColor | NPI&lt;BR /&gt;John Smith | 789 Any St. | Brown |1234567890&lt;BR /&gt;John Smith | 123 Main St. | Brown |1234567890&lt;BR /&gt;</description>
    <pubDate>Wed, 08 Jul 2020 20:48:15 GMT</pubDate>
    <dc:creator>InspectahDex</dc:creator>
    <dc:date>2020-07-08T20:48:15Z</dc:date>
    <item>
      <title>Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667596#M199950</link>
      <description>&lt;P&gt;I have four datasets that I'm combining into one dataset.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data FinalDataSet;&lt;BR /&gt;set AC4 EP4 IP4 CD4;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Each dataset has a common set of variables. They all have 130 variables that are named the same and are all formatted to character variables.&amp;nbsp; They each contain list of doctors from different health networks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, some doctors are only on one dataset, and others are found in multiple datasets. For the doctors that are found in multiple, one dataset may information that might differ or be missing from another dataset. For example, Doctor Smith might have their name, office address, and eye color in dataset AC4 but the CD4 dataset, which also has the same information, is saying the eye color is "blue" when dataset AC4 lists eye color as "brown".&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Question: how do I go about concatenating these multiple datasets into one so that the data will always default to CD4 dataset when there is conflicting information or if the data is missing? Basically, I want my final dataset to say Dr. Smith has "brown" eyes regardless of what the other datasets have down for that doctor. But if the CD4 dataset doesn't have an observation for Dr. Smith, then I want to keep the observation in the other datasets.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The four datasets can be matched / linked by a variable called "NPI" which is a unique identifier for doctors.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jul 2020 23:02:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667596#M199950</guid>
      <dc:creator>InspectahDex</dc:creator>
      <dc:date>2020-07-07T23:02:50Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667605#M199954</link>
      <description>&lt;P&gt;I understand that CD4 is a masterfile. Merge data by NPI.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;While merging a null variable will not replace a non-null and last non-null value will hold.&lt;/P&gt;
&lt;P&gt;all datasets should be sorted by NPI then:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data FinalDataSet;
merge AC4 EP4 IP4 CD4;
  by NPI;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Jul 2020 00:54:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667605#M199954</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2020-07-08T00:54:14Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667616#M199962</link>
      <description>&lt;P&gt;&amp;gt; While merging a null variable will not replace a non-null and last non-null value will hold.&lt;/P&gt;
&lt;P&gt;Untrue. In this MERGE, the CD4 value will always overwrite other values if the record is present in CD4.&lt;/P&gt;
&lt;P&gt;UPDATE is the way to obtain the behaviour you describe.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here, MERGE is required as CD$ should always overwrite the other values, if available.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 01:39:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667616#M199962</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-07-08T01:39:17Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667621#M199964</link>
      <description>&lt;P&gt;You are off to a good start, concatenating the data sets and naming CD4 last.&amp;nbsp; Continue with:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=FinalDataSet;
   by doctor;
run;

data TrulyFinalDataSet;
   update FinalDataSet (obs=0) FinalDataSet;
   by doctor;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You didn't specify variable names, so I'm assuming DOCTOR is the name of the variable that identifies a doctor.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 02:18:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667621#M199964</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2020-07-08T02:18:40Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667622#M199965</link>
      <description>Sorry, should have seen ... use NPI instead of DOCTOR as the identifier.</description>
      <pubDate>Wed, 08 Jul 2020 02:24:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667622#M199965</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2020-07-08T02:24:42Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667830#M200072</link>
      <description>&lt;P&gt;Ok I think this is the right answer but there is one other thing I need to take into account. Out of the 130 common variables, I only need to update 29 variables from the CD4 dataset. How do I tell SAS to only update those 29 variables?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 19:05:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667830#M200072</guid>
      <dc:creator>InspectahDex</dc:creator>
      <dc:date>2020-07-08T19:05:26Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667834#M200074</link>
      <description>&lt;P&gt;Keep in mind that the right answer requires defining what to do what the other 101 variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you start with 3 observations for the same doctor, and there is conflicting data among various sources for those other 101 variables, what should be done?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Remember, there will be only one observation per doctor in the final data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So spell out the rules, and the programming should be short but perhaps mildly more complex than what I posted the first time.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 19:23:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667834#M200074</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2020-07-08T19:23:25Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667864#M200084</link>
      <description>&lt;P&gt;This is how I would approach the problem identify the 29 variables that need to be kept track of&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let list_of_vars_to_track=&amp;lt;some varlist separated by spaces&amp;gt;;
%let list_of_vars_len_to_track=&amp;lt;some varlist separated by spaces&amp;gt;;

%macro  test;
data FinalDataSet(rename=(
%do i = 1 %to 29;
              ret_%scan(list_of_vars_to_track,&amp;amp;i.,%str( ))  =%scan(list_of_vars_to_track,&amp;amp;i.,%str( ))
            %end;
));
length %do i = 1 %to 29;
              ret_%scan(list_of_vars_to_track,&amp;amp;i.,%str( ))  %scan(list_of_vars_len_to_track,&amp;amp;i.,%str( ))
            %end;;
set AC4 EP4 IP4 CD4;
by doctor;
retain %do i = 1 %to 29;
              ret_%scan(list_of_vars_to_track,&amp;amp;i.,%str( ))  
            %end; ' ';
if first.doctor then do;
%do i = 1 %to 29;
              
              ret_%scan(list_of_vars_to_track,&amp;amp;i.,%str( ))  = '';
            %end; 
end;
%do i = 1 %to 29;
              if not missing(%scan(list_of_vars_to_track,&amp;amp;i.,%str( ))) then 
              ret_%scan(list_of_vars_to_track,&amp;amp;i.,%str( ))  =%scan(list_of_vars_to_track,&amp;amp;i.,%str( ));
            %end; 
if last.doctor then output;
drop %do i = 1 %to 29;
             %scan(list_of_vars_to_track,&amp;amp;i.,%str( ))  
            %end; ;
run;
%mend test;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Jul 2020 20:50:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667864#M200084</guid>
      <dc:creator>smantha</dc:creator>
      <dc:date>2020-07-08T20:50:40Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667871#M200087</link>
      <description>&lt;BR /&gt;"If you start with 3 observations for the same doctor, and there is conflicting data among various sources for those other 101 variables, what should be done?"&lt;BR /&gt;&lt;BR /&gt;These observations with conflicting data would write to another observation.&lt;BR /&gt;&lt;BR /&gt;So for example, if Dr. Smith sees patients at two different locations then I want both locations in my dataset as two observations.&lt;BR /&gt;&lt;BR /&gt;Example:&lt;BR /&gt;Dataset AC4 says:&lt;BR /&gt;Name | OfficeAddress | EyeColor | NPI&lt;BR /&gt;John Smith | 123 Main St. | Blue |1234567890&lt;BR /&gt;&lt;BR /&gt;Dataset CD4 says:&lt;BR /&gt;Name | OfficeAddress | EyeColor | NPI&lt;BR /&gt;John Smith | 789 Any St. | Brown |1234567890&lt;BR /&gt;&lt;BR /&gt;I want the final dataset to say:&lt;BR /&gt;Name | OfficeAddress | EyeColor | NPI&lt;BR /&gt;John Smith | 789 Any St. | Brown |1234567890&lt;BR /&gt;John Smith | 123 Main St. | Brown |1234567890&lt;BR /&gt;</description>
      <pubDate>Wed, 08 Jul 2020 20:48:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667871#M200087</guid>
      <dc:creator>InspectahDex</dc:creator>
      <dc:date>2020-07-08T20:48:15Z</dc:date>
    </item>
    <item>
      <title>Re: Combining multiple datasets into one but choosing values based on a condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667880#M200092</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/121262"&gt;@InspectahDex&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Ok I think this is the right answer but there is one other thing I need to take into account. Out of the 130 common variables, I only need to update 29 variables from the CD4 dataset. How do I tell SAS to only update those 29 variables?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Am I correct - for those doctores in CD4 you want update only 29 variables but for those who are not in CD4 you need all variables?&lt;/P&gt;
&lt;P&gt;If true then you should do it within 3 steps:&lt;/P&gt;
&lt;P&gt;(1) Update - keeping NPI and the 29 variables of&amp;nbsp; AC4 EP4 and CD4 - update CD4 keeping all variables but if NPI is in CD4.&lt;/P&gt;
&lt;P&gt;(2) Merge or update CD4, keeping all variables, but only those NPI who are not in CD4.&lt;/P&gt;
&lt;P&gt;(3) Merge the result of step1 and step2.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Final1t  
     temp1(keep=NPI);
update AC4 (in=in1 keep=NPI ... &amp;lt;29 variables to update&amp;gt; )
       EP4 (in=in2 keep=NPI ... &amp;lt;29 variables to update&amp;gt; )
       IP4  (in=in3 keep=NPI ... &amp;lt;29 variables to update&amp;gt; )
       CD4 (in=in4);
  by NPI;
     if in4 then output Final1;
     else output temp1;
run;
data Final2;
Update AC4 (in=in1)
       EP4 (in=in2)
       IP4  (in=in3)
       temp1 (in=in4);
  by NPI;
      if  in4;
run;
data FinalDataSet;
merge Final1 Final2;
  by NPI;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Jul 2020 21:21:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-multiple-datasets-into-one-but-choosing-values-based/m-p/667880#M200092</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2020-07-08T21:21:53Z</dc:date>
    </item>
  </channel>
</rss>

