SRemm
Calcite | Level 5

Hello,

 

I have three large data sets that I merge into one. Data set 1 has 20121 rows, and data sets 2 and 3 each have 20143 rows. I sort them all on the Emp identifier and then run the merge.

 

After I merge the three I end up with 20143 rows, but 22 members have a duplicate Contribution amount that gets added twice by the merge.

 

To resolve it, I export my finished report into Excel, find the duplicate SSNs, and highlight them.

 

Then I filter the 20143 rows on the highlight color to bring them to the top of the list.

 

Then I manually update the Emp Contribution amount to zero for one of the duplicate values. I tried NODUPRECS/NODUPKEY, but they drop the entire duplicate row, and the other variable amounts in those rows are needed; dropping them throws off my other column totals.

 

I am looking for a MODIFY or UPDATE data step, or SQL with a CASE expression, that can clean up the rows with a duplicate member identifier and a duplicate contribution amount by updating one of them to zero.


Any help is greatly appreciated. 

 

SR- 

 


data abc_Merged_001AAa;
  merge abc_STEP2_SORTEDA abc_STEP3_SORTED_A abc_STEP1_SORTEDa;
  by ami;
run;

 


NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 20143 observations read from the data set WORK.abc_STEP2_SORTEDA.
NOTE: There were 20121 observations read from the data set WORK.abc_STEP3_SORTED_D.
NOTE: There were 20143 observations read from the data set WORK.abc_STEP1_SORTEDA.
NOTE: The data set WORK.abc_MERGED_001AAA has 20143 observations and 21 variables.
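
For context, the first NOTE above is the key symptom: in a match-merge, when one input has fewer rows in a BY group, its values are carried onto every remaining row of that group. A minimal illustration with hypothetical data:

data a;
  input id contribution;
  datalines;
1 100
;

data b;
  input id other;
  datalines;
1 5
1 6
;

data m;
  merge a b;
  by id;
run;
/* m has two rows for id=1, and contribution=100 appears on both,
   so it is counted twice in any downstream sum */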

 

1 ACCEPTED SOLUTION

Reeza
Super User

We could help with this much more easily if you could provide a small data sample that reflects the issue, along with what you want as the final output. It does not have to be your real data, but it should be representative of it; for example, if you're processing by specific groups, make sure to include more than one group of data.

 

After I merge the three I end up with 20143 rows, but 22 members have a duplicate Contribution amount that gets added twice by the merge.

 

Usually that tells me you need to add another condition to the merge, or de-duplicate one of the input data sets before the merge.
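
As a quick diagnostic (a sketch, assuming the data set names from your MERGE statement), you can list which key values repeat in a given input; run it against each of the three inputs to see where the duplicates live:

proc sql;
  /* list ami values that appear more than once in this input */
  select ami, count(*) as n_rows
  from abc_STEP1_SORTEDa
  group by ami
  having count(*) > 1;
quit;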

 

I am looking for a MODIFY or UPDATE data step, or SQL with a CASE expression, that can clean up the rows with a duplicate member identifier and a duplicate contribution amount by updating one of them to zero.

 

BY-group processing will identify the groups as long as you have a key set of variables that defines a 'group'.

Then, if a row is not the first of its group, you can set those values to 0 or to missing. You may prefer missing so the rows are not counted in the N for summary statistics: missing values are excluded automatically, whereas 0s could throw off averages or other calculations.

 

proc sort data=sashelp.cars out=cars;
  by make model mpg_highway;
run;

data identify_first;
  set cars;
  by make; /* note: BY only MAKE this time */

  /* use one of the following, not both (the second would overwrite the first): */
  if not first.make then mpg_highway = 0;   /* set repeats to 0 */
  /* if not first.make then mpg_highway = .;   set repeats to missing */

  keep make model mpg_highway; /* limit the variables for the example */
run;
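
A quick way to see the difference (using the identify_first data set from above): with repeats set to 0 they still count toward N and pull the mean down, whereas repeats set to missing would drop out of both N and the mean.

proc means data=identify_first n mean;
  class make;
  var mpg_highway;
run;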


5 REPLIES
SuryaKiran
Meteorite | Level 14

FIRST. processing would solve your problem.

Something like: if not first.id then col=0;

Thanks,
Suryakiran
SRemm
Calcite | Level 5

Thanks Suryakiran

 

So in col=0, col stands for the variable name?

 

SR- 

SuryaKiran
Meteorite | Level 14

Sort the data by the key variable you have (in your case I guess it's ami) and then change the variable values as required.

 

e.g.:

data have;
  input id val;
  datalines;
1 20
1 30
1 50
2 10
3 20
4 40
;

proc sort data=have;
  by id val;
run;

data want;
  set have;
  by id;
  if not first.id then val = 0; /* zero out all but the first value in each id group */
run;
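
For reference, the WANT data set from the step above comes out as:

id  val
 1   20
 1    0
 1    0
 2   10
 3   20
 4   40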
Thanks,
Suryakiran
SRemm
Calcite | Level 5

Hi Suryakiran,

 

That worked like a charm... 

 

Thanks again for your time. I hope this helps other people in the future.

 

Best,


SR

 

data want;
  set abc_Merged_001AAa;
  by AMI;
  if not first.AMI then EMP_CTB_AMT = 0;
run;
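
A quick sanity check on the result (a sketch, assuming the variable names from the step above): confirm that the row count, member count, and contribution total look right after zeroing the repeats.

proc sql;
  select count(*)            as n_rows,
         count(distinct AMI) as n_members,
         sum(EMP_CTB_AMT)    as total_contribution
  from want;
quit;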

