Solved: Re: ADD RECORDS TO MASTER DATA SET BY...

alan_maxs · Posted 06-28-2017 12:07 AM

hi, i need heellpp!

I need to achieve the following:

data master;
 input id $ x y z;
 datalines;
 1001 340 2 6
 1001 2670 4 8
 1002 990 10 6
 1002 1050 2 1
 1002 887 4 5
 1003 5896 3 2
 ;
run;

data update;
 input id $ x y z;
 datalines;
 1001 340 4 9
 1001 390 2 1
 1003 9900 10 6
 1003 1050 2 1
 1004 887 4 5
 1009 5896 3 2
 ;
run;

data want;**-----> add (or update) records by id (if id are present in master. i.e id=1004 or 1009 dont interest me); 
 input id $ x y z;
 datalines;
 1001 340 2 6
 1001 2670 4 8
 1001 340 4 9
 1001 390 2 1 
 1002 990 10 6
 1002 1050 2 1
 1002 887 4 5
 1003 5896 3 2
 1003 9900 10 6
 1003 1050 2 1
 ;
run;

Thankss!

Alan

Ksharp · Posted 06-28-2017 08:49 AM

data master;
 input id $ x y z;
 datalines;
 1001 340 2 6
 1001 2670 4 8
 1002 990 10 6
 1002 1050 2 1
 1002 887 4 5
 1003 5896 3 2
 ;
run;

data new_data;
 input id $ x y z;
 datalines;
 1001 340 4 9
 1001 390 2 1
 1003 9900 10 6
 1003 1050 2 1
 1004 887 4 5
 1009 5896 3 2
 ;
run;
proc sql;
create table want as
 select *
  from master
union all corr
 select *
  from new_data
   where id in (select id from master)
order by id;
quit;

View solution in original post

kiranv_ · Posted 06-28-2017 12:32 AM

something like this . I just changed your table name from update to updated

proc sql;
create table temp_table as 
select * from updated a
where exists(select * from master b
where a.id =b.id);
insert into master
select * from temp_table;
quit;

or

proc sql;
create table temp_table as
select * from updated
where id in(select id from master);
insert into master
select * from temp_table;
quit;

alan_maxs · Posted 07-03-2017 05:25 PM

thanks! its works! 🙂

RW9 · Posted 06-28-2017 04:41 AM

Couple of issues with what you post, dont call a dataste "update", and also specify what the by groups are here as x y z is non descriptive. I assumed that as ID has multiple rows that X is the other grouping variable. SAS provides the update statement in datastep specifically for these master/transaction processes, docs here:

http://support.sas.com/documentation/cdl/en/basess/68381/HTML/default/viewer.htm#n0s3jm3mkzvz5qn1mri...

And for your example:

data master;
 input id $ x y z;
 datalines;
 1001 340 2 6
 1001 2670 4 8
 1002 990 10 6
 1002 1050 2 1
 1002 887 4 5
 1003 5896 3 2
 ;
run;

data new_data;
 input id $ x y z;
 datalines;
 1001 340 4 9
 1001 390 2 1
 1003 9900 10 6
 1003 1050 2 1
 1004 887 4 5
 1009 5896 3 2
 ;
run;

proc sort data=master;
  by id x;
run;

proc sort data=new_data;
  by id x;
run;
  
data want;
  update master new_data;
  by id x;
run;

Astounding · Posted 06-28-2017 08:21 AM

Here's a DATA step approach that assumes both data sets are sorted:

data want;

set master (in=in1) update;

by id;

if first.id then keep_flag = in1;

retain keep_flag;

if keep_flag=1;

drop keep_flag;

run;

In SAS terminology, we would usually call this interleaving (rather than merging or updating). You have the added twist of wanting to eliminate ID values that are not in the master data set.

alan_maxs · Posted 07-03-2017 05:24 PM

thanks!!
I sorry for the terminology, i'm new!

Ksharp · Posted 06-28-2017 08:49 AM

data master;
 input id $ x y z;
 datalines;
 1001 340 2 6
 1001 2670 4 8
 1002 990 10 6
 1002 1050 2 1
 1002 887 4 5
 1003 5896 3 2
 ;
run;

data new_data;
 input id $ x y z;
 datalines;
 1001 340 4 9
 1001 390 2 1
 1003 9900 10 6
 1003 1050 2 1
 1004 887 4 5
 1009 5896 3 2
 ;
run;
proc sql;
create table want as
 select *
  from master
union all corr
 select *
  from new_data
   where id in (select id from master)
order by id;
quit;

alan_maxs · Posted 07-03-2017 05:32 PM

Hi Ksharp, thank you very much for your reply, its works!

Can I ask you another question?

I have the following situation:

data have;
 input id $ weight1 weight2 age year $ group $;
 datalines;
 1001 340 330 35.7 1990 2
 1001 267 289 35.7 1988 2
 1001 300 297 35.7 1989 2  
 1002 450 438 33.2 1995 1
 1002 299 279 33.2 1993 1
 1002 390 385 33.2 1994 1
 1003 519 512 39.5 2000 4
 1003 450 443 39.5 1998 4
 1003 329 316 39.5 1997 4
 1003 310 301 39.5 1995 4
 ;
run;

I need to generate a new dataset ("want") which adds an observation by id putting zeroes (0) to two variables ("weight1" & "weight2") and retaining the original values of the others ("age", "year" & "group").

like as:

data want;
 input id $ weight1 weight2 age year $ group $;
 datalines;
 1001 0 0 35.7 1990 2
 1001 340 330 35.7 1990 2
 1001 267 289 35.7 1988 2
 1001 300 297 35.7 1989 2
 1002 0 0 33.2 1995 1
 1002 450 438 33.2 1995 1
 1002 299 279 33.2 1993 1
 1002 390 385 33.2 1994 1
 1003 0 0 39.5 2000 4
 1003 519 512 39.5 2000 4
 1003 450 443 39.5 1998 4
 1003 329 316 39.5 1997 4
 1003 310 301 39.5 1995 4
 ;
run;

it is understood?

thanks!

Ksharp · Posted 07-04-2017 09:07 AM

OK. Here is .

data have;
 input id $ weight1 weight2 age year $ group $;
 datalines;
 1001 340 330 35.7 1990 2
 1001 267 289 35.7 1988 2
 1001 300 297 35.7 1989 2  
 1002 450 438 33.2 1995 1
 1002 299 279 33.2 1993 1
 1002 390 385 33.2 1994 1
 1003 519 512 39.5 2000 4
 1003 450 443 39.5 1998 4
 1003 329 316 39.5 1997 4
 1003 310 301 39.5 1995 4
 ;
run;
data want;
 set have;
 by id;
 if first.id then do;
  w1=weight1;w2=weight2;
  weight1=0;weight2=0;output;
  weight1=w1;weight2=w2;output;
 end;
 else output;
 drop w1 w2;
run;

alan_maxs · Posted 07-04-2017 02:02 PM

Thanks Ksharp!!

Another query please, is there any function in SAS that allows to invoke "all variables"?

For example: If "all variables" ne weight1 or weight2 then "all variables" = .;
Do you understand what I want to do?

In my situation above (if first.id then do), if I wanted to put all other variables missing except for weight1 or weight2 (latter = 0) ...
because in my original file I have many variables.

thanks, again!

Ksharp · Posted 07-05-2017 08:41 AM

No . There is no such function . But you can use ARRAY to get what you want .

alan_maxs · Posted 07-06-2017 01:31 PM

Ok.

I ask one last help:

I'm building a macro that allows you to keep observations for 2 conditions:
1- if the frequencies of &contgroup are greater than or equal to 3;
2- if the frequencies of &sire are greater than or equal to 3;

The complex thing is that eliminating observations by 1- changes the frequencies to 2-, and vice versa. It is understood?

Here is an example of my data file:

data have;
 input id $ sire $ contgroup $;
 datalines;
9914089	9613073	81999	
9914096	9512799	81999	
9914108	100671	81999	
9914115	100690	81999	
9914119	100690	81999	
9914124	100689	81999	
9914130	9613166	81999	
9914136	9613048	81999	
9914137	100671	91999	
9914148	9512799	91999	
9914158	9613166	91999	
9914163	100671	91999	
9914164	9512782	91999
;
run;

I have tried several ways but I always have the same problem, deleting/retaining observations for one of the conditions, change the frequencies of the other variable.

How could the macro do?

Thanks!

Alan

Ksharp · Posted 07-07-2017 10:04 AM

Sorry. I couldn't understand what you mean.

Maybe you should start a new session. Let others see your question.

alan_maxs · Posted 07-07-2017 02:25 PM

Ok.
Here: https://communities.sas.com/t5/Base-SAS-Programming/delete-retein-observations-by-frequency-n-of-var...

Classroom Training Available!