BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
IgorR
Quartz | Level 8

Hi!

I have data like this:

Data Have;
Input MS_POL MS_SEIF KOD_TAARIF SUG_TAARIF;
datalines;
15476381 1 758 7
15476381 1 772 9
15476381 17 0 0
15476382 1 3001 7
15476382 1 3002 9
15476382 17 0 0
;
Run;

I need data like this:

Data Want;
Input MS_POL MS_SEIF KOD_TAARIF SUG_TAARIF;
datalines;
15476381 1 758 7
15476381 1 772 9
15476381 17 772 9
15476382 1 3001 7
15476382 1 3002 9
15476382 17 3002 9
;
Run;

The logic:

If MS_SEIF = 17 then bring KOD_TAARIF from the Same MS_POL where SUG_TAARIF = 9.

If There is no SUG_TAARIF = 9 Then KOD_TAARIF  = 0

 

Any help how to do that will be highly appreciated.

Thanks in advance.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below approach doesn't require sorting and given your volumes you should also have enough memory for the hash table.

Data Have;
  Input MS_POL MS_SEIF KOD_TAARIF SUG_TAARIF;
  datalines;
15476381 1 758 7
15476381 1 772 9
15476381 17 0 0
15476382 1 3001 7
15476382 1 3002 9
15476382 17 0 0
;

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(SUG_TAARIF = 9))');
      h1.defineKey('MS_POL');
      h1.defineData('KOD_TAARIF ');
      h1.defineDone();
    end;
  set have;
  If MS_SEIF = 17 then
    do;
      if h1.find() ne 0 then KOD_TAARIF = 0;
    end;
run;

proc print data=want;
run;

Patrick_0-1702517321830.png

 

View solution in original post

11 REPLIES 11
PaigeMiller
Diamond | Level 26

Will the data always be sorted as you have shown? Or can sometimes a different sort be found where MS_SEIF=17 is not the last row for a given MS_POL? Is SUG_TAARIF always in the order shown, or can that change too?

--
Paige Miller
IgorR
Quartz | Level 8

@PaigeMiller, I can't guarantee that the data will always be sorted.

There might be additional data between these rows with different values.

PaigeMiller
Diamond | Level 26

If our solution were to sort the data, would that work for you? How large is the actual data set (number of rows and number of columns)?

--
Paige Miller
IgorR
Quartz | Level 8

@PaigeMiller, Depends of the month - between 10 to 20 million rows.

Constant 18 columns.

PaigeMiller
Diamond | Level 26

If our solution were to sort the data, would that work for you?

--
Paige Miller
PaigeMiller
Diamond | Level 26

Well, I will have to think about this, whatever solution I come up with for your sample data also needs to be scaled and run efficiently for 10-20 million records, and at the moment I'm not sure how best to do this.

--
Paige Miller
A_Kh
Lapis Lazuli | Level 10

My approach. 

1. Identify unique MS_POL with at least one observation with SUG_TAARIF=9

2. Get KOD_TAARIF values for those observations into macro variable

3. Assign macro variable values to KOD_TAARIF where MS_SEIF=17, appropriately per MS_POL .

 

Data Have;
Input MS_POL MS_SEIF KOD_TAARIF SUG_TAARIF;
datalines;
15476381 1 758 7
15476381 1 772 9
15476381 17 0 0
15476382 1 3001 7
15476382 1 3002 9
15476382 17 0 0
;
proc print; Run;


*Identifying unique MS_POls having at least one record where SUG_TAARIF = 9;
proc sql noprint;;
	select distinct MS_POL into:id separated by ' ' from have where SUG_TAARIF = 9;
quit;
%put &=id;  

*Splitting data into 2 parts, one with SUG_TAARIF = 9 obs, another without SUG_TAARIF = 9 observation; 
data have1 have2;
	set have;
	if MS_POL in (&id) then output have1;
	else output have2;
run; 

*Creating series of macro variables with KOD_TAARIF values where SUG_TAARIF = 9; 
data _null_;
	set have1;
	if SUG_TAARIF= 9 then call symputx(cats('kt', MS_POL), KOD_TAARIF);
run; 
%put &=kt15476381, &=kt15476382;


*If MS_SEIF = 17 then bring KOD_TAARIF from the Same MS_POL where SUG_TAARIF = 9.;
data have1a;
	set have1;
	if ms_seif eq 17 then do;
		sug_taarif=9;
		kod_taarif=input(symget(cats('kt', ms_pol)), best.); 
	end; 
run;

*Setting back two parts of HAVE dataset;
data want;
	set have1a have2;
proc print;run;  
Patrick
Opal | Level 21

Below approach doesn't require sorting and given your volumes you should also have enough memory for the hash table.

Data Have;
  Input MS_POL MS_SEIF KOD_TAARIF SUG_TAARIF;
  datalines;
15476381 1 758 7
15476381 1 772 9
15476381 17 0 0
15476382 1 3001 7
15476382 1 3002 9
15476382 17 0 0
;

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(SUG_TAARIF = 9))');
      h1.defineKey('MS_POL');
      h1.defineData('KOD_TAARIF ');
      h1.defineDone();
    end;
  set have;
  If MS_SEIF = 17 then
    do;
      if h1.find() ne 0 then KOD_TAARIF = 0;
    end;
run;

proc print data=want;
run;

Patrick_0-1702517321830.png

 

IgorR
Quartz | Level 8

Hi,

Thank you for your help!

I have a question - must "h1.defineData('KOD_TAARIF'); " have the same name in the both tables?

What should I do If it has different names in two tables?

 

Patrick
Opal | Level 21

They can have different names but if there is no same named variable the Base dataset then you need to also define it there (for the PDV).

 

The easiest is to rename the variable before loading into the hash.

  if _n_=1 then
    do;
      dcl hash h1(dataset:'looup_table(where=(SUG_TAARIF = 9) renanme=(other_name=KOD_TAARIF))');
      h1.defineKey('MS_POL');
      h1.defineData('KOD_TAARIF ');
      h1.defineDone();
    end;

If different name then define the variable also for the PDV

  if _n_=1 then
    do;
      length other_name 8
      dcl hash h1(dataset:'looup_table(where=(SUG_TAARIF = 9))');
      h1.defineKey('MS_POL');
      h1.defineData('other_name ');
      h1.defineDone();
    end;

 

Or this way to create the variable with the exact attributes like in source

  if _n_=1 then
    do;
      if 0 then set looup_table(keep=other_name);
      dcl hash h1(dataset:'looup_table(where=(SUG_TAARIF = 9))');
      h1.defineKey('MS_POL');
      h1.defineData('other_name ');
      h1.defineDone();
    end;
    call missing(other_name);

The call missing() is required because with the variable defined via a SET statement it gets retained if there is no new row read from the source table (which with if 0 then.. never happens). 

Without the call missing() and if the hash lookup doesn't find an entry you would get the retained value from the last successful lookup.

 

If the variable names in the hash lookup are different then you will also need to use the KEY keyword for lookup if it's a key variable and also DATA if it's a data variable. It's all explained in the docu and multiple good white papers (and there is also a book). It's really worth learning how hash tables work in SAS and what the use cases are. 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1470 views
  • 4 likes
  • 4 in conversation