Desktop productivity for business analysts and programmers

finding missing values from a list

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

finding missing values from a list

Hello All,

I am trying to create a list where a certain string is deleted from a string list.

for example: considering the third row in the table below where year=2007M,2011M, the value 2007M,2011M should be deleted from year_list and with end result displayed in variable "want" being "2008,2009,2010". for this same example, the start should be 2007M, as that is the first observation in variable "year".

THe FINAL table that i want should be as follows:

year

year_list

want

2005M

2005M,2006,2007M,2008,2009,2010,2011M

2006,2007M,2008,2009,2010,2011M

2005M,2007M

2005M,2006,2007M,2008,2009,2010,2011M

2006,2008,2009,2010,2011M

2007M,2011M

2005M,2006,2007M,2008,2009,2010,2011M

2008,2009,2010

I tried arrays to solve this problem, but not able to get anywhere close to the first step.

I list below my code:

proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2005M,2007M','2005M,2006,2007M,2008,2009,2010,2011M');
insert into have values('2007M,2011M','2005M,2006,2007M,2008,2009,2010,2011M');
run;

data trial (keep=x);
set have(obs=1);
cnt=countw(year_list,",");
i=0;
do while (i<cnt);
i+1;
x=cats("dlv",scan(strip(year_list),i,","));
output;
end;
run;

proc sql ;
select distinct cats("mart",max(countw(year,","))), max(countw(year,",")) into :arraylist,:arraycnt from have;
select distinct cats("dlv",max(countw(year_list,","))), max(countw(year_list,",")) into :dlv_array, :dlvcnt from have;
select distinct x into :array_list separated by ' ' from trial;
run;

data want ;
set have;
length &array_list. $100.;
array dlv(&dlvcnt.) $ &array_list.;
do _j=1 to dim(dlv);
if index(upcase(vname(dlv(_j))),upcase(cats("DLV",dlv[_j]))) then do;
dlv[_j]=scan(year,_j,',');
end;
end;
run;

 

The array code in the last data step does not display the result i want for the first step of arriving to the final table. The table should look more like:

year year_list dlv2005M dlv2006 dlv2007M dlv2008 dlv2009 dlv2010 dlv2011M
2005M 2005M,2006,2007M,2008,2009,2010,2011M 2005M            
2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M 2005M   2007M        
2007M,2011M 2005M,2006,2007M,2008,2009,2010,2011M     2007M       2011M

After getting this table, I look at finding all missing values in the table and listing these out.

Still left with the dilemma of how to determine the first observation Smiley Sad

Any pointers, help, tip would be appreciated.

thanking you


Accepted Solutions
Solution
‎04-25-2016 08:51 AM
Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: finding missing values from a list

Hi,

 

I am sure you have posted this question in the last few weeks, no?  Anyways, as always my first question is going to be why?  SAS is built around the construct of datasets containing variables which hold One! item per variable.  Your trying to fit many data items into one variable, this is just going to cause you headaches through programming.  I would suggest the following, normalise your data, have one row per id value.  You can then easily apply any programming to that dataset and then if for some reason you need the final, transpose it up again when you have finished - remember the data you program with does not need to look like any out format.

However, that being said, you can process it as is:

data have;
  length year year_list $200;
  input year $ year_list $;
datalines;
2005M 2005M,2006,2007M,2008,2009,2010,2011M
2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M
;
run;
data want;
  length want $200;
  set have;
  do i=1 to countw(year_list,",");
    if index(year,scan(year_list,i,","))=0 then want=catx(",",want,scan(year_list,i,","));
  end;
run;

View solution in original post


All Replies
Contributor
Posts: 44

compare values and note missing values

Hello All,

I am trying to create a list where a certain string is deleted from a string list.

I adopted the array approach as in the code for the following reasons:

1) The year and year_list is in character format and comma separated. the year and year_list values are comma separated and sorted.

2) Variable "want" lists all variables from "year_list2 that are not present in variable "year", however ensuring that the starting point is determined by the first value from variable "year" in "year_list" - as seen in the last row.

THe table that i want should be as follows:

year

year_list

want

2005M

2005M,2006,2007M,2008,2009,2010,2011M

2006,2007M,2008,2009,2010,2011M

2005M,2007M

2005M,2006,2007M,2008,2009,2010,2011M

2006,2008,2009,2010,2011M

2007M,2011M

2005M,2006,2007M,2008,2009,2010,2011M

2008,2009,2010

 

proc sql;
create table have (year VARCHAR(100), year_list VARCHAR(100));
insert into have values('2005M','2005M,2006,2007M,2008,2009,2010,20​11M');
insert into have values('2005M,2007M','2005M,2006,2007M,2008,2009,2​010,2011M');
insert into have values('2007M,2011M','2005M,2006,2007M,2008,2009,2​010,2011M');
run;

data tmp1;
set have;
csv=countw(year,",");
codgen_csv=countw(year_list,",");
run;


proc sql ;
select distinct cats("mart",max(csv)), max(csv) into :arraylist,:arraycnt from tmp1;
select distinct cats("dlv",max(codgen_csv)), max(codgen_csv) into :dlv_array, :dlvcnt from tmp1;
run;

data want (drop=list);
set tmp1;
length mart1-&arraylist. $100.;
length dlv1-&dlv_array. $100.;
array mart(&arraycnt.) $;
array dlv(&dlvcnt.) $;
do _i=1 to dim(mart);
mart[_i]=scan(year,_i,',');
end;
do _j=1 to dim(dlv);
dlv[_j]=scan(year_list,_j,',');
end;
run;

I am really struggling with this. And i don't know how to approach this problem.

 

Any pointers/ help would be appreciated.

Solution
‎04-25-2016 08:51 AM
Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: finding missing values from a list

Hi,

 

I am sure you have posted this question in the last few weeks, no?  Anyways, as always my first question is going to be why?  SAS is built around the construct of datasets containing variables which hold One! item per variable.  Your trying to fit many data items into one variable, this is just going to cause you headaches through programming.  I would suggest the following, normalise your data, have one row per id value.  You can then easily apply any programming to that dataset and then if for some reason you need the final, transpose it up again when you have finished - remember the data you program with does not need to look like any out format.

However, that being said, you can process it as is:

data have;
  length year year_list $200;
  input year $ year_list $;
datalines;
2005M 2005M,2006,2007M,2008,2009,2010,2011M
2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M
;
run;
data want;
  length want $200;
  set have;
  do i=1 to countw(year_list,",");
    if index(year,scan(year_list,i,","))=0 then want=catx(",",want,scan(year_list,i,","));
  end;
run;
Contributor
Posts: 44

Re: finding missing values from a list

Sorry for the pestering and double emails. I am not pleased with the format structure that i presented in the post.THis is data that i receive from third party and a request to deliver in a better format is something they cannot do Smiley Sad

Sir RW9, you are a genius.

Smiley Very Happy 

 

For anyone having a similar query, slightly modified solution below: to produce desired outcome


data want (drop=i);
	length want $200;
	set have;
	do i=1 to countw(year_list,",");
		if index(year,scan(substr(year_list,max(index(year_list,scan(year,1,",")),1)),i,","))=0 then
			want=catx(",",want,scan(substr(year_list,max(index(year_list,scan(year,1,",")),1)),i,","));
	end;
run;

thank you

 

Grand Advisor
Posts: 9,576

Re: finding missing values from a list

I am not sure I understand what is your mean completely .

data have;
infile cards dlm=' ';
input year :$20. year_list : $100. ;
cards;
2005M 2005M,2006,2007M,2008,2009,2010,2011M
2005M,2007M 2005M,2006,2007M,2008,2009,2010,2011M
2007M,2011M 2005M,2006,2007M,2008,2009,2010,2011M
;
run;
data want;
 set have;
 length want $ 200;
 first=scan(year,1,',');
 s=findw(year_list,strip(first));
 temp=substr(year_list,s);
 do i=1 to countw(temp,',');
  t=scan(temp,i,',');
  if not findw(year,strip(t)) then want=catx(',',want,t);
 end;
 drop t i s temp first ;
run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 267 views
  • 0 likes
  • 3 in conversation