BookmarkSubscribeRSS Feed
blowsnow__
Calcite | Level 5

Greetings all,

 

I have the following data set stored as a .txt file. How can I use only one DATA step to produce an output that only contains one observation for each variable? My ultimate goal is to have the output as the attached picture. I know I am currently missing a few other variables, but how can I create those variables from the existing ones? This is what I have so far.

 

data testing;
infile datalines firstobs=2;
input name $ 1-15 latest_sales_date:date11. person_total_sales 28-32;
format date date11.;
datalines;
1234567890123456789012345678901234567890
David Wong     1/Aug/2018  13200
Brian Leung    15/Sep/2018 23450
               10/Sep/2018 33000
Mary Chan      3/Jul/2018  45600
               20/Aug/2018 37800
               1/Sep/2018  21500
               30/Aug/2018 42000
John Tam       12/Sep/2018 35000

run;


Capture.PNG

 

 

 

 

9 REPLIES 9
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

given your sample data @blowsnow__ this should work.

data testing;
infile datalines firstobs=2;
input name $ 1-15 latest_sales_date date11. person_total_sales 28-32;
format latest_sales_date date11.;
/*1234567890123456789012345678901234567890*/
datalines;

David Wong     1/Aug/2018  13200 
Brian Leung    15/Sep/2018 23450
               10/Sep/2018 33000
Mary Chan      3/Jul/2018  45600
               20/Aug/2018 37800
               1/Sep/2018  21500
               30/Aug/2018 42000
John Tam       12/Sep/2018 35000
run;

data temp;
	 set testing;
	 retain holder;
	 if not missing(name) then holder=name;
run;
proc sort data=temp out=temp;
	by holder latest_sales_date;
run;

 


data want(drop=holder hdate person_sales_total);
	 retain days person_sales_total 0 hdate ;
	 set temp;
	 by holder latest_sales_date;
	 running_total + person_total_sales;
	 Person_sales_total + person_total_sales;
	 if first.holder then hdate = latest_sales_date;
	 if last.holder then days = (latest_sales_date - hdate);
	 name=holder;
	 person_total_sales = person_sales_total;
	 if last.holder then 
		do;
			output;
			person_total_sales = 0;
		end;
run;
novinosrin
Tourmaline | Level 20

Hi @blowsnow__  is your expectation one datastep or one pass of the dataset?

 

Plz clarify as within  a datastep , we can have multiple passes

blowsnow__
Calcite | Level 5
The condition is that I must use only one datastep
VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

that changes what I have already posted.  It doesn't meet the requirements since the requirements were not fully stated.

Tom
Super User Tom
Super User

You just need to read the name value into a dummy column and then save it into the real Name variable when it is not empty. Use RETAIN the carry the old values of NAME forward.

 

Here is sketch of what you need to do.

input dummy $ 1-15 ...... ;
name=coalescec(dummy,name);
retain name;
blowsnow__
Calcite | Level 5

But even after running this program, I still have multiple observations for each person. How can my output shows only one name; one Latest_Sales_Date number that is the largest among others; and one Total_Sales sum of all the associated sales amount; for each respective salesperson?

 

data Sales_Total ;
infile datalines firstobs=2 ;
length Dummy Name $15 Latest_Sales_Date Total_Sales 8;
input dummy $15. Latest_Sales_Date date11. Total_Sales ;
retain name;
name=coalescec(dummy,name);
drop dummy;
datalines;
1234567890123456789012345678901234567890
David Wong     1/Aug/2018  13200
Brian Leung    15/Sep/2018 23450
               10/Sep/2018 33000
Mary Chan      3/Jul/2018  45600
               20/Aug/2018 37800
               1/Sep/2018  21500
               30/Aug/2018 42000
John Tam       12/Sep/2018 35000
run;

 

 

Capture2.PNG

 

 

blowsnow__
Calcite | Level 5

Greetings all,

 

I am trying to read a data set into the following table.

 

 

Capture.PNG

 

 

 

 

 

 

 

 

 

When I try to run my program, however, the Name column contains more than one observation for each person. What can I do to make the output contains only one name for each person? And the same goes for the date column, how can I only select the most recent date to show in my output table? Here is what I have so far.

 

data Sales_Total ;
infile datalines firstobs=2 ;
length Dummy Name $15 Latest_Sales_Date Total_Sales 8;
input dummy $15. Latest_Sales_Date date11. Total_Sales ;
retain name;
name=coalescec(dummy,name);
drop dummy;
datalines;
1234567890123456789012345678901234567890
David Wong     1/Aug/2018  13200
Brian Leung    15/Sep/2018 23450
               10/Sep/2018 33000
Mary Chan      3/Jul/2018  45600
               20/Aug/2018 37800
               1/Sep/2018  21500
               30/Aug/2018 42000
John Tam       12/Sep/2018 35000
run;

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

Why did you start a new post for the same question.  refer to your last post and follow through with your issue that is the same topic same question in the same post please.

@blowsnow__ prior post same question 

https://communities.sas.com/t5/SAS-Programming/Creating-variables-from-existing-variables/m-p/510186...

 

Note you also don't have all of the original requirements listed for the same question listed here.

 https://communities.sas.com/t5/SAS-Programming/How-to-output-the-last-part-data-programming/m-p/5101...

 

Robot Mad

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2491 views
  • 4 likes
  • 5 in conversation