BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sofianeuk
Fluorite | Level 6

Hello,

 

i have a sales dataset as shown below, i am facing a problem to manipulate the data with the Dates presented as attributes in the top row, can someone suggest a method to reshape the data to be able to use the dataset in my descriptive analysis ,

 

for example i would like to add a where clause such as Where Year < 2018 ...etc, however with my actual file i can not do that as long as the dates are presented as attributes?

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Sofianeuk 

 

The following code transforms the data set to something that can be used in analysis. The import step reads data into sas, but SAS put an underscore in front of the month variable names, because a valid SAS variable name cannot start with a number.

 

The following step is there to make the code dynamic. The actual month variable names from input is placed as a list in a macro variable, so the following reshape step can handle any input as long as it has the same structure with variable names yyyymm. 

 

The last step transforms the "wide" input  data set to a "long" dataset with a record for each ID / Month with a sales quantity.

 

* Import spreadsheet;
proc import 
	datafile="c:\temp\sales.xlsx"
	dbms = xlsx 
	out = have;
run;

* Get string with month variables from input (all vars starting with _);
proc sql noprint;
	select name into :monthlist separated by ' '
	from sashelp.vcolumn
	where libname = 'WORK' and memname = 'HAVE' and name EQT '_';
quit;
%put &=monthlist;

* Reshape data - transform to Month, Year and Qty_Sold variables
	Drop month variables from input;
data want;
	set have;
	array months &monthlist;
	drop &monthlist Total_Qty_Sold;
	do i = 1 to dim(months);
		Month = substr(vname(months{i}),2);
		if months{i} ne . then do;
			Year = substr(Month,1,4);
			Qty_Sold = months{i};
			output;
		end;
	end;
run;

 

 

 

 

View solution in original post

13 REPLIES 13
Onizuka
Pyrite | Level 9

I do not understand your data, you have a column 201305 with numbers, why don't you have a variable date on you file ?

Sofianeuk
Fluorite | Level 6

Hi ,yes that is the problem i am facing the 05 2019, this is the date May 2019, so for example in May 2019 (05-2019) product A sold 23 of them,

 

Thanks

Onizuka
Pyrite | Level 9

So when you say that you want a

 

Where year < 2018, you want all columns starting with the minimum (xx 2013) to 12 2018 ?

 

EDIT : If you want some help, you should create a dataset example which can be copy/past on SAS !

 

Like below :

 

Data have ;
format var1 $15. var2 8. var3 $13. /* ..... */ ;
input var1 var2 var3 /* ... */ ;
cards;
blabla blabla blabla
blabla blabla blabla
blabla blabla blabla
;
Sofianeuk
Fluorite | Level 6

Hi,

 

i am about to conduct descriptive analysis on my data, but i beleive having dates (05 2013, 06 2013, 07 3013 etc..) as attributes wont help me much, i think ill need to reshape the date to have it long instead of wide, to be honest i am not sure just asking,

 

here is a sas example:

 

Data have ;
format var1 $15. var2 8. var3 $13 var4 $Date052018 var5 $Date062018 var6 $Date062018 var7 $Sales  /* ..... */ ;
input var1 var2 var3 var4 var5 var6 var7/* ... */ ;
cards;blabla blabla blabla test 12 4 5 6 
blabla blabla blabla test 12 2 5 6 
blabla blabla blabla test 14 1 4 6 ;

 

if you see example attached, i would like to conduct my descriptive analysis using the dates as well, but not sure how i can acheive that.

ballardw
Super User

@Sofianeuk wrote:

Hello,

 

i have a sales dataset as shown below, i am facing a problem to manipulate the data with the Dates presented as attributes in the top row, can someone suggest a method to reshape the data to be able to use the dataset in my descriptive analysis ,

 

for example i would like to add a where clause such as Where Year < 2018 ...etc, however with my actual file i can not do that as long as the dates are presented as attributes?

 

Thank you


Please provide examples of YOUR SAS data set in the form of a data step.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

There are several ways to reshape the data and turn column headings into actual date values but I am not going to make dummy data as it may not quite reflect your actual data structure.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Sofianeuk 

 

The following code transforms the data set to something that can be used in analysis. The import step reads data into sas, but SAS put an underscore in front of the month variable names, because a valid SAS variable name cannot start with a number.

 

The following step is there to make the code dynamic. The actual month variable names from input is placed as a list in a macro variable, so the following reshape step can handle any input as long as it has the same structure with variable names yyyymm. 

 

The last step transforms the "wide" input  data set to a "long" dataset with a record for each ID / Month with a sales quantity.

 

* Import spreadsheet;
proc import 
	datafile="c:\temp\sales.xlsx"
	dbms = xlsx 
	out = have;
run;

* Get string with month variables from input (all vars starting with _);
proc sql noprint;
	select name into :monthlist separated by ' '
	from sashelp.vcolumn
	where libname = 'WORK' and memname = 'HAVE' and name EQT '_';
quit;
%put &=monthlist;

* Reshape data - transform to Month, Year and Qty_Sold variables
	Drop month variables from input;
data want;
	set have;
	array months &monthlist;
	drop &monthlist Total_Qty_Sold;
	do i = 1 to dim(months);
		Month = substr(vname(months{i}),2);
		if months{i} ne . then do;
			Year = substr(Month,1,4);
			Qty_Sold = months{i};
			output;
		end;
	end;
run;

 

 

 

 

ErikLund_Jensen
Rhodochrosite | Level 12
Input is the xlsx-file you posted. I just saved it under the name c:\temp\sales.xlsx
Sofianeuk
Fluorite | Level 6

ErikLund_Jensen, i am really grateful your code has just done it in a fraction, however is it possible to keep the month only as MM and not with year included, the year looks perfect only YYYY, but month field has the MMYYYY, is it possible to have it only MM? i can for sure extract only month from there and reshape if you think this will work as well, 

 

also you will do me a huge favor if you will explain to me in details the bit after getting the string with month variable please as this did all the magic for me, and i have no idea how it calculated the sum of all year for each product realy an ace solution,

 

Many Thanks

 

Sof

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Sofianeuk 

 

I am happy to hear you found my code useful. Month can be extracted with a substring just like Year, see examples. If you would prefer numeric variables for Month and Year, it is also possible, see the second example below.

 

I am sorry to say that I don't understand what you mean with this: i have no idea how it calculated the sum of all year for each product realy an ace solution, so I can't comment on it without a further explanation.

 

In the first example below, I have made comments for most lines to explain what happens.

 


data want;

	* The set statement takes input one observation after the other, so all code is;
	* executed for every observation;
	set have;

	* Declare an array with the list of elements in the macro variable &monthlist;
	array months &monthlist;

	* Drop these variables from output together with Total_Qty_Sold;
	* It makes no sense any more, because it horisontal sum;
	drop &monthlist Total_Qty_Sold i;

	* Loop over the array - dim function returns the number if elements;
	* With i as index variable, all month cells in one row in the spreadsheet is processed;
	* one after another;
	do i = 1 to dim(months);

		* We need month as variable VALUE instead of a VARIABLE NAME;
		* Months{i} is an array reference. It is the same as writing e.g. _201401 in the code,;
		*   and returns the VALUE of the variable eg. 3;
		* But vname(months{i}) returns the VARIABLE NAME of the i'th element i in the array,;
		*   eg. _201401, and returns that as a VALUE;
		* We want to get rid of the underscore, so we take a substring starting with 2;
		* So in the loop, the variable Month takes values 201305, 201306 etc,;
		Month = substr(vname(months{i}),2);

		* In the loop, months{i} has values 4, ., 3, ., 8 etc;
		* We only want to process the month, if the quantity sold is not missing.;
		if months{i} ne . then do;

			* The variable Year is the four first characters in the Month variable;
			Year = substr(Month,1,4);

			* The variable Month is recoded to contain only characters 5 and 6;
			Month = substr(Month,5,2);

			* Quantity_Sold is the sales quantity contained in the month variable;
			* corresponding to the value in one cell in the spreadsheet;
			Qty_Sold = months{i};

			* Output to result data set. Without an output statement, there is an ;
			* implicit output at the end of the data step, so we would get one; 
			* output observation per input input observation with the values set;
			* in the last loop iteration;
			* We want output from every loop iteration, if there is a sales quantity;
			output;
		end;
	end;
run;

* example with numeric Year and Month;
data want;
	set have;
	array months &monthlist;
	drop &monthlist Total_Qty_Sold i cMonth;
	do i = 1 to dim(months);
		cMonth = substr(vname(months{i}),2);
		if months{i} ne . then do;
			Year = input(substr(cMonth,1,4),4.);
			Month = input(substr(cMonth,5,2),2.);
			Qty_Sold = months{i};
			output;
		end;
	end;
run;

 

 

Sofianeuk
Fluorite | Level 6

That was really helpful Erik, many thanks for all your support, I was lost with vname (obviously its a SAS function to call an array var ref), also I am going through the code now line by line alongside your comments to try to understand the concept.

 

Again thank you ever so much,

 

Best Regards

 

Sof

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Sofianeuk 

 

IN SAS, an array is a list of variables, eg. ARRAY VARLIST VAR1-VAR3. Instead of referring to the variable by name, eg. VAR2, it can be referred by position in the array, eg. VARLIST{2}. The position can be expressed as literal as here, but can also be another variable, eg. VARLIST{IndexVar}. This makes it possible to loop over the array variable by variable by incrementing the index variable.

 

If a variable name is used in an assignment, the normal notation is eg. NEWVAR = VAR2, but if the variable is declared as a member of an array, the array reference can be used instead: NEWVAR = VARLIST{2}.

 

If a variable name is used in an assignment or function, the result is the value of the variable, so if VARLIST{2} = 2222, then NEWVAR = VARLIST{2} will set NEWVAR to 2222.

 

But sometimes it can be useful to access information about a variable, like name, type, length or format, instead of the variable value. SAS has a score of functions for that, so eg. vname(VAR2) returns the name VAR2 instead of the value 2222, and vtype(VAR2) returns N (for numeric). And in these functions, an array reference can be used instead of the literal variable name.

 

I made an example. It is basically the same code as my previous post, but here all complications are left out to demonstrate the working principle: A list of 3 variables in one observarion is transposed to 3 observations, each with one variablename / value pair:

 


data have;
	ID = 1; VAR1 = 1111; VAR2 = 2222; VAR3 = 3333;
run;

data want; set have;
	keep ID varname varvalue;
	array varlist VAR1-VAR3;
	do i = 1 to 3;
		varname = vname(varlist{i});
		varvalue = varlist{i};
		output;
	end;
run;
Sofianeuk
Fluorite | Level 6
Thats really very generous of you, clearer now, many thanks for your contribution,

Thank you ever so much

Sof
Sofianeuk
Fluorite | Level 6

Dear Erik,

 

quick question please, given the data i am using now, do you have any guide on how to conduct or what is the best apprach to use for a good descriptive analysis? i am aware of the Proc Freq, means etc.. is there anything to follow to make sure that i have tackeled most or all the nedded analysis on my data.

 

Many Thanks

 

Sof

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1225 views
  • 3 likes
  • 4 in conversation