
Ender · 11-24-2015, 03:29 AM

Maybe those in the know here can help me out with a statistics problem as follows:

A comparative price study took price samples in two regions to find out whether prices differ between the two areas. The prices of the same items in the two regions were compared and their differences tabulated as a percentage of a benchmark price in one region. The task at hand is to determine whether there is any correlation between the size of the price differential (expressed as %) and: (i) the value of the product (expressed as $); (ii) the product category (C1, C2, C3, …); (iii) the product type (consumable or non-consumable); (iv) the location of the distributor of the product (D1, D2, D3, …); and (v) the chain store selling the product (S1, S2, S3, …).

My question is: what is the best approach to solving this, and how do I frame it as a problem statement or equation? Any pointers in the right direction are very much appreciated. Thanks and cheers.

walter b · 11-24-2015, 01:49 PM

IMHO, the most important question is: are the observed differences significant at all? Otherwise you're wasting your time. You'll find a simple three-step method for checking significance in the WP 31S User's Manual available here. Or look at pp. 49f of this: http://sourceforge.net/p/wp34s/code/HEAD...4s_3_1.pdf. HTH a bit.

d:-)

CR Haeger · 11-24-2015, 04:28 PM

(11-24-2015 03:29 AM)Ender Wrote: Maybe those in the know here can help me out with a statistics problem as follows:

A comparative prices study took price samples in two regions to find out whether there are any price differences in the two areas. The prices of the same items in the two regions were compared and their differences tabulated as a percentage of a benchmark price in one region.

The task at hand is to determine whether there is any correlation between the size of the price differential (expressed as %) and: (i) the value of the product (expressed as $); (ii) the product category (C1, C2, C3, …); (iii) the product type (consumable or non-consumable); (iv) the location of the distributor of the product (D1, D2, D3, …); and (v) the chain store selling the product (S1, S2, S3, …).

My question is: what is the best approach to solving this, and how do I frame it as a problem statement or equation? Any pointers in the right direction are very much appreciated. Thanks and cheers.

No offense, but first you may need to provide readers more background on the data and the context of the initial comparative study and the proposed "correlation task". As Walter suggests, the data and results from the initial comparative study may guide if and how to conduct the proposed correlation task.

For the initial comparative study: what is the data and how was it collected? How was it analyzed to determine whether prices differed statistically and/or practically? Do the results point to the need for the correlation task? If so, can you re-use the data or do you need to collect additional data?

Is this an academic or "real world" exercise?

Ender · 11-25-2015, 01:55 AM

To elaborate a bit more on the subject: yes, this is not a hypothetical case but a real-life experiment. The experimental design is as follows:
The price of a product is sampled in 2 regions based on its UPC and from the same chain store operating in both regions to eliminate store differences. Other product prices are sampled in a similar fashion. There are no multiple samples of the same product, meaning that for any product, there are only 2 prices.
The hypothesis is that the price differential is a function of the price of the product, store pricing policy, origin of distribution of the product, the product type (consumable or non-consumable), and the product classification (e.g., food and beverages, clothing and apparel, etc.).
I guess the simplest way would be to undertake a univariate analysis of each variable independently. I am wondering whether there is a better approach that is more succinct and illustrative at the same time, capturing the effects of all the variables in one go. Thanks again.

walter b · 11-25-2015, 07:00 AM

(11-25-2015 01:55 AM)Ender Wrote: The hypothesis is that the price differential is a function of the price of the product, store pricing policy, origin of distribution of the product, the product type (consumable or non-consumable), and the product classification (e.g., food and beverages, clothing and apparel, etc.).

May well be. If true, however, what shall we learn from that?

d:-?

CR Haeger · 11-25-2015, 02:42 PM

It seems to me that you may be data "poor" with only two data points per product.

At best, you may be able to run a paired t-test across the many different products, each sampled only twice: once in each of the two regions. This may allow you to conclude whether one region tends to have higher prices across all products, with region as the factor.
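
A minimal sketch of that paired comparison, using only the Python standard library. The prices below are made up for illustration and the function name `paired_t_statistic` is hypothetical; nothing here comes from the actual study.

```python
import math
import statistics


def paired_t_statistic(prices_a, prices_b):
    """Return (t, degrees_of_freedom) for a paired t-test.

    prices_a[i] and prices_b[i] are the prices of the same product
    in region A and region B respectively.
    """
    if len(prices_a) != len(prices_b):
        raise ValueError("each product needs exactly one price per region")
    diffs = [a - b for a, b in zip(prices_a, prices_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two products are required")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical prices for five products (illustration only).
region_a = [10.0, 12.0, 9.0, 11.0, 13.0]
region_b = [9.0, 11.0, 9.5, 10.0, 11.5]

t, df = paired_t_statistic(region_a, region_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting |t| would then be compared with the two-sided critical value from a t table for `df` degrees of freedom (2.776 at the 5% level for df = 4). If the differentials are already tabulated as percentages, the same calculation applies with `diffs` set to those percentages directly.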

If you then want to determine the correlation between product pricing (continuous Y) and several mainly discrete X factors (product, region, consumable/non-consumable, pricing policy, origin and classification) plus one continuous X (price of product), then you would need much more data than two prices per product.
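
One way to write such a model down, as a sketch only: take the response to be the percentage differential from the original post, so that each product contributes a single observation, and code each discrete factor with indicator variables. The symbols below are assumptions introduced for illustration, not notation from the study.

```latex
\Delta_i = \beta_0 + \beta_1 P_i
  + \sum_{c \ne c_0} \gamma_c \,\mathbf{1}[\mathrm{category}_i = c]
  + \delta \,\mathbf{1}[\mathrm{consumable}_i]
  + \sum_{d \ne d_0} \eta_d \,\mathbf{1}[\mathrm{distributor}_i = d]
  + \sum_{s \ne s_0} \theta_s \,\mathbf{1}[\mathrm{store}_i = s]
  + \varepsilon_i
```

Here $\Delta_i$ is the price differential of product $i$ in %, $P_i$ its benchmark price in $, and $c_0$, $d_0$, $s_0$ are the reference levels left out of each sum. With $C$ categories, $D$ distributor locations and $S$ chain stores, this has $3 + (C-1) + (D-1) + (S-1)$ coefficients, so the number of products sampled has to exceed that count comfortably before the fit says anything useful.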

Csaba Tizedes · 11-27-2015, 04:38 PM

(11-24-2015 03:29 AM)Ender Wrote: Maybe those in the know here can help me out with a statistics problem as follows:

A comparative price study took price samples in two regions to find out whether prices differ between the two areas. The prices of the same items in the two regions were compared and their differences tabulated as a percentage of a benchmark price in one region. The task at hand is to determine whether there is any correlation between the size of the price differential (expressed as %) and: (i) the value of the product (expressed as $); (ii) the product category (C1, C2, C3, …); (iii) the product type (consumable or non-consumable); (iv) the location of the distributor of the product (D1, D2, D3, …); and (v) the chain store selling the product (S1, S2, S3, …).

My question is: what is the best approach to solving this, and how do I frame it as a problem statement or equation? Any pointers in the right direction are very much appreciated. Thanks and cheers.

And how can you quantify these parameters? What is the measure of location? Distance from the manufacturer to the store by road? The other parameters are only categories without any quantified values, I think...

Csaba
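
On quantifying the categories: one common option is not to assign them magnitudes at all but to dummy-code them, turning each label into 0/1 indicator columns with one level held back as the reference. A minimal sketch, assuming the D1, D2, D3 labels from the original post; the function name `dummy_code` and the sample list are hypothetical.

```python
def dummy_code(values):
    """Turn categorical labels into 0/1 indicator columns.

    The alphabetically first level is dropped and serves as the
    reference, so k levels produce k - 1 columns.
    """
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    columns = [f"is_{level}" for level in others]
    rows = [[1 if value == level else 0 for level in others]
            for value in values]
    return reference, columns, rows


# Hypothetical distributor locations for four products.
distributors = ["D1", "D3", "D2", "D1"]

reference, columns, rows = dummy_code(distributors)
print("reference level:", reference)
print(columns)
for row in rows:
    print(row)
```

Each indicator column can then enter a regression like any numeric variable, and its coefficient reads as the shift relative to the reference level. A road distance, if available, would instead enter as a single continuous column.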