Logarithmic Regression: Different correlation from 3 different calculators
06-09-2015, 03:09 PM
Post: #1
Logarithmic Regression: Different correlation from 3 different calculators
I need a sanity check here. I'm running a logarithmic regression on some data, and I'm getting very slightly different correlation coefficients (r, not r^2) from three different calculators.

HP 48SX: 0.968372745387
TI-36X Pro: 0.96835943144
TI-89 Stats flash app: 0.968372745387
TI-89 custom function: 0.968372683432

Notice the 48SX and TI-89 stats app match up, so I'm inclined to believe those are the most accurate. The 36X Pro may have lower internal precision, or it's using a different faster/less accurate method to produce the result.

The custom function I made for the TI-89 (since the built-in stat commands don't calculate correlation for logarithmic, exponential, or power regression for some reason) is also a little bit off. I used the formula shown about halfway down this page:

http://brownmath.com/ti83/regres89.htm

sum((x[i]-meanx)*(y[i]-meany),i,1,n)/((n-1)*sx*sy)

Where sx and sy are sample standard deviations of the x and y lists respectively. Also, the x list has been transformed with LN prior to any calculations.
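
For a cross-check away from the calculators, here is a minimal Python sketch of that formula applied to the data listed further down in this post. The names (pearson_r, ln_years, r_log) are made up for illustration, and math.fsum is used for the sums to keep rounding error down; this is not a claim about what any of the calculators do internally.

```python
import math

years = list(range(1999, 2015))
rows = [8456, 14959, 13516, 11298, 11109, 15256, 29316, 46038,
        51726, 56686, 58372, 68426, 70760, 77238, 100836, 95461]

def pearson_r(xs, ys):
    """Two-pass form: subtract the means first, then sum the products."""
    n = len(xs)
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sample standard deviations (n - 1 in the denominator), as in the formula above.
    sx = math.sqrt(math.fsum(d * d for d in dx) / (n - 1))
    sy = math.sqrt(math.fsum(d * d for d in dy) / (n - 1))
    return math.fsum(a * b for a, b in zip(dx, dy)) / ((n - 1) * sx * sy)

# The x list is transformed with LN before anything else is computed.
ln_years = [math.log(x) for x in years]
r_log = pearson_r(ln_years, rows)
print(f"{r_log:.12f}")
```

In IEEE double precision this should land very close to the 48SX and TI-89 stats app figures, which suggests the formula itself is fine and the differences come from how each device carries out the arithmetic.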

I have a feeling taking the sum of products is making it lose precision somewhere. And if that's the case, is there a better approach? I tried the z-score method given on that same page, basically moving the standard deviations into the products within the sum, but I end up with a repeating decimal that looks a bit fishy.

This is the data I'm looking at. Note that a logarithmic fit is NOT correct for this particular data, I'm just testing the correlation calculation.

1999, 8456
2000, 14959
2001, 13516
2002, 11298
2003, 11109
2004, 15256
2005, 29316
2006, 46038
2007, 51726
2008, 56686
2009, 58372
2010, 68426
2011, 70760
2012, 77238
2013, 100836
2014, 95461
06-09-2015, 07:36 PM (This post was last modified: 06-09-2015 07:38 PM by CR Haeger.)
Post: #2
RE: Logarithmic Regression: Different correlation from 3 different calculators
I wonder if your r and r^2 values change slightly if you run the regression on normalized X, Y data?

Also, I think that calculating r for nonlinear regressions may require separate calculations of SSE and SST, then using r = sqrt(1 - SSE/SST).
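
A rough Python sketch of the SSE/SST route, assuming the model is y = a + b*ln(x) fitted by ordinary least squares. The helper fit_log and the variable names are hypothetical. For a fit like this one, which is linear in a and b and includes an intercept, sqrt(1 - SSE/SST) is the same quantity as |r| computed on (ln x, y), so it should not give a different answer here.

```python
import math

years = list(range(1999, 2015))
rows = [8456, 14959, 13516, 11298, 11109, 15256, 29316, 46038,
        51726, 56686, 58372, 68426, 70760, 77238, 100836, 95461]

def fit_log(xs, ys):
    """Least-squares fit of y = a + b*ln(x); returns (a, b)."""
    t = [math.log(x) for x in xs]
    n = len(t)
    mean_t = math.fsum(t) / n
    mean_y = math.fsum(ys) / n
    s_tt = math.fsum((ti - mean_t) ** 2 for ti in t)
    s_ty = math.fsum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, ys))
    b = s_ty / s_tt
    return mean_y - b * mean_t, b

a, b = fit_log(years, rows)
mean_y = math.fsum(rows) / len(rows)
# SSE: squared residuals around the fitted curve. SST: squared deviations from the mean.
sse = math.fsum((y - (a + b * math.log(x))) ** 2 for x, y in zip(years, rows))
sst = math.fsum((y - mean_y) ** 2 for y in rows)
r_from_sse = math.sqrt(1 - sse / sst)
print(f"{r_from_sse:.12f}")
```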
06-09-2015, 07:45 PM (This post was last modified: 06-09-2015 07:53 PM by Dave Britten.)
Post: #3
RE: Logarithmic Regression: Different correlation from 3 different calculators
(06-09-2015 07:36 PM)CR Haeger Wrote:  I wonder if your r and r^2 values change slightly if you run the regression on normalized X, Y data?

Also, I think that calculating r for nonlinear regressions may require separate calculations of SSE and SST, then using r = sqrt(1 - SSE/SST).

Yeah, I'm not enough of a statistician to know if there's a problem with my calculation method that's causing inaccuracy, or if I'm supposed to be using a different formula entirely. I know a lot of the HPs with accumulated stats just transform x and/or y on the fly as you enter them, so I'm not sure what else I'd have to do. I'll have to crack open my old stats 215 textbook later and see what it says about curve fitting.

EDIT: Running a linear regression on a list of transformed ln(x) values gives me the same value for r as doing logarithmic regression on the original x values, at least with the 36X Pro. So I assume the formula is the same, but something must be clobbering numeric accuracy along the way. Hmm...
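
One possible mechanism, sketched in Python: the textbook running-sums formula subtracts two nearly equal numbers when the ln(x) values are bunched together, as they are for 1999 through 2014. The function r_running_sums and its digits parameter are hypothetical, the decimal module is used only to round every operation to a chosen number of significant digits, and it is purely an assumption that any calculator in this thread works this way.

```python
from decimal import Decimal, localcontext

years = list(range(1999, 2015))
rows = [8456, 14959, 13516, 11298, 11109, 15256, 29316, 46038,
        51726, 56686, 58372, 68426, 70760, 77238, 100836, 95461]

def r_running_sums(xs, ys, digits):
    """r from accumulated sums, with every operation rounded to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        n = Decimal(len(xs))
        s_t = s_tt = s_y = s_yy = s_ty = Decimal(0)
        for x, y in zip(xs, ys):
            t = Decimal(x).ln()   # transform x on the fly, as an accumulating calculator might
            y = Decimal(y)
            s_t += t
            s_tt += t * t
            s_y += y
            s_yy += y * y
            s_ty += t * y
        num = n * s_ty - s_t * s_y
        # n*s_tt and s_t*s_t agree in their leading digits, so most of them cancel here.
        den = (n * s_tt - s_t * s_t) * (n * s_yy - s_y * s_y)
        return float(num / den.sqrt())

for d in (12, 14, 28):
    print(d, r_running_sums(years, rows, d))
```

Comparing the low-precision rows against the 28-digit row gives a feel for how many trailing digits of r can be lost to cancellation alone. Subtracting the means first (the two-pass form) avoids most of that loss.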
06-09-2015, 08:51 PM (This post was last modified: 06-09-2015 08:53 PM by CR Haeger.)
Post: #4
RE: Logarithmic Regression: Different correlation from 3 different calculators
OK. By normalized, I meant:

x_n = [2.0*(x - x_min)/(x_max - x_min)] - 1.0 // changes x data to go from -1.0 --> +1.0

do the same with y data then run the regression.
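
A small Python sketch of that rescaling, with hypothetical names (rescale, pearson_r). One assumption to flag: for a logarithmic fit the rescaling is applied here to the already-transformed ln(x) values, since ln is undefined once x has been mapped to zero or negative numbers. In exact arithmetic r does not change under this kind of linear rescaling, so any change seen on a calculator would point to rounding.

```python
import math

years = list(range(1999, 2015))
rows = [8456, 14959, 13516, 11298, 11109, 15256, 29316, 46038,
        51726, 56686, 58372, 68426, 70760, 77238, 100836, 95461]

def rescale(values):
    """Map values linearly onto [-1, 1] using their min and max."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = math.fsum(xs) / n, math.fsum(ys) / n
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

ln_years = [math.log(x) for x in years]
r_raw = pearson_r(ln_years, rows)
r_scaled = pearson_r(rescale(ln_years), rescale(rows))
print(r_raw, r_scaled)
```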
06-09-2015, 09:37 PM
Post: #5
RE: Logarithmic Regression: Different correlation from 3 different calculators
Just another data point for you - my Casio fx-9750GII gives 0.96837274.

It shows the regression results in a 10 character space, and I can't figure out how to get any more digits out of it.

Also, I'm sure you've checked this, but it took me three tries to get all the numbers entered correctly (and that's with the list display). When I had transposed the 2 and 7 in 58372, I got 0.96835999, pretty close to your TI-36X number.

And a question - I'm really weak in my stats knowledge, but is there any possible reason why more than 2 or maybe 3 significant figures for this value would matter? Of course, playing with numbers is its own fun.
06-09-2015, 10:00 PM
Post: #6
RE: Logarithmic Regression: Different correlation from 3 different calculators
(06-09-2015 09:37 PM)groundbeef Wrote:  Just another data point for you - my Casio fx-9750GII gives 0.96837274.

It shows the regression results in a 10 character space, and I can't figure out how to get any more digits out of it.

Also, I'm sure you've checked this, but it took me three tries to get all the numbers entered correctly (and that's with the list display). When I had transposed the 2 and 7 in 58372, I got 0.96835999, pretty close to your TI-36X number.

And a question - I'm really weak in my stats knowledge, but is there any possible reason why more than 2 or maybe 3 significant figures for this value would matter? Of course, playing with numbers is its own fun.

Yeah, I've checked and rechecked the data several times to make sure everything is entered properly. If you want to get a few more digits out of it, this works on my fx-9860g Slim:

1. Evaluate r (from the Catalog) in Run mode.
2. Subtract the most significant digits from Ans and multiply by powers of 10 as needed.

With that method, the Casio gives me 0.968372745386438, even more digits than the TI or HP.
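
The arithmetic behind the trick, as a small Python sketch. The names full, shown, and hidden are made up, and Decimal stands in for the calculator's internal value only for illustration: subtracting the digits the display already shows leaves the ones it was hiding, and the multiplication shifts them into view.

```python
from decimal import Decimal

full = Decimal("0.968372745386438")  # internal value, as reported above
shown = Decimal("0.96837274")        # what fits in the 10-character display

# Subtract the visible leading digits, then shift the remainder left by 8 places.
hidden = (full - shown) * 10**8
print(hidden)
```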

And in this particular case, you're right, I'm still getting more than enough accurate digits to draw a reasonable conclusion (i.e. that log regression isn't appropriate for this data). My fear is that the way I'm calculating correlation could have even worse accuracy with other datasets. If it were off in the hundredths place, that would be a bigger problem.
06-09-2015, 10:38 PM
Post: #7
RE: Logarithmic Regression: Different correlation from 3 different calculators
(06-09-2015 10:00 PM)Dave Britten Wrote:  that log regression isn't appropriate for this data

Huh? r of 0.968 means r^2 of about 0.93, which is very high. What level of correlation is appropriate? This is about as good as most data analysts wish for.


- Pauli
06-09-2015, 10:53 PM
Post: #8
RE: Logarithmic Regression: Different correlation from 3 different calculators
(06-09-2015 10:00 PM)Dave Britten Wrote:  1. Evaluate r (from the Catalog) in Run mode.
That's what I missed. I did LogReg, and it just gave me the same screen as stats mode.

(06-09-2015 10:00 PM)Dave Britten Wrote:  With that method, the Casio gives me 0.968372745386438, even more digits than the TI or HP.
Mine too.

I have a TI-84 Plus I can try for giggles.

I went through the computation in steps with Excel, so I could see intermediate results. Most of the numbers involved in sums span about 3 orders of magnitude, 4 in a couple of cases. But it looks like you lose more digits than that. The spreadsheet gives 0.968372745386560.


I get .968516807361777 for a linear regression (Casio). Comparing that might point toward or away from the log calculation.
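
A quick Python sketch of that comparison, running the same correlation routine on the raw years and on ln(years). The names pearson_r, r_lin, and r_log are hypothetical. If the linear figure reproduces cleanly while the log figure is the one that wanders between devices, that would point at the ln(x) step rather than the correlation formula.

```python
import math

years = list(range(1999, 2015))
rows = [8456, 14959, 13516, 11298, 11109, 15256, 29316, 46038,
        51726, 56686, 58372, 68426, 70760, 77238, 100836, 95461]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = math.fsum(xs) / n, math.fsum(ys) / n
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_lin = pearson_r(years, rows)                         # plain linear regression
r_log = pearson_r([math.log(x) for x in years], rows)  # logarithmic regression
print(f"linear {r_lin:.12f}  log {r_log:.12f}")
```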
06-09-2015, 11:16 PM
Post: #9
RE: Logarithmic Regression: Different correlation from 3 different calculators
I tried my TI-84, and when I retrieved the r value, I realized that you can also pull out all of the sums. It's a little tedious, but if you can retrieve them from the 36X, I'll post mine for comparison.

The r value was .96837274538717, but I think we've already established the outlier.
06-09-2015, 11:26 PM
Post: #10
RE: Logarithmic Regression: Different correlation from 3 different calculators
r and r^2 may not be valid for nonlinear models per Minitab
06-10-2015, 12:12 AM
Post: #11
RE: Logarithmic Regression: Different correlation from 3 different calculators
(06-09-2015 10:38 PM)Paul Dale Wrote:  
(06-09-2015 10:00 PM)Dave Britten Wrote:  that log regression isn't appropriate for this data

Huh? r of 0.968 means r^2 of about 0.93, which is very high. What level of correlation is appropriate? This is about as good as most data analysts wish for.


- Pauli

Oh, I was mostly referring to the fact that you get a TINY bit higher correlation with plain linear regression. This is actually some data from our ERP system showing growth of one of the larger tables in rows by year.

(06-09-2015 11:26 PM)CR Haeger Wrote:  r and r^2 may not be valid for nonlinear models per Minitab

Interesting, I'll read that over. Maybe coefficient of determination would be a better choice.
06-10-2015, 03:28 PM (This post was last modified: 06-10-2015 06:30 PM by CR Haeger.)
Post: #12
RE: Logarithmic Regression: Different correlation from 3 different calculators
Thanks - this post made me dig into Minitab and the TI-36X Pro a bit deeper.

The Minitab blog suggests that the standard error, S, be used in place of r or r^2 for nonlinear regressions. I believe the formula for S is

S = √(Σ(Y-Y')^2/(n-2)) where n is the number of response Ys, Y' is the fitted value, and the 2 comes from 2 coefficients being "consumed" in the a + b*ln(x) regression. So for your data n-2 = 14.

Inputting your data into Minitab resulted in S of 8144 (Y units).
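
For comparison, a minimal Python sketch of S for this data, assuming an ordinary least-squares fit of a + b*ln(x). All variable names are hypothetical, and this is not Minitab's implementation.

```python
import math

years = list(range(1999, 2015))
rows = [8456, 14959, 13516, 11298, 11109, 15256, 29316, 46038,
        51726, 56686, 58372, 68426, 70760, 77238, 100836, 95461]

t = [math.log(x) for x in years]
n = len(t)
mean_t = math.fsum(t) / n
mean_y = math.fsum(rows) / n

# Least-squares slope and intercept for y = a + b*ln(x).
b = (math.fsum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, rows))
     / math.fsum((ti - mean_t) ** 2 for ti in t))
a = mean_y - b * mean_t

residuals = [yi - (a + b * ti) for ti, yi in zip(t, rows)]
sse = math.fsum(e * e for e in residuals)
s = math.sqrt(sse / (n - 2))   # n - 2 = 14 degrees of freedom
print(f"SSE = {sse:.0f}, S = {s:.1f}")
```

S comes out in the units of Y (rows, here), which makes it easier to judge than a correlation figure close to 1.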

------------

The TI-36X Pro does not seem to offer S directly, but here is a workaround.
- Enter and regress the x, y data (in L1, L2) using LNReg
- Make sure to store the regression equation into f(x) using RegEQ-->f(x): YES
- 2nd quit to Home screen
- In the data table, add this formula to the L3 column: L3=abs(L2-f(L1)), which gives the absolute residuals. Even after this formula is entered, always revisit the data table to refresh L3 after running a regression.
- 2nd quit to Home screen
- Compute 1-variable stats on L3. Scroll down to 6:Σ(x^2) = 928584397, which I believe is SSE. MSE would be SSE/14 in this case.
- Press enter to put Σx^2 onto the home screen
- You can build √(Σx^2/14) from this which gives 8144.2

I usually use L3 if I want to compute residuals on any regression.