
Frequently Asked "Pointed" Questions (FAQ)

"Isn't shrinkage in regression a dead topic? Have new papers been published recently?"

My 2021 arXiv preprint and 2022 paper in Open Statistics introduced the p-parameter "Efficient GRR Shrinkage Path" which is a two-piece linear function. This Path starts at the Ordinary Least Squares [BLU] Estimate, then heads directly to the Estimate "Most Likely to be Optimally Biased" under Normal distribution theory, then heads directly for the Shrinkage Terminus (usually 0) at m = p.
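
The two-piece linear shape of such a path can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RXshrink implementation: the function name `efficient_path_factors` and its arguments are hypothetical, the path is expressed through p shrinkage factors (all equal to 1 at OLS and all equal to 0 at a zero terminus), the m-extent is measured as p minus the sum of those factors, and the factors at the interior knot are taken as given inputs rather than estimated by maximum likelihood.

```python
def efficient_path_factors(delta_knot, m):
    """Shrinkage factors at m-extent `m` on a two-piece linear path.

    delta_knot : list of p factors in [0, 1] at the interior knot
                 (assumed supplied; estimating them is not shown here).
    m          : extent of shrinkage, 0 <= m <= p, where m is assumed
                 to equal p minus the sum of the shrinkage factors.
    """
    p = len(delta_knot)
    if p == 0:
        raise ValueError("need at least one shrinkage factor")
    if not 0 <= m <= p:
        raise ValueError("m must lie in [0, p]")

    m_knot = p - sum(delta_knot)  # m-extent at which the knot is reached

    if m_knot > 0 and m <= m_knot:
        # First piece: straight line from OLS (all factors 1) to the knot.
        t = m / m_knot
        return [1.0 - t * (1.0 - d) for d in delta_knot]

    # Second piece: straight line from the knot to the terminus (all 0).
    t = (p - m) / (p - m_knot)
    return [t * d for d in delta_knot]


# Hypothetical knot for p = 3; its m-extent is 3 - (0.9 + 0.6 + 0.5) = 1.
knot = [0.9, 0.6, 0.5]
start = efficient_path_factors(knot, 0.0)    # OLS end of the path
middle = efficient_path_factors(knot, 2.0)   # halfway along second piece
finish = efficient_path_factors(knot, 3.0)   # terminus at m = p
```

On both pieces the factors sum to p - m, so the m-extent assumed above is consistent along the whole path.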

Nine papers appeared in Technometrics in 2020 to celebrate the 50th anniversary of the original pair of Hoerl-Kennard papers. For example, Hastie (2020) focused on "Ridge Regularization" which "shows up in many areas of statistics and machine learning."

Zou and Hastie (2005) JRSS B introduced the "Shrinkage and Selection" family of "Elastic Net" estimators useful in designing sparse learning methods.

Technometrics published Burr and Fry (2005), an important paper giving practical advice on "conservative" limitations for the m-Extent of shrinkage.

The Least Angle Regression paper by Efron, Hastie, Johnstone and Tibshirani (2004) introduced their LAR estimator of the beta vector that ultimately becomes shorter than the OLS vector.

Frank and Friedman(1993) and Breiman(1995) express great confidence in "cross validation" methods in shrinkage estimation. Although Mallows(1995) observes that minimizing C-sub-p to pick a regressor subset can be misleading in situations that aren't "clear-cut," he apparently still recommends calculating C-sub-p while shrinking along "smooth" paths. See also: Tibshirani(1996), Fu(1997) and LeBlanc and Tibshirani (1998).

"Aren't some of the early shrinkage/ridge methods still considered rather controversial?"

In a word: Yes! But the great, subjective "passions" (both for and against ridge methods) of the 1970s are now muted ...if not forgotten.

Importantly, I don't know of any recent papers unfairly critical of shrinkage methods.

In my opinion, there are two keys to avoiding controversy. First, use "statistical thinking" to decide how much shrinkage of what type to perform. Second, be conservative rather than "greedy" in choosing your m-Extent of Shrinkage. For example, the RXshrink R-code stresses maximum likelihood methods under Normal distribution theory. See the "Shrinkage" pages on this site for details and examples.

"How can I form confidence intervals for shrinkage estimates?"

A reasonable (and simple!) approach is to use classical confidence intervals, centered at least-squares estimates, computed using your favorite statistics package; see Obenchain(1977). In other words, even though point-estimates of effects change as shrinkage is imposed, there really is no basis in "classical" statistical theory for either shifting the location or changing the width of interval-estimates. In fact, a shrunken estimate can look quite different, numerically, from the least-squares solution without being significantly different, statistically. (Obviously, you don't want to shrink so much that your point estimate ends up OUTSIDE your reported interval!)
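
As a rough illustration of that last check, the following Python sketch forms a classical interval around a least-squares estimate and tests whether a shrunken estimate still falls inside it. The function names and all numbers are hypothetical; the standard error and the t critical value are assumed to come from your statistics package.

```python
def classical_interval(ols_estimate, std_error, t_crit):
    """Classical interval centered at the least-squares estimate."""
    half_width = t_crit * std_error
    return (ols_estimate - half_width, ols_estimate + half_width)


def shrunken_inside(shrunken_estimate, interval):
    """True when the shrunken point estimate lies within the interval."""
    lower, upper = interval
    return lower <= shrunken_estimate <= upper


# Hypothetical values: OLS estimate 2.0, standard error 0.8, and a
# t critical value of 2.0 (assumed, not looked up for any real sample).
interval = classical_interval(2.0, 0.8, 2.0)
mild = shrunken_inside(1.1, interval)      # moderate shrinkage toward 0
too_much = shrunken_inside(0.2, interval)  # shrunk past the lower limit
```

With these made-up numbers the interval runs from about 0.4 to about 3.6, so the estimate shrunken to 1.1 remains inside while the one shrunken to 0.2 does not.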

If you feel you ABSOLUTELY MUST have an interval centered at or near your shrunken estimate, you will have to use either bootstrap resampling, Vinod(1995), or Bayesian methods. Highest Posterior Density (HPD) intervals combine "added information" from your prior (usually centered at zero) with the information from your sample. This characterizes shrunken Bayes estimates as "unbiased" compromises between prior and sample information.

"How can a so-called OPTIMAL shrinkage estimator be inferior to a so-called GOOD shrinkage estimator?"

"Optimal" shrinkage estimators attempt to minimize a single (scalar valued) measure of overall MSE risk. "Good" shrinkage estimators are simply those that are better than Ordinary-Least-Squares (OLS) ...but they have to dominate OLS in EVERY (matrix valued) MSE sense. So good shrinkage estimators generally do much less shrinkage (are much closer to OLS, numerically) than optimal shrinkage estimators. In fact, a useful guideline is provided by the "2/p-ths rule-of-thumb," Obenchain(1978), where p=Rank(X). Namely, in terms of the MCAL measure of extent-of-shrinkage, the upper-limit on good shrinkage extents is only 2/p-ths of the extent of shrinkage most likely to the MSE optimal. For example, since p=6 for the Longley example, MCAL = 4 along the Q-shape = -1.5 path is most likely to be MSE optimal; thus good shrinkage estimates tend to be limited to MCAL of no more than 2*4/6 = 1.33 ...which is confirmed by the corresponding excess eigenvalue and inferior direction TRACES for the Q-shape = -1.5 path.

"Why not simply use either Stein-like or Minimum-Estimated-Risk Rules?

Minimax rules tend to do very little shrinkage. Frequently, they are almost indistinguishable from the ordinary least squares solution. Minimum estimated risk rules, like those of Mallows(1973), can shrink quite aggressively. This can lead to a big reduction in MSE risk in "favorable" cases, but aggressive shrinkage can also lead to even bigger MSE risk penalties in "unfavorable" cases. Maximum likelihood approaches represent some sort of "middle ground" between these "extremes." They reduce risk by only about 50% even in the most favorable cases ...where the risk could be reduced 100% by shrinkage all of the way to ZERO. But they also tend to increase MSE risk by at most 25% when truly unfavorable cases are encountered (i.e., when shrinkage factors within the [.8, .9] range are MSE optimal). See Gibbons(1981) and Obenchain(1996) for more on this.
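
To see why aggressive shrinkage is risky in "unfavorable" cases, consider a deliberately simplified one-component sketch in Python. It is not any of the rules discussed above: it assumes a single unbiased estimate with known variance that is multiplied by a FIXED (non-random) shrinkage factor delta, so the MSE risk relative to the unshrunken estimate is delta^2 + (1 - delta)^2 * phi_sq, where phi_sq is the squared true effect divided by that variance. The function names are hypothetical.

```python
def relative_risk(delta, phi_sq):
    """MSE risk of delta * estimate, divided by the risk of the estimate.

    delta  : fixed shrinkage factor in [0, 1]
    phi_sq : squared true effect divided by the estimate's variance
    Variance contributes delta^2; squared bias contributes
    (1 - delta)^2 * phi_sq.
    """
    return delta ** 2 + (1.0 - delta) ** 2 * phi_sq


def optimal_factor(phi_sq):
    """Fixed factor minimizing relative_risk for a given phi_sq."""
    return phi_sq / (1.0 + phi_sq)


# Favorable case: the true effect is zero, so shrinking all the way to
# ZERO removes all of the risk.
favorable = relative_risk(0.0, 0.0)

# Unfavorable case: phi_sq = 4 makes 0.8 the optimal factor.
best_factor = optimal_factor(4.0)
at_best = relative_risk(best_factor, 4.0)  # modest gain over no shrinkage
at_zero = relative_risk(0.0, 4.0)          # severe penalty for over-shrinking
```

In this toy setting the best achievable relative risk is about 0.8 when the optimal factor is 0.8, while shrinking to zero multiplies the risk by 4. Real shrinkage rules estimate their factors from the data, so their risk profiles differ from this fixed-factor calculation.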