Open Collections
BIRS Workshop Lecture Videos
A Split-and-Conquer Approach for Analysis of Extraordinarily Large Data Xie, Min-ge 2014-02-10
Item Metadata
Title | A Split-and-Conquer Approach for Analysis of Extraordinarily Large Data |
Creator | Xie, Min-ge |
Publisher | Banff International Research Station for Mathematical Innovation and Discovery |
Date Issued | 2014-02-10 |
Description | If there are extraordinarily large data, too large to fit into a single computer or too expensive for a computationally intensive data analysis, what should we do? To deal with this problem, we propose in this paper a “split-and-conquer” approach and illustrate it using several computationally intensive penalized regression methods, along with theoretical support. Consider a regression setting of generalized linear models with n observations and p covariates, in which n is extraordinarily large and p is either bounded or goes to ∞ at a certain rate in n. We propose to randomly split the data of size n into K subsets of size O(n/K). For each subset of data, we perform a penalized regression analysis, and the results from the K subsets are then combined to obtain an overall result. We show that under mild conditions the combined overall result still retains desired properties of many commonly used penalized estimators, such as model selection consistency and asymptotic normality. When K is well controlled, we also show that the combined result is asymptotically equivalent to the result of analyzing the entire data all at once (assuming that there is a supercomputer that could carry out such an analysis). In addition, when a computationally intensive algorithm is used, in the sense that its computing expense is of the order O(n^a p^b) with a > 1 and b ≥ 0, we show that the split-and-conquer approach can substantially reduce computing time and computer memory requirements. Furthermore, we demonstrate that the approach has an inherent advantage of being more resistant to false model selections caused by spurious correlations. Similar to what is reported in the literature, we can establish an upper bound for the expected number of falsely selected variables and a lower bound for the expected number of truly selected variables. The proposed methodology is illustrated numerically using both simulation and real data examples. |
Extent | 40 minutes |
Subject | Mathematics; Statistics; Biology and other natural sciences; Applied statistics |
Type | Moving Image |
FileFormat | video/mp4 |
Language | eng |
Notes | Author affiliation: Rutgers University |
Series | BIRS Workshop Lecture Videos (Banff, Alta) |
Date Available | 2014-08-07 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0043878 |
URI | http://hdl.handle.net/2429/49832 |
Affiliation | Non UBC |
Peer Review Status | Unreviewed |
Scholarly Level | Faculty |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
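The procedure outlined in the Description (randomly split the n observations into K subsets, run a penalized regression on each, then combine the K results) can be illustrated with a minimal sketch. Several parts are assumptions made for illustration only and are not taken from the record: a linear-model lasso fitted by coordinate descent stands in for the penalized GLM fit, the combination rule (majority vote on selected variables, then a plain average of the subset estimates) is a guess at one reasonable choice since the Description does not state the rule, and the function names, the penalty value `lam`, and the simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    Stand-in for "a penalized regression analysis" on one subset.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * x_j' x_j for each column
    resid = y.copy()                    # residual y - X @ beta (beta starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual that excludes j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid -= (new - beta[j]) * X[:, j]
            beta[j] = new
    return beta


def split_and_conquer(X, y, K, lam, seed=0):
    """Randomly split rows into K subsets, fit each, and combine.

    ASSUMPTION: the combination rule below (majority vote + simple average)
    is illustrative; the source does not specify how results are combined.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    parts = np.array_split(idx, K)      # K subsets of size about n/K
    betas = np.array([lasso_cd(X[part], y[part], lam) for part in parts])
    votes = (betas != 0).sum(axis=0)    # how many subsets selected each variable
    selected = votes > K / 2            # keep variables chosen by a majority
    combined = np.where(selected, betas.mean(axis=0), 0.0)
    return combined, selected


# Hypothetical simulated data: 3 active covariates out of 10.
rng = np.random.default_rng(42)
n, p, K = 3000, 10, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(n)

combined, selected = split_and_conquer(X, y, K=K, lam=0.2)
print("selected:", selected)
print("combined estimate:", np.round(combined, 3))
```

Under the cost model quoted in the Description, and assuming the K fits run one after another, the total cost is K · c · (n/K)^a · p^b = c · n^a · p^b / K^(a-1), which is smaller than the full-data cost c · n^a · p^b whenever a > 1. Each fit also needs to hold only about n/K rows in memory at a time.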
Download
- Media
- 48630-201402101451-Xie_lrv.mp4 [ 81.08MB ]
- Metadata
- JSON: 48630-1.0043878.json
- JSON-LD: 48630-1.0043878-ld.json
- RDF/XML (Pretty): 48630-1.0043878-rdf.xml
- RDF/JSON: 48630-1.0043878-rdf.json
- Turtle: 48630-1.0043878-turtle.txt
- N-Triples: 48630-1.0043878-rdf-ntriples.txt
- Original Record: 48630-1.0043878-source.json
- Citation
- 48630-1.0043878.ris
https://iiif.library.ubc.ca/presentation/dsp.48630.1-0043878/manifest