Gawler Mines and Deposits
This work is part of a submission to Unearthed’s ExploreSA Gawler Challenge. The source code is posted at this link: Gawler Public Repository
The ExploreSA Gawler Challenge is a contest to predict areas of potential mineralization in the Gawler region of South Australia. In this post I will lay some more of the groundwork for a computational approach to this mineral favorability mapping challenge. The SARIG data package contains a series of CSVs, sarig_md_*_exp.csv, that hold details of the occurrences and deposits found in the area. This will be a very important dataset for any team intending to use a supervised learning approach, as these records could serve as labels. Some work has also been done on clustering.
Mines and Mineral Occurrences
A similar dataset (with a slightly different format) may be downloaded from the SARIG mapping application. The datasets hosted on the mapping application come in Shapefile format, which is a little more convenient for geospatial work. With the CSVs, I have found it time consuming to correctly format the coordinate columns, select the correct coordinate reference system, and then convert the tabular data to a geospatial format. With the Shapefiles, all of that work is already done and opening one is a GeoPandas one-liner:
import geopandas as gpd

gawler_mines = gpd.read_file('E:/Gawler/mines_and_mineral_occurrences_all_shp/mines_and_mineral_occurrences_all.shp')
The resulting GeoDataFrame has the same methods and attributes as a Pandas DataFrame, with some added geospatial functionality, so all of the usual methods for data exploration may still be used:
>>> gawler_mines.COMMOD_MAJ.value_counts()
Cu 1422
Au 1280
SAND 559
CALI 452
LMST 405
...
TRN 1
TRQ 1
PEGM 1
VERM 1
Ra 1
Name: COMMOD_MAJ, Length: 111, dtype: int64
>>> gawler_mines.columns
Index(['MINDEP_NO', 'DEP_NAME', 'REFERENCE', 'COMM_CODE', 'COMMODS',
'COMMOD_MAJ', 'COMM_SPECS', 'GCHEM_ASSC', 'DISC_YEAR', 'CLASS_CODE',
'OPER_TYPE', 'MAP_SYMB', 'STATUS_VAL', 'SIZE_VAL', 'GEOL_PROV',
'DB_RES_RVE', 'DB_PROD', 'DB_DOC_IMG', 'DB_EXV_IMG', 'DB_DEP_IMG',
'DB_DEP_FLE', 'COX_CLASS', 'REG_O_CTRL', 'LOC_O_CTRL', 'LOC_O_COM',
'O_LITH_CDE', 'O_LITH01', 'O_STRAT_NM', 'H_LITH_CDE', 'H_LITH02',
'H_STRAT_NM', 'H_MAP_SYMB', 'EASTING', 'NORTHING', 'ZONE', 'LONGITUDE',
'LATITUDE', 'SVY_METHOD', 'HORZ_ACC', 'SRCE_MAP', 'SRCE_CNTRE',
'COMMENTS', 'O_MAP_SYMB', 'geometry'],
dtype='object')
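As a quick look at the added geospatial functionality mentioned above, the GeoDataFrame also carries the coordinate reference system and geometry of each record (a small sketch of my own, not part of the original post; the exact output depends on the downloaded shapefile):
# Geospatial attributes read from the shapefile (illustrative; exact CRS and
# bounds depend on the file you downloaded)
print(gawler_mines.crs)           # coordinate reference system, taken from the .prj file
print(gawler_mines.total_bounds)  # [minx, miny, maxx, maxy] over all point geometries
print(gawler_mines.geometry.head())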
Now we have a sense of the commodity types contained in the dataset, and we can start digging into the column abbreviations.
The value_counts method can tell us more about the levels that these features take.
>>> gawler_mines.CLASS_CODE.value_counts()
OCCURRENCE 7249
DEPOSIT 914
PROSPECT 488
TREATMENT SITE 26
Name: CLASS_CODE, dtype: int64
>>> gawler_mines.SIZE_VAL.value_counts()
Low Significance 7819
Locally Significant 794
Significant to SA 43
Significant to Australia 9
World-wide Significance 2
Name: SIZE_VAL, dtype: int64
It is immediately clear from these frequencies that significant mines are vastly outnumbered by low-significance occurrences. Visual Capitalist has a great explanation of the typical trajectory of a greenfield exploration project that really drives this fact home: The odds are 1:1000 that a greenfield exploration target ever becomes a profitable mine, and 1:3333 that a greenfield mineral target ever becomes a “world-class” mine. Specifically for gold, only 10% of deposits contain enough gold to justify development. Various mining companies report similar numbers in their literature.
Major Mines and Development Properties
The commodity list also includes a number of industrial minerals, so before moving on to visualization and analysis we should filter those out and look exclusively at metals:
metals = ('Cu', 'Au', 'Fe', 'Ag', 'Pb', 'Zn', 'Co', 'Ni', 'Cr', 'Mn', 'Ti', 'V', 'PGE', 'Mo', 'W', 'Sn', 'REE', 'U')
gawler_mines_metal = gawler_mines[gawler_mines["COMMOD_MAJ"].isin(metals)]
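As a quick sanity check (my own addition, not part of the original post), we can compare record counts before and after the filter and confirm that only metallic commodities remain:
# Record counts before and after restricting to metallic commodities
print(len(gawler_mines), len(gawler_mines_metal))
print(gawler_mines_metal.COMMOD_MAJ.value_counts().head())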
folium is a really handy visualization tool for Python that writes leaflet.js maps from Python geospatial data formats. The following code will map our selected deposits:
import folium
import folium.plugins

gawler_coords = [-32, 135]

# Esri World Imagery basemap tiles
deposit_map = folium.Map(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
    attr='Esri World Imagery',
    width=640, height=480,
    location=gawler_coords, zoom_start=6)

# Keep only the points classified as deposits and add them as a clustered marker layer
deposits = gawler_mines_metal[gawler_mines_metal.CLASS_CODE == 'DEPOSIT']
mc = folium.plugins.MarkerCluster()
for index, row in deposits.iterrows():
    popup_text = f"{row.DEP_NAME}: {row.COMM_CODE} ({row.GCHEM_ASSC})"
    mc.add_child(folium.Marker([row['LATITUDE'], row['LONGITUDE']], popup=popup_text))

deposit_map.add_child(mc)
deposit_map
The map gives us a sense of the distribution of mines in South Australia (click on the markers to see the mineral commodity):
Now a bit of an aside, with a perspective from the energy industry:
Lessons from Oil and Gas
Risk mapping has been studied extensively in petroleum exploration. We can benefit from the decades of experience in that industry if we consider some of their tools and techniques.
The “play based exploration” (PBE) paradigm is a very useful way of viewing mineral prospects. The petroleum “play” is the set of elements and conditions that must exist for a specific kind of occurrence to develop in the subsurface: the play elements are necessary, but not necessarily sufficient. It is a powerful way of looking at a variety of subsurface phenomena, and there has been recent enthusiasm for PBE in mineral exploration, geothermal energy, and even space mining (!).
One important idea often borrowed from PBE is the concept of the “play fairway”: the fraction of the “area of interest” that has high favorability for mineral occurrence. Drilling on the play fairway is a low-risk proposition compared to drilling off the fairway, where the proper geological conditions are not as likely to exist.
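To make the fairway idea concrete, here is a minimal sketch (purely illustrative, using a made-up favorability grid rather than anything derived from the Gawler data) of how the fairway fraction might be computed from a favorability surface:
import numpy as np

# Hypothetical favorability surface over the area of interest (values in [0, 1])
rng = np.random.default_rng(0)
favorability = rng.random((100, 100))

# Define the "play fairway" as the cells above a favorability cutoff
cutoff = 0.8
fairway = favorability >= cutoff

# Fraction of the area of interest that lies on the fairway
print(fairway.mean())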
Harbaugh’s Computing Risk for Oil Prospects: Principles and Programs has a nice passage on the advantages of this perspective:
… we dealt with estimates of probabilities of field discoveries in different size categories provided that a field is discovered. These estimates tell us nothing, however, about dry hole probabilities.
Classifying a hole as unmineralized or mineralized based on the collar location, and then modeling the extent or magnitude of the discovered mineralization, are two related but entirely different tasks. Considering them at the same time complicates the problem considerably, as unmineralized holes (the dry holes of the mining world) destroy the spatial structure of the data and wreak havoc with the grade distributions. In addition, the “large” or “world-class” mines sample size is very small and won’t make for an easy supervised learning problem.
By separating the two, we can model the “unmineralized hole” risk as a binary classification task and worry about regression later. The magnitude of mineralization definitely has to be part of the model eventually (even worthless mineral properties can generate some very impressive assays), but there is a great advantage in leaving it for later.
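As a sketch of what that binary setup might look like (my own illustration, not necessarily the approach I will take later), known deposit locations can serve as the positive class, with randomly sampled background locations standing in for “unmineralized” sites:
import numpy as np
import geopandas as gpd
from shapely.geometry import Point

# Positive class: known metallic deposits. Negative class: random background points
# sampled from the bounding box of the deposits (a common, if crude, stand-in for
# unmineralized locations).
rng = np.random.default_rng(42)
minx, miny, maxx, maxy = deposits.total_bounds
background = gpd.GeoDataFrame(
    geometry=[Point(x, y) for x, y in zip(rng.uniform(minx, maxx, len(deposits)),
                                          rng.uniform(miny, maxy, len(deposits)))],
    crs=deposits.crs)
positives = deposits.assign(label=1)
negatives = background.assign(label=0)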
This is a very important perspective as we move closer to the modeling stage.
Spatial Clustering
It is a well-established fact that mineral deposits tend to cluster spatially. This occurs for a variety of reasons: the same conditions that created one deposit may have occurred nearby and created a similar deposit, or associated mineralization of a different character; the original modeling may have underestimated the extent of the mineralized area; or subsequent faulting may have displaced part of the deposit. There are clearly a number of mechanisms at work, and it has been observed that spatial proximity to known mineralization is a pretty good predictor of mineralization.
The numbers on clustering are likely biased, as there are strong business incentives for mining companies to explore near existing deposits. Mining companies are very risk averse (majors especially) and like to minimize their “unmineralized hole” risk, so they tend to drill nearby to expand their reserves with associated mineralization in well-understood geology. They also love the opportunity to reuse infrastructure and permits, and to benefit from an existing workforce and the social license and goodwill already established with neighbors, all of which are found close to a producing mine.
(People have also done very interesting work clustering on the properties of the mineral deposits themselves (see this article as well), but for a supervised learning approach it is more directly useful to focus on spatial clustering.)
Voronoi Diagrams
Voronoi diagrams partition the plane into regions assigned to the closest point, so in this case each point in the plane is “assigned” to the nearest deposit. The following code will produce a Voronoi diagram from the deposits:
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d

# Voronoi tessellation of the deposit locations (one region per deposit)
points = np.vstack((deposits.geometry.x.values, deposits.geometry.y.values)).T
vor = Voronoi(points)
voronoi_plot_2d(vor)
The resulting Voronoi diagram:
There are very dense areas of clustered occurrences (which we also saw in the map) that clutter the diagram and make interpretation difficult. I have left the code in the notebook but I believe there are better alternatives.
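One such alternative (my own suggestion; the original notebook may use something different) is a simple density view of the occurrence locations, for example a hexbin plot:
import matplotlib.pyplot as plt

# Hexbin density of metallic occurrences; log-scaled bins keep the dense clusters
# from washing out the rest of the map.
fig, ax = plt.subplots(figsize=(8, 6))
ax.hexbin(gawler_mines_metal.geometry.x, gawler_mines_metal.geometry.y,
          gridsize=50, bins='log', mincnt=1)
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title('Density of metallic mineral occurrences')
plt.show()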
KDTrees
With a small number of samples, we can get away with a brute force search for the nearest neighbors of any point in the domain. As soon as we consider a large sample set or a large number of inputs, we should consider a specialized data structure to avoid performance bottlenecks.
The kdtree is a very useful data structure for spatial data, as we can quickly and easily retrieve the n nearest neighbors of any point in the domain, or all neighbors within some distance d. The following code will build a kdtree storing the metallic deposits for easy lookup:
from scipy.spatial import cKDTree
deposit_coords = np.vstack((deposits.geometry.x.values, deposits.geometry.y.values)).T
deposit_tree = cKDTree(deposit_coords)
We can test the kdtree with the following code, querying a known point to confirm the code is working:
>>> deposit_tree.query(deposit_coords[0], 3)
(array([0. , 0.03679629, 0.04744327]), array([ 0, 12, 129]))
>>> deposit_tree.query_ball_point(deposit_coords[0], 0.5)
[125, 169, 149, 12, 0, 156, 129, 141, 1, 79, 78]
We put in the coordinates of the first deposit and get it back at distance zero, along with its two nearest neighbors, and then all of the points within 0.5 units of distance, almost instantaneously.
KDTrees are mostly advantageous for computation and are not readily interpretable by humans. Ultimately, the most useful product for this dataset (other than the mineralization labels) might be a separate map for each commodity showing the distance to nearest occurrence for that specific commodity.
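A minimal sketch of how those per-commodity distance features could be built with KD-trees (my own illustration, not code from the original notebook):
import numpy as np
from scipy.spatial import cKDTree

# Build one KD-tree per major commodity so distance-to-nearest-occurrence
# can be queried separately for Cu, Au, Fe, and so on.
commodity_trees = {}
for commodity, group in gawler_mines_metal.groupby('COMMOD_MAJ'):
    coords = np.vstack((group.geometry.x.values, group.geometry.y.values)).T
    commodity_trees[commodity] = cKDTree(coords)

# Distance from an arbitrary query point to the nearest copper occurrence
# (coordinates are (longitude, latitude), matching the order used above)
query_point = [135.0, -32.0]
dist, _ = commodity_trees['Cu'].query(query_point)
print(dist)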
Sources and additional reading:
Here are some of the best resources I have found on this problem:
- Application of Machine Learning Algorithms to Mineral Prospectivity Mapping by Justin Granek
- Computer Programs for Mineral Exploration
- Artificial neural networks: A new method for mineral prospectivity mapping
- Porphyry Copper Potential of the United States Southern Basin and Range Using ASTER Data Integrated with Geochemical and Geologic Datasets to Assess Potential Near-Surface Deposits in Well-Explored Permissive Tracts
- Prospectivity Analysis of Gold Using Regional Geophysical and Geochemical Data from the Central Lapland Greenstone Belt, Finland
- Orogenic gold prospectivity mapping using machine learning
- Applying spatial prospectivity mapping to exploration targeting: Fundamental practical issues and suggested solutions for the future
- Mineral potential mapping with mathematical geological models
- Weights of Evidence Modeling of Mineral Potential: A Case Study Using Small Number of Prospects, Abra, Philippines