The dataset originates from eReuse.org, a global community formed by social, public and private organizations interested in responsible reuse and recycling of computing devices.
The original dataset, anonymized and exported from the resellers' inventory management systems, and the resulting dataset after the data cleansing and filtering process are available under license CC BY 4.0.
Some reports about device durability are available from here under license CC BY 4.0:
With more computing devices (computers, mobiles) than people on Earth, and powerful companies whose business is manufacturing, the successful implementation of the 3Rs (Reduce, Reuse, and Recycle) for electronics and the migration towards circular ICT goods and services involve diverse stakeholders and are becoming vitally important for the environmental sustainability of our planet (ITU 2020). The aim of this report is to provide a large and detailed dataset about computing devices beyond their first usage.
The dataset originates from eReuse.org, a global community formed by social, public and private organizations interested in responsible reuse and recycling of computing devices (Franquesa et al. 2015; Franquesa and Navarro 2018). Since 2013, eReuse has collected details about second-hand computers and their components that are collected, refurbished or recycled, with the purpose of keeping track of these devices and being able to analyze that data over their full lifespan until final recycling.
Devices are handled and data is extracted by different entities using eReuse software tools. They can pool together part of their extracted data according to a data sharing license.
From that data pool, it is possible to derive detailed life-cycle data for each device across its lifespan, from the initial registration of a device until the last record in a recycling center before it is destroyed. The traceability dataset of individual devices and components allows different analyses per device and per grouping. Different actors, such as buyers, consumers and device owners disposing of devices, can obtain useful statistics about the real durability of the equipment, showing for instance which manufacturers make more durable and repairable devices and components. As the dataset becomes more quantitatively robust, it may be of interest to consumer watchdogs and to reuse and repair platforms to help raise public awareness about durability, so that buyers can choose the more environmentally sustainable electronic devices, increasing the reuse and recycling ratio at the end of the devices' second and third lives.
The rest of the paper is structured as follows. The next section describes the dataset: the scope, collection, anonymisation and cleansing. Section 4 explores the potential of the dataset for durability analysis. Section 5 looks at other analyses, such as exploring duplicates and the distribution of features.
The main material of this analysis is a dataset that contains the devices processed by reseller organizations federated under the eReuse device data commons license. This report is the first attempt to assess the value of the data we collect.
The dataset contains only technical data about the functional status of computing devices and the components they contain. It does not report whether the devices have been reused, their total time of reuse, whether their users have any kind of socio-economic vulnerability, the price paid for the devices, or whether devices are ultimately recycled at authorized points.
We have limited the study to resellers with operations in Spain that have accepted the eReuse device data commons license. The total number of resellers is 20. Data was collected between 2013-10-08 and 2019-06-03.
The resellers use the Workbench software tool (link) for hardware discovery, stress testing and operating system installation. The device boots from a USB drive or a PXE network server, runs unattended and requires only two minutes for the basic test. The result of the hardware scan and stress test is one JSON file per device, which is extracted and stored elsewhere: on the reseller's server disk, on a USB drive, or uploaded to their private inventory system (DeviceHub in our case).
Once the device information is stored in our device inventory system (DeviceHub), part of this data can be exported to this dataset through the DeviceHub API, but only for participants (resellers) that have accepted the open data license. The name of the inventory is automatically anonymized with a code that represents the source (the reseller).
The DeviceHub API only offers anonymized data, excluding any business-sensitive or personal details. There are two modes of anonymization; the more stringent one also anonymizes the serial numbers of the devices.
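The text does not say how the serial numbers are anonymized. As an illustration only, the following minimal sketch shows one common approach, keyed hashing with HMAC-SHA256. The function name `anonymize_serial`, the secret key and the truncation to 16 characters are assumptions made for this example and are not taken from DeviceHub.

```python
import hashlib
import hmac

# Hypothetical secret kept private by the inventory operator (an assumption).
SECRET_KEY = b"replace-with-a-private-random-key"


def anonymize_serial(serial: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a serial number using HMAC-SHA256."""
    # Normalize case and surrounding whitespace so the same device
    # always maps to the same pseudonym.
    normalized = serial.strip().upper().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated only for readability


# Made-up record for illustration; not taken from the dataset.
record = {"Manufacturer": "Toshiba", "Serial.Number": "ab123456"}
record["Serial.Number"] = anonymize_serial(record["Serial.Number"])
print(record)
```

With a scheme of this kind, the same serial number always yields the same pseudonym, so duplicate devices can still be detected after anonymization, while the original serial cannot be read from the exported data.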
The dataset contains one record (row) per computing device with a set of features or characteristics that contain metadata (such as type, model, manufacturer, version, date, serial, address, capacity) about each part/component detected.
The dataset is available from here under license CC BY 4.0.
In this sample table we show the first 5 columns and rows of our dataset. A row is a record/observation/trial, which corresponds to the statistical unit of the dataset. In our analysis each row or observation represents a device. A column is a variable/feature; in our case, columns are features of the devices, of their components, or information about their origin.
This is an example of a dataset:
Only technical data columns are exported, and the serial numbers are anonymized. We have a total of 192 variables (features).
We will focus only on the features that are significant for studying durability. Descriptive statistics for all features are shown in Annex 2.
The variables we will study are: Source, Type, Subtype, Serial.Number, Registered.in, Model, Manufacturer, HardDrive.1.lifetime.hours, HardDrive.1.lifetime.years, RAM.GB, HDD.MB
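Restricting the export to these 11 features can be sketched as follows. This is a minimal illustration that assumes the dataset is read as CSV with the column names listed above; the helper `select_features` and the sample content are made up for the example.

```python
import csv
import io

# The 11 features named in the text.
SELECTED = [
    "Source", "Type", "Subtype", "Serial.Number", "Registered.in", "Model",
    "Manufacturer", "HardDrive.1.lifetime.hours", "HardDrive.1.lifetime.years",
    "RAM.GB", "HDD.MB",
]


def select_features(rows, names=SELECTED):
    """Keep only the named columns; columns absent from a row become None."""
    return [{name: row.get(name) for name in names} for row in rows]


# Made-up sample standing in for the real export (192 columns in the text).
sample_csv = io.StringIO(
    "Source,Type,Model,Some.Other.Column\n"
    "3,Computer,ModelX,ignored\n"
)
rows = select_features(csv.DictReader(sample_csv))
print(rows[0])
```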
For our 11 selected features (columns) there are 8458 observations (rows), giving a total of 93038 values (11 × 8458). Only 3257 observations are complete, and a total of 11238 values are missing.
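The counts above can be reproduced with a short routine. The sketch below is an illustration under stated assumptions: a value is treated as missing when it is `None`, an empty string or the literal `NA`, and the function names and toy rows are made up for the example.

```python
def is_missing(value):
    """Treat None, empty strings and the literal 'NA' as missing (an assumption)."""
    return value is None or str(value).strip() in ("", "NA")


def summarize(rows, features):
    """Count cells, missing cells and complete rows over the given features."""
    total_cells = len(rows) * len(features)
    missing_cells = sum(is_missing(row.get(f)) for row in rows for f in features)
    complete_rows = sum(
        all(not is_missing(row.get(f)) for f in features) for row in rows
    )
    return {
        "cells": total_cells,
        "missing": missing_cells,
        "complete_rows": complete_rows,
    }


# Made-up rows for illustration only; they are not taken from the dataset.
toy_rows = [
    {"Type": "Computer", "Model": "A1", "RAM.GB": "4"},
    {"Type": "Computer", "Model": "", "RAM.GB": "2"},
    {"Type": "ComputerMonitor", "Model": None, "RAM.GB": "NA"},
]
toy_features = ["Type", "Model", "RAM.GB"]
summary = summarize(toy_rows, toy_features)
print(summary)

# The total number of cells is always rows times features.
print(8458 * 11)
```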
You should immediately notice some surprises:
If an observation (a device) lacks the data for some of its features, for example when we do not have the model of a device, we say that a value is missing for this observation. The values are therefore the cells of our table.
Statistical Description of the dataset before data cleansing process
11 Variables 5870 Observations
|Value||Computer|
|Frequency||5870|
|Proportion||1|

|Value||(blank)||Desktop||Laptop||Microtower||Netbook||SAI||Server|
|Frequency||52||4428||278||688||421||1||2|
|Proportion||0.009||0.754||0.047||0.117||0.072||0.000||0.000|

|highest:||Sony Corporation||To Be Filled By O.E.M.||Toshiba||TOSHIBA||Unknown|

|lowest :||2013-10-08 07:28:34+00:00||2013-10-08 08:28:38+00:00||2013-10-08 08:50:24+00:00||2013-10-08 08:51:05+00:00||2013-10-08 08:51:47+00:00|
|highest:||2019-05-29 16:05:18.163000+00:00||2019-05-31 12:22:47.314000+00:00||2019-05-31 15:48:20.782000+00:00||2019-05-31 17:11:26.760000+00:00||2019-05-31 19:24:35.628000+00:00|

|n||missing||distinct||Info||Mean||Gmd||.05||.10||.25||.50||.75||.90||.95|
|5843||27||71||0.961||167048||181154||0||0||0||152627||238475||476940||476940|
|lowest :||0||3||7||8||9|
|highest:||629527||629567||715404||953869||953880|

|Value||1||2||3||4||5||6||7||8||9||10||11||12|
|Frequency||1||615||1668||12||166||3||405||12||4||74||136||782|
|Proportion||0.000||0.105||0.284||0.002||0.028||0.001||0.069||0.002||0.001||0.013||0.023||0.133|

|Value||13||14||15||16||17||18||19||20|
|Frequency||16||440||64||120||673||461||161||57|
|Proportion||0.003||0.075||0.011||0.020||0.115||0.079||0.027||0.010|
From the above chart, HardDrive.1.lifetime.years and HardDrive.1.lifetime.hours are the features with the most missing data, with 5193 missing values. The reason for the many missing values in HardDrive.lifetime is that we only have this value for devices of type “Computer”, so our table contains many nulls for ComputerMonitor, Peripheral and other types.
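One way to check where the missing values of a feature come from is to count them per device type. The following is a minimal sketch; the helper `missing_by_type`, the rule that `None` and empty strings count as missing, and the toy rows are assumptions made for illustration.

```python
from collections import Counter


def missing_by_type(rows, feature, type_key="Type"):
    """Count, per device type, the rows in which `feature` has no value."""
    counts = Counter()
    for row in rows:
        value = row.get(feature)
        if value is None or str(value).strip() == "":
            counts[row.get(type_key)] += 1
    return dict(counts)


# Made-up rows for illustration only; they are not taken from the dataset.
toy_rows = [
    {"Type": "Computer", "HardDrive.1.lifetime.hours": "12000"},
    {"Type": "Computer", "HardDrive.1.lifetime.hours": ""},
    {"Type": "ComputerMonitor", "HardDrive.1.lifetime.hours": None},
    {"Type": "Peripheral"},
]
result = missing_by_type(toy_rows, "HardDrive.1.lifetime.hours")
print(result)
```

A breakdown of this kind shows whether the nulls are concentrated in types that cannot carry the feature at all, such as monitors and peripherals, or whether some devices of type “Computer” also lack it.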
Let’s select only the “Computer” type, filtering out the undesired observations. After removing the rows whose type is not “Computer”, the number of rows (devices) is reduced from 8458 to 5870.
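The filtering step can be sketched as below. This is an illustration only, assuming each row is a dictionary with a `Type` key; the helper name `keep_type` and the toy rows are made up for the example.

```python
def keep_type(rows, wanted="Computer", type_key="Type"):
    """Return only the rows whose type equals `wanted`."""
    return [row for row in rows if row.get(type_key) == wanted]


# Made-up rows for illustration only; they are not taken from the dataset.
toy_rows = [
    {"Type": "Computer", "Subtype": "Desktop"},
    {"Type": "ComputerMonitor", "Subtype": "LCD"},
    {"Type": "Computer", "Subtype": "Laptop"},
    {"Type": "Peripheral", "Subtype": "Keyboard"},
]
computers = keep_type(toy_rows)
print(len(toy_rows), "->", len(computers))

# Rows removed in the text: 8458 before filtering, 5870 after.
removed = 8458 - 5870
print(removed)
```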