The dataset originates from eReuse.org, a global community formed by social, public and private organizations interested in the responsible reuse and recycling of computing devices.

The original dataset, anonymized and exported from the resellers' inventory management systems, and the resulting dataset after the data cleansing and filtering process are both available under license CC BY 4.0.

Some reports about device durability are available from here under license CC BY 4.0:

1 Introduction

With more computing devices (computers, mobiles) than people on Earth, and powerful companies whose business is manufacturing them, the successful implementation of the 3Rs (Reduce, Reuse, and Recycle) for electronics and the migration towards circular ICT goods and services involve diverse stakeholders and are vitally important for the environmental sustainability of our planet (ITU 2020). The aim of this report is to provide a large, detailed dataset about computing devices beyond their first use.

The dataset originates from eReuse.org, a global community formed by social, public and private organizations interested in the responsible reuse and recycling of computing devices (Franquesa et al. 2015; Franquesa and Navarro 2018). Since 2013, eReuse has collected details about second-hand computers and their components as they are collected, refurbished or recycled, with the purpose of keeping track of these devices and being able to analyze that data over their full lifespan until final recycling.

Devices are handled, and their data extracted, by different entities using eReuse software tools. These entities can pool part of their extracted data according to a data sharing license.

From that data pool, it is possible to deduce detailed life-cycle data for each device across its lifespan, from the initial registration of a device until the last record in a recycling center before it is destroyed. The traceability dataset of individual devices and components allows different analyses per device and per group of devices. Different actors, such as buyers, consumers and owners disposing of devices, can obtain useful statistics about the real durability of the equipment, showing for instance which manufacturers make more durable and repairable devices and components. As the dataset becomes more quantitatively robust, it may be of interest to consumer watchdogs and to reuse and repair platforms, helping to raise public awareness about durability so that buyers can choose the more environmentally sustainable electronic devices, which in turn increases the reuse and recycling ratio at the end of the devices' second and third lives.

The rest of the paper is structured as follows. The next section describes the dataset: its scope, collection, anonymization and cleansing. Section 4 explores the potential of the dataset for durability analysis. Section 5 looks at other analyses, such as exploring duplicates and the distribution of features.

2 Data source

The main material of this analysis is a dataset that contains the devices processed by reseller organizations that have adopted the eReuse device data commons license. This report is a first attempt to assess the value of the data we collect.

The dataset only contains technical data about the functional status of computing devices and the components they contain. It does not report whether the devices have been reused, their total time of reuse, whether their users have any kind of socio-economic vulnerability, the price paid for the devices, or whether the devices are in the end recycled at authorized points.

2.1 Data scope

We have limited the study to resellers with operations in Spain that have accepted the eReuse device data commons license. The total number of resellers is 20. The data were collected between 2013-10-08 and 2019-06-03.

2.2 Data collection

The resellers use the Workbench software tool for hardware discovery, stress testing and operating system installation. The device boots from a USB drive or a PXE network server, runs unattended, and requires only two minutes for the basic test. The result of the hardware scan and stress test is one JSON file per device, which is extracted and stored elsewhere: on the reseller's server disk, on a USB drive, or uploaded to their private inventory system (DeviceHub in our case).

2.3 Data anonymization and aggregation

Once the device information is stored in our device inventory system (DeviceHub), part of this data can be exported to this dataset through the DeviceHub API, but only for participants (resellers) that have accepted the open data license. The name of each inventory is automatically replaced by a code that represents the source (the reseller).

The DeviceHub API only offers anonymized data, excluding any business-sensitive or personal details. There are two modes of anonymization; the more stringent one also anonymizes the serial numbers of the devices.
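
The source does not say how serial numbers are anonymized. As a minimal sketch only, assuming a salted SHA-256 digest, the stricter mode could work along these lines. The function name anonymize_serial, the salt and the sample serial number are all hypothetical and are not part of DeviceHub.

```python
import hashlib


def anonymize_serial(serial: str, salt: str) -> str:
    """Return a 64-character hex digest that stands in for the serial number.

    Assumption: salted SHA-256; the real DeviceHub scheme is not documented here.
    """
    # Normalize so that " pf0abc12 " and "PF0ABC12" map to the same pseudonym.
    normalized = serial.strip().upper()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


# Hypothetical salt and serial number, for illustration only.
pseudonym = anonymize_serial("PF0ABC12", salt="example-salt")
same_device = anonymize_serial(" pf0abc12 ", salt="example-salt")
```

Because the same serial number always yields the same pseudonym, a device can still be traced across several records without exposing its real identifier.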

2.4 Data content

The dataset contains one record (row) per computing device with a set of features or characteristics that contain metadata (such as type, model, manufacturer, version, date, serial, address, capacity) about each part/component detected.

The dataset is available from here under license CC BY 4.0.

2.5 Data sample

This sample table shows the first 5 rows and columns of our dataset. A row is a record (observation or trial) and corresponds to the statistical unit of the dataset; in our analysis each row represents a device. A column is a variable (feature); in our case the features describe the device, its components or its origin.

This is an example of a dataset:

Only technical data columns are exported, and the serial numbers are anonymized. The dataset has a total of 192 variables (features).

2.6 Data selection

We focus only on the features that are significant for studying durability. Descriptive statistics for all features are shown in Annex 2.

The variables we will study are: Source, Type, Subtype, Serial.Number,, Model, Manufacturer, HardDrive.1.lifetime.hours, HardDrive.1.lifetime.years, RAM.GB, HDD.MB

2.6.1 Summary of significant features of the study

For our selected 11 features (columns) there are 8458 observations (rows), giving a total of 93038 values (cells). Only 3257 rows are complete, and a total of 11238 values are missing.

You should immediately notice some surprises:

  1. 38.51% complete rows: only 38.51% of all rows have a value for every selected feature.
  2. 12.08% missing values: although fewer than half of the rows are complete, only 12.08% of all values are missing.
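
The two percentages follow directly from the counts reported above, reading 3257 as the number of complete rows and 11238 as the number of missing cells. A short check of the arithmetic (the variable names are ours):

```python
n_rows = 8458           # observations (devices)
n_features = 11         # selected features (columns)
complete_rows = 3257    # rows with no missing value
missing_values = 11238  # missing cells

total_values = n_rows * n_features
pct_complete_rows = round(100 * complete_rows / n_rows, 2)
pct_missing_values = round(100 * missing_values / total_values, 2)

print(total_values, pct_complete_rows, pct_missing_values)
# prints: 93038 38.51 12.08
```

The percentage of complete rows is computed over rows, while the percentage of missing values is computed over cells, which is why the two figures do not add up to 100%.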

2.7 Data cleansing

When an observation (a device) lacks the data for one of its features, for example the model, we say that a value is missing for that observation. The values are therefore the cells of our table.

Statistical description of the dataset before the data cleansing process

11 Variables   5870 Observations

 Value      Computer
 Frequency      5870
 Proportion        1

lowest : Desktop Laptop Microtower Netbook
highest: Laptop Microtower Netbook SAI Server
 Value                    Desktop     Laptop Microtower    Netbook        SAI     Server
 Frequency          52       4428        278        688        421          1          2
 Proportion      0.009      0.754      0.047      0.117      0.072      0.000      0.000

lowest :
  00108035bff3f8f39589d7003c5fb543fa7185a2f002a38e00cd37682c75db83
  0015a7b61d48a22fa3198a02d07f88348ac3f1afcd1e988171f9ee41ef67f779
  00229537a27e4e37bfc6ae16c01881d6f8fede2815aecdd068b637232a1a5db2
  0026810c83544bb7db6c21ad2f99e9ee4127998590ab6eb3ff8b1d61d8c6df92
  0030331f5a2fd519e23cac49852deef61a8e8f27fc21bccbf521fb2486f82295

lowest : * 000000000000000000000000266RS8 0401-CUG

lowest : 6072 acer Acer ACER
highest: Sony Corporation To Be Filled By O.E.M. Toshiba TOSHIBA Unknown

lowest : 2013-10-08 07:28:34+00:00 2013-10-08 08:28:38+00:00 2013-10-08 08:50:24+00:00 2013-10-08 08:51:05+00:00 2013-10-08 08:51:47+00:00
highest: 2019-05-29 16:05:18.163000+00:00 2019-05-31 12:22:47.314000+00:00 2019-05-31 15:48:20.782000+00:00 2019-05-31 17:11:26.760000+00:00 2019-05-31 19:24:35.628000+00:00

        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     5843       27       22    0.898     2361     2202        0        0        0 
      .50      .75      .90      .95 
     2048     4096     4096     4096 
lowest : 0 192 256 512 768 , highest: 8192 10240 12288 14336 16384

        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     5843       27       71    0.961   167048   181154        0        0        0 
      .50      .75      .90      .95 
   152627   238475   476940   476940 
lowest : 0 3 7 8 9 , highest: 629527 629567 715404 953869 953880

        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     3265     2605     2800        1    17037    15787        0        1     5550 
      .50      .75      .90      .95 
    14126    25368    39892    45346 
lowest : 0 1 2 3 4 , highest: 62871 62891 63941 64050 65332

lowest : 0.00 0.01 0.02 0.03 0.04 , highest: 7.15 7.18 7.30 7.31 7.46

        n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     5870        0       20    0.971    9.353     6.92        2        2        3 
      .50      .75      .90      .95 
       10       16       18       18 
lowest : 1 2 3 4 5 , highest: 16 17 18 19 20
 Value          1     2     3     4     5     6     7     8     9    10    11    12
 Frequency      1   615  1668    12   166     3   405    12     4    74   136   782
 Proportion 0.000 0.105 0.284 0.002 0.028 0.001 0.069 0.002 0.001 0.013 0.023 0.133
 Value         13    14    15    16    17    18    19    20
 Frequency     16   440    64   120   673   461   161    57
 Proportion 0.003 0.075 0.011 0.020 0.115 0.079 0.027 0.010
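
The extremes listed above for the hard-drive lifetime in hours (up to 65332) and in years (up to 7.46) are consistent with a conversion of years = hours / (24 × 365). The source does not state the conversion, so the divisor below is an assumption, checked only against the highest values shown in the summary.

```python
HOURS_PER_YEAR = 24 * 365  # assumed divisor: 8760 hours, leap days ignored


def hours_to_years(hours: int) -> float:
    """Convert power-on hours to years, rounded to two decimals."""
    return round(hours / HOURS_PER_YEAR, 2)


# The five highest lifetime values (in hours) from the summary above.
highest_hours = [62871, 62891, 63941, 64050, 65332]
years = [hours_to_years(h) for h in highest_hours]
print(years)  # [7.18, 7.18, 7.3, 7.31, 7.46]
```

Four of the five highest values in years (7.18, 7.30, 7.31, 7.46) are reproduced; the two largest-but-three hour counts collapse to the same rounded value, which is why the list of distinct years also includes a lower entry (7.15).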

2.7.1 Data cleansing: Removing non interesting observations (rows)

As the summary above shows, HardDrive.1.lifetime.years and HardDrive.1.lifetime.hours are mostly missing, with 5193 missing values. The reason for so many missing values is that we only have this value for devices of type “Computer”, so our table contains many nulls for ComputerMonitor, Peripheral and other types.

We therefore select only the “Computer” type, which filters out the undesired observations. Removing the rows of any other type reduces the number of rows (devices) from 8458 to 5870.
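
A minimal sketch of this filtering step in plain Python, assuming each device is a dictionary keyed by feature name. The helper keep_computers and the three sample rows are invented for illustration and are not taken from the dataset.

```python
def keep_computers(rows):
    """Keep only the observations whose Type is exactly "Computer"."""
    return [row for row in rows if row.get("Type") == "Computer"]


# Invented sample rows; None marks a missing value.
sample = [
    {"Type": "Computer", "Subtype": "Desktop", "HardDrive.1.lifetime.hours": 14126},
    {"Type": "ComputerMonitor", "Subtype": None, "HardDrive.1.lifetime.hours": None},
    {"Type": "Computer", "Subtype": "Laptop", "HardDrive.1.lifetime.hours": None},
]
computers = keep_computers(sample)

# Cross-check with the counts reported in the text, assuming every removed
# row lacked the lifetime value (only "Computer" rows can carry it).
removed_rows = 8458 - 5870                   # rows dropped by the filter
missing_after_filter = 5193 - removed_rows   # lifetime values still missing
```

Under that assumption, 2588 rows are removed and 2605 of the remaining computers still lack a hard-drive lifetime value, so the filter removes about half of the missing lifetime values rather than all of them.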