Machine learning and repair

neil · 25 September 2019 09:28

We were fortunate to get free tickets and have @Monique and @Elena heading to the MCubed conference on machine learning next week in London.

https://www.mcubed.london/

Has anyone been working on the use of machine learning or artificial intelligence as it relates to repair data?

For example, at TU Leuven they’ve been looking at the use of image analysis to quickly determine brand and model data from photographs of a device, and how this could link to quick access to repair/service information. @Wouter_Sterkens

Any other activities around this? Let’s get the conversation going and share ideas!

Janet · 25 September 2019 10:05

Not really sure how many of these require machine learning and how many could be achieved more easily via other means:

Fault categorisation / failure modes

Can ML help us keep a better “state of the art” fault categorisation for key product categories? Like what we are working on for laptops, and what we did at our two data dives this year (latest: Fixfest).

Impact of Right to Repair

From a policy perspective, we’d be interested in using ML to predict or model what impact the right to repair policy agenda would have on repair outcomes:

easy access to affordable, quality spare parts
easy access to documentation and diagnostics
design for disassembly

Live prediction or clues for fixers

I’m also wondering to what extent it could help live at events with predicting the origin of a fault (and possible solution) once we have data on category/make/model? @philip @Panda your thoughts would be helpful here.

Insights into the changing profile of equipment brought to events

Regulators would probably like to know what we are seeing brought to our events, and how this compares to figures on Electronic and Electrical Equipment (EEE) sales. I would be interested in insights into how this is changing over time, and maybe being alerted to extraordinary growth in a certain product category. Some kind of sensing or predictive capacity here would be interesting too.
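As a rough illustration of the "alert on extraordinary growth" idea, here is a minimal sketch in plain Python. It compares each category's share of records between two periods and flags categories whose share has grown sharply or that are new. The function names, the `min_ratio` and `min_count` thresholds and the sample categories are all assumptions for illustration, not anything that exists in our data tooling.

```python
from collections import Counter


def category_shares(records):
    """Return each category's share of the records for one period."""
    counts = Counter(records)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def flag_growth(previous, current, min_ratio=2.0, min_count=5):
    """List categories whose share of records grew by at least min_ratio.

    previous, current: lists of category names, one entry per device seen.
    min_ratio, min_count: hypothetical thresholds, to be tuned on real data.
    """
    prev_shares = category_shares(previous)
    cur_shares = category_shares(current)
    cur_counts = Counter(current)
    flagged = []
    for cat, share in cur_shares.items():
        if cur_counts[cat] < min_count:
            continue  # too few records to say anything
        prev = prev_shares.get(cat)
        # A category that was absent before counts as growth too.
        if prev is None or share / prev >= min_ratio:
            flagged.append(cat)
    return sorted(flagged)
```

Comparing shares rather than raw counts keeps the check from firing just because more events were logged in the later period.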

Panda · 25 September 2019 10:29

I’m concerned by the GIGO principle. I find it more important and urgent to improve the quality of the data we capture, as we identified at the data analysis event at Newspeak House. At the moment any analysis will have to be at a very large-grain level.

Updated to clarify I do not mean that the data captured is not correct, but that it is not precise and specific.

Janet · 25 September 2019 10:32

Totally agree - I guess I was blue-sky thinking, horizon scanning…

And the only work that is already underway is related to data quality

TU Leuven they’ve been looking at the use of image analysis to quickly determine brand and model data from photographs of a device

philip · 25 September 2019 10:55

I really wouldn’t want to pour cold water on this idea, but from what (little) I know, ML works best with very large datasets, like millions of data points for training. Our dataset is by no means small, but the data on kettles, for example, is unlikely to relate well to that on smartphones, and hence the dataset on each individual class of item is not that big. Some of the successful applications I’m aware of are in machine translation and speech recognition (Google has maybe petabytes of textual data in different languages to train on), medical diagnosis (virtually an entire national population’s data might be available) and malware analysis (again, huge datasets of both innocuous and malicious samples are available).

As for live prediction of fixing clues, it might even be unhelpful. If the fault is at all obscure then what you need most is an open mind and good deductive powers. Not to say that hints or previous experience aren’t useful, but an inscrutable ML could lead to tunnel vision.

Insights into the changing profile of failures may be more promising, but this needs skillful application of statistical techniques rather than ML.

For all that, great that we’ve got the free tickets. @Monique and @Elena will certainly come back with a much better appreciation of where ML can be usefully applied, and if any of that is actionable I guess it might be in an area none of us will have thought of.

neil · 25 September 2019 12:59

Good discussion! I’m far from an expert in any of this so I’m keen to explore blue-sky thinking of what we’d love to achieve if possible, then some more dark sky thinking from those in the know to think about what’s practical.

I don’t think we need to assume that our dataset will be the training set. In the case of Leuven, for example, I’m not sure what the system has been trained on (@Wouter_Sterkens ?), but it isn’t our data. And if it’s sufficiently robust it could in fact be a way to enhance data quality at the point of capture with less demand on volunteers. So there may be other ‘out of the box’ solutions that can help us grow and improve our own dataset.

We’ve also dabbled with text analysis for identifying recurring themes in the problem text that is recorded in data, e.g. @Lewis_Crouch used R for this on Open Data Day (whether it was ML or statistical modelling I’ll leave someone else to explain, I’m not well up enough on it to always know the difference!). This could be an interesting avenue to explore for generating a ‘starter list’ of recurrent fault types we’ve seen for different categories. (We’re definitely hindered here by insufficient data quality though.)
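To make the ‘starter list’ idea concrete, here is a minimal sketch of plain word counting per category. It is written in Python rather than the R used on Open Data Day, it is simple counting rather than ML, and the stopword list, function name and sample records are all made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "not", "to", "of", "on", "in", "when", "no"}


def top_terms(records, n=3):
    """Return the n most frequent problem-text words for each category.

    records: iterable of (category, problem_text) pairs.
    """
    by_category = defaultdict(Counter)
    for category, problem in records:
        words = re.findall(r"[a-z]+", problem.lower())
        by_category[category].update(w for w in words if w not in STOPWORDS)
    return {
        cat: [word for word, _ in counts.most_common(n)]
        for cat, counts in by_category.items()
    }


records = [
    ("kettle", "No power, element broken"),
    ("kettle", "Element not heating"),
    ("kettle", "Element covered in limescale"),
]
print(top_terms(records, n=1))
```

Even something this crude shows how much depends on what is typed into the problem field, which is the data quality point again.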

@Steve_Cook looked at a classification web service before to explore automatically reclassifying each record based on existing data (category, brand/model and comments) - A journey into our repair data

Absolutely

neil · 25 September 2019 13:14

Could well be an interesting session to attend…: Artificial Stupidity: getting state of the art results with less than ideal data sets, classifiers, or timescale

Janet · 25 September 2019 13:35

Just to share that we have 30,000 records, currently many clustered on key product categories. This represents the efforts of a handful of super-involved groups in Holland, Germany and the UK. If our data collection, or Repair Cafe Foundation or Anstiftung data collection were to scale, within 10 years, we could be looking at a pretty large dataset and this is why we’d like to think about this now rather than later…

Monique · 25 September 2019 16:03

I’ve been following ML stories for some time, even so far as to do a bit of Coursera study. It’s looking a lot like Blockchain in being over-hyped and yet to prove very useful, not to mention its carbon footprint. I don’t see it going away though and it will probably get more efficient, useful and hopefully greener over time.

I’m with Panda in giving priority to the interface between humans and data entry. The most important part of ML is having great training data. We can certainly concentrate on that for now while keeping an eye on developments in modelling etc. for later.

My expectations for this event have been managed by my experience at the Google Dev Conference earlier this year where half a dozen AI specialists from Google and Imperial College spent 40 minutes trying to connect a laptop to the projector.

Anyway, we got free tickets and they usually have nice biscuits!

neil · 26 September 2019 08:06

@Data we’ve ended up with one more free ticket for MCubed. If anyone is interested to attend along with Monique and Elena, please send me a message by the end of today. I think it’s fine if you can only make one of the days, too.

Wouter_Sterkens · 26 September 2019 11:16

Hi everybody. Sorry for my late response:

At KU Leuven they’ve been looking at the use of image analysis to quickly determine brand and model data from photographs of a device, and how this could link to quick access to repair/service information

So together with dekringwinkels (stores that repair and sell second-hand goods), Kunlabora (local software development company), Maakbaar Leuven (repair café host in Leuven) and others we are working on a proof-of-concept app. It’s a Flemish government funded OVAM project (sorry it’s written in Dutch):

The app allows you to capture an image of the label of a product or manually enter a brand/model/EAN code. If you capture an image, an attempt is made to detect text and the barcode on the label. This information is compared with a database to find a matching product. If no match is found, a new instance of a product is created. Logs of recurring problems with useful information, such as a link to an iFixit repair, can then be added to this product.
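The lookup flow described above (match on the detected or entered details, otherwise create a new product, then attach problem logs) could be sketched roughly as follows. This is not the app's actual code: the `Product` and `ProductCatalogue` classes, their fields and the in-memory list standing in for the database are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    brand: str
    model: str
    ean: str = ""
    problem_logs: list = field(default_factory=list)


class ProductCatalogue:
    """In-memory stand-in for the product database."""

    def __init__(self):
        self._products = []

    def find(self, brand="", model="", ean=""):
        """Return a product matching the EAN or the brand/model pair, else None."""
        for product in self._products:
            if ean and product.ean == ean:
                return product
            same_name = (product.brand.lower(), product.model.lower()) == (
                brand.lower(),
                model.lower(),
            )
            if brand and model and same_name:
                return product
        return None

    def find_or_create(self, brand="", model="", ean=""):
        """Return the matching product, creating a new one if nothing matches."""
        product = self.find(brand, model, ean)
        if product is None:
            product = Product(brand, model, ean)
            self._products.append(product)
        return product


catalogue = ProductCatalogue()
kettle = catalogue.find_or_create("ExampleBrand", "X100", ean="0000000000000")
kettle.problem_logs.append("Lid hinge snaps; see linked repair guide")
```

A second lookup with only the EAN, or with the brand and model in different capitalisation, returns the same product, so logs accumulate in one place.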

Kunlabora has written two blogs on the project with more general information (draft) and more app development information.

Machine learning
For now, ML is only used to detect text on the labels. For this specific goal, existing networks trained on large datasets are available off the shelf. No ML is currently used to, for example, compare our detected text with the database, but we would like to work on this in the future.
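For comparing detected text with the database, a non-ML baseline is plain string similarity, which tolerates the occasional misread character. The sketch below uses `difflib.get_close_matches` from Python's standard library. The function name, the cutoff and the model names are assumptions for illustration, and this is a starting point to compare any later ML approach against, not what the project does.

```python
import difflib


def closest_model(detected_text, known_models, cutoff=0.6):
    """Return the known model name most similar to the detected text, or None.

    cutoff is a similarity threshold between 0 and 1 (a hypothetical default here).
    """
    # Compare in lower case, but return the name as stored.
    normalised = {name.lower(): name for name in known_models}
    matches = difflib.get_close_matches(
        detected_text.lower().strip(), list(normalised), n=1, cutoff=cutoff
    )
    return normalised[matches[0]] if matches else None


known = ["Kettle X100", "Toaster T20"]
print(closest_model("Kett1e X10O", known))  # text with two misread characters
```

Raising the cutoff reduces false matches at the cost of creating more duplicate products when the label is read badly.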

Janet · 26 September 2019 11:37

@monique did such a good job of selling the conference! Get in there for the biscuits and the AI IT fails!

Janet · 26 September 2019 11:40

This could be helpful at registration, for sure. Many questions about how it could fit in a larger repair data workflow but sounds of interest. So it’s basically like one of those plant recognition apps, or Shazam but for broken stuff?

Ian_Barnard · 26 September 2019 13:33

Or useful for the owner submitting a picture of the label in advance of an event, so we can prepare a little, or perhaps ask more questions before the event. Might even be able to send them an email with relevant articles…

Ian_Barnard · 26 September 2019 14:06

How automatable is the diagnose+fix process? Not really, but what might help is access to condensed info about previous successful and doomed experiences at each stage. If the headline steps are “Has it got power?”->Yes->“Is something externally accessible preventing function?”->No->“How do I open the pesky thing?”->Opened->“Is something internal preventing function?”->… and some of that could be accelerated by extracting information from worldwide repair efforts, then that sounds helpful. What I mean by doomed experiences is where, for example, there isn’t any point trying to open something because it isn’t possible without making it even more broken. A successful experience would be like when I was trying to change the battery on my toothbrush: it would have really helped to get the tip about dipping the compression-fitted plastic parts in hot water to soften them, but I didn’t, and it took 30 minutes and the plastic parts have the levering marks to show for it.

That all needs more detail (and fixer time) in collecting the repair data, doesn’t it? And then the analysis.

Elena · 30 September 2019 08:25

Looking at previous experiences seems useful, though it’s much easier to address through changes to the data collection form than through ML or text analytics.

I created this dashboard at the spring data dive, which may help narrow down specific types of fault by product category, brand and fault keywords. If you click on various things in the charts, it will filter the comments at the bottom.

username: elena@d8a.solutions
password: Z00Bl6xzkKl

https://restart-1-6927513437-eu-west-1.k4s.bonsaiapps.net/app/kibana#/dashboard/7a7e1ab0-e359-11e9-abb6-69ed34c6cb94?_g=(refreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3Anow-7y%2Cto%3Anow))

Elena · 30 September 2019 08:30

(pls let me know if the login doesn’t work for you… I don’t want to follow in the footsteps of the 40 google devs! he-he)

I think we don’t need to worry at this stage what is or isn’t suitable for ML. What would be most useful is to create a list of ‘pain points’, ideas and wishes and work out which of these would add most value to Restart / volunteers / public. Then we can see how best to address the top ones and it may or may not involve ML. I’m happy to volunteer data entry of CO2e by product, for example…

Elena

Elena · 30 September 2019 11:09

Username and password could be these:
p7gnnws2pv
ivrn3xpt3f

Ian_Barnard · 30 September 2019 21:08

I agree that capturing better fixer data can only help, aiming to provide something helpful for the next time someone (me!) encounters something similar, e.g. the distilled and ranked experiences of previous fixers? How do we distil that? IMO that’s where ML could perhaps turn the sentiment of what previous fixers wrote into information which might be helpful for a fixer in the hot seat facing a broken thing and its owner - at least it might say “of the 13 people who tried to fix a similar WHATEVER-IT-IS these are what they said: (and links in most-positive-sentiment-first order with a red-highlighted ‘but be aware this problem/outcome’)” - doesn’t that seem possible?
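A minimal sketch of that kind of summary, with one big substitution: it orders previous attempts by their recorded outcome rather than by ML-derived sentiment, since the outcome is something a fixer already logs. The record fields, the outcome labels and their ordering are assumptions for illustration.

```python
from collections import Counter

# Hypothetical outcome labels, most encouraging first.
OUTCOME_RANK = {"fixed": 0, "repairable": 1, "end of life": 2}


def summarise(records, category):
    """Return (lines, outcome_counts) for previous attempts on one category.

    records: list of dicts with "category", "outcome" and "comment" keys.
    """
    similar = [r for r in records if r["category"] == category]
    # Unknown outcomes sort last; ties keep their original order.
    ranked = sorted(
        similar, key=lambda r: OUTCOME_RANK.get(r["outcome"], len(OUTCOME_RANK))
    )
    outcomes = Counter(r["outcome"] for r in similar)
    header = f"Of the {len(similar)} people who tried to fix a similar {category}:"
    lines = [header] + [f"- [{r['outcome']}] {r['comment']}" for r in ranked]
    return lines, outcomes


records = [
    {"category": "toothbrush", "outcome": "end of life", "comment": "Casing glued shut"},
    {"category": "toothbrush", "outcome": "fixed", "comment": "Soften plastic in hot water first"},
]
for line in summarise(records, "toothbrush")[0]:
    print(line)
```

The ‘but be aware’ warnings could come from the same list by pulling out the comments attached to the worst outcomes.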

Ian_Barnard · 30 September 2019 21:12

“Not Acciptible” - none of the id/password seem to work