Machine learning and repair

Hmm… those are my credentials. I wonder if the hosting service is blocking you because it thinks your access pattern looks unusual. Can you try an incognito browser window?

Hello, I’ve been keeping an eye on the ML-for-repair research. Here’s an article that describes the possibilities. In my experience these systems are only being used in large-scale equipment or infrastructure deployments. Continuous monitoring data (e.g. from vibration sensors) is analysed by AI / ML systems to determine what life stage the equipment is at. As these types of systems usually involve some sort of predictive maintenance already (based on estimated time-to-failure rates), AI / ML systems deliver a greater degree of accuracy, therefore in theory saving money as redundancy is reduced. I’ve met other academics working in this area in the mining industry, where the equipment is really big and difficult to reach.
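(Just to make the idea concrete, here is a minimal, made-up sketch of trending a vibration reading to estimate when maintenance is due; the data, units and threshold are all invented, not taken from any real deployment.)

```python
# Illustrative only: estimate remaining useful life from a vibration trend.
# The readings, units and alarm threshold below are invented for the example.
import numpy as np

# Daily RMS vibration readings (mm/s) from a condition-monitoring sensor.
days = np.arange(60)
rms = 2.0 + 0.03 * days + np.random.default_rng(0).normal(0, 0.05, 60)

# Fit a simple linear degradation trend to the readings.
slope, intercept = np.polyfit(days, rms, 1)

# Alarm level at which maintenance should be scheduled (made up).
threshold = 4.5
current = slope * days[-1] + intercept
days_to_threshold = (threshold - current) / slope

print(f"Estimated days until vibration exceeds {threshold} mm/s: {days_to_threshold:.0f}")
```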

It’s intriguing to think about how these systems might apply to everyday objects. In theory this is all possible in the amazing IoT revolution (along with everything else). But at the moment IoT devices don’t seem to last long enough to truly wear out, and bricking due to manufacturer action (or inaction) is still much more likely. It’s exciting to read about the “Shazam for products” you’re developing @Wouter_Sterkens.

2 Likes

I really like the idea of “Shazam for broken stuff”. Thanks Janet, this will definitely help to explain the RepairApp we envisage and are developing in the framework of the SmartRe project (only in Dutch: https://vlaanderen-circulair.be/nl/doeners-in-vlaanderen/detail/smart-re ) in our research group (https://www.mech.kuleuven.be/en/research/LCE). As a first step we use AI to extract information from an image in order to identify the product model (similar to how Shazam extracts information from a noisy recording). Once we think we know the product model, we can use this information to search for the best match in databases of past repair experiences. The next step, which I see as the most valuable, is to be able to find past experiences not only for exactly the same product but also for similar products, e.g. from the same series, as more information will be available at the level of a product series. Within the Interreg SHAREPAIR project we plan to continue working on this!
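(Not the SmartRe implementation, but to make the idea concrete: a rough sketch of how a photo could be embedded and matched against a reference database of known product models. The pretrained network and the image file names are assumptions.)

```python
# Rough, illustrative sketch of an image -> product-model lookup, in the spirit of
# "Shazam for products". Not the SmartRe pipeline; the model choice and image files
# below are assumptions for the sake of the example.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# A pretrained CNN used as a generic image-embedding model.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier head, keep the 512-d embedding
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path: str) -> torch.Tensor:
    """Return a unit-length embedding vector for one image file."""
    with torch.no_grad():
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        return torch.nn.functional.normalize(backbone(x), dim=1).squeeze(0)

# Hypothetical reference database: one photo per known product model.
reference = {
    "acme-vac-200": embed("acme_vac_200.jpg"),
    "acme-vac-300": embed("acme_vac_300.jpg"),
}

# Match a user's photo against the reference set by cosine similarity.
query = embed("users_broken_device.jpg")
best = max(reference, key=lambda model_id: float(query @ reference[model_id]))
print("Best matching product model:", best)
```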

6 Likes

This is clearly far away from being a real application, but I love the idea, and for professional repairers of phones etc. it would be highly relevant and possibly cost-saving.

Our repair cafe is only 3 events old, and yet I can already see a blockage in our system for getting repairs assessed and dealt with efficiently and quickly; we have already had to turn some electrical repairs away.

However, some of the participants want “an experience” on the day of the cafe, so time is not their greatest worry, but getting it fixed certainly is.

Just a comment for others to mull over

BW Steve

1 Like

Dear All,
I am new to the question, but deeply involved in data science.
I have spent a few hours on the dataset to build the embryo of a useful tool to motivate people to go for a repair: an app that answers the question “can my coffee machine be repaired?”

see the full article here: https://www.linkedin.com/pulse/my-coffee-machine-dead-exploring-reparability-home-goods-jean-milpied/

with best regards

Jean

2 Likes

Thanks @JEAN_MILPIED - I’ll link to your post on the topic for more detailed discussion: Can you predict reparability?

This is for anyone who might be interested in having a stab at predicting fault types using ORDS data and the results of the Quests that have been held over the past few years. :smile_cat:

I’ve thrown together a mixed bag of scripts and stuck them into a GitHub repo. It’s very much a “starter kit” for playing with ORDS data, so don’t expect a professional-level package!

One of the scripts attempts a simple pass at training a model using Quest data. The following is in the block comment at the top of the script.

A rudimentary attempt at using Quest data to train a model using NLP.

Quests are citizen-science type microtasks that ask humans to evaluate and classify repair data.
Most quests aim to determine a set of common fault types for a given product category.
See the dat/quests/README.md for details.
Some of the problems of using the quest data for training and validation are that:
  - Early quests did not filter out very poor quality problem text.
  - Quest data is multi-lingual and NLP tends to require a single language.
  - Human evaluation was not always conclusive or accurate when problem text was ambiguous or poorly translated.
Consequently, after cleaning and filtering, the data left for training is not really sufficient.
The training data may benefit from manual curation. (Also cleaning stopwords #todo)

The following is a live data "test":
1. Selected quest no. 5 "DustUp" as it has decent quality data.
2. Removed the validation step and used all of the quest data for training. (English language, 738 records)
3. Used the model on all ORA "Vacuum" records from countries GBR and USA. (English language, 1446 records)
4. Exported the results to a spreadsheet and manually reviewed the predictions, excluding records used in training.
5. Found that I agreed with 60% of the predictions and that this was roughly reflected across all fault types:
    58.67%	Power/battery
    64.17%	Blockage
    67.39%	Poor data
    58.43%	Motor
    65.08%	Cable/cord
    55.81%	Internal damage
    56.41%	Button/switch
    60.61%	Brush
    71.43%	Filter
    57.89%	External damage
    50.00%	Hose/tube/pipe
    46.15%	Other
    41.67%	Overheating
    50.00%	Dustbag/canister
    33.33%	Accessories/attachments
    50.00%	Wheels/rollers

I don’t expect it to ever achieve even close to 100% given the low volume and quality of the text data, but I reckon there are script improvements to be had. I have only dabbled with ML and NLP and will continue to dabble with this hard problem, but if anyone cares to have a go themselves, please do; I’d love to learn more!
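(For anyone wanting a starting point, here is a minimal sketch of this kind of text-classification pass: TF-IDF features plus a linear classifier. The CSV file and the column names are assumptions, not the actual repo scripts.)

```python
# Minimal, hypothetical sketch of training a fault-type classifier on Quest-style data.
# The file "quest_dustup_english.csv" and columns "problem" / "fault_type" are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

quest = pd.read_csv("quest_dustup_english.csv")  # hypothetical export of Quest verdicts

X_train, X_test, y_train, y_test = train_test_split(
    quest["problem"], quest["fault_type"],
    test_size=0.2, random_state=42, stratify=quest["fault_type"])

model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```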

3 Likes

Troubleshooting problems and finding repair solutions. Then providing steps on how to repair, tools and parts required, and how to source tools and parts. This would need integration with other services, and probably a community approach too.

I’ve been mucking around with PrivateGPT: I fed it the entire Restarters Wiki (100 docs, English only) and it works reasonably well. It’s pretty much a mildly chatty search engine, far from sentient, but at least it doesn’t seem to hallucinate. Apart from the initial downloads it can work completely offline. It’s very slow on my PC, as it apparently only utilizes the CPU. The LocalGPT fork is worth a go, as it can use a GPU, though it probably needs one with a lot more RAM than mine.

In other repair-related ML news, I have managed to train a model to detect the languages in the ORDS dataset. Its accuracy surpasses Google and DeepL, owing to the repair-specific cleaning and stopwords.

Here is my own personal DeepThought answering queries in its own good time. :robot:

The same GPT bot, but with its sources revealed:

This AI stuff is really interesting. I’ve learned a lot about it recently, in particular about zero-shot and multi-shot prompting and the RAG pattern, although I’m not sure how those work with *GPT.
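(For the curious, a generic illustration of the RAG pattern: retrieve the most relevant wiki snippets by embedding similarity, then hand them to a language model as context. This is not PrivateGPT’s actual internals, and the wiki snippets below are invented.)

```python
# Generic sketch of retrieval-augmented generation (RAG): embed a small corpus,
# retrieve the chunks closest to a question, and build a grounded prompt.
# The "wiki" snippets are invented; a real setup would pass the prompt to a local LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

wiki_chunks = [
    "To open most kettles, remove the screws hidden under the base pads.",
    "A multimeter in continuity mode can confirm whether a fuse has blown.",
    "Laptop batteries that no longer hold charge are usually replaceable modules.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vecs = encoder.encode(wiki_chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = encoder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q
    return [wiki_chunks[i] for i in np.argsort(scores)[::-1][:k]]

question = "How do I check whether the fuse in my kettle has blown?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # in a real RAG pipeline this prompt goes to the language model
```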

@Monique I’ll message you.

Yes, far from sentient.

Isn’t the first challenge with ML/AI defining what we might want to get from it?

To clarify, my “mucking around” consists of installing PrivateGPT, feeding it the Wiki documents and adjusting one parameter. It took less than 10 minutes.

I spent about 10 minutes asking it questions and watching the CPU cycles hit the roof for a minute each time.

I then looked at the kind of GPU (about £1200) needed to improve performance and a PSU (about £200) to power it. Mostly people use cloud services, Amazon AWS being a favourite.

It would be stupidly ironic if the emissions saved by a repair were cancelled out by learning how to do the repair. :person_facepalming:

There is no way that this would be viable as an online tool. I read yesterday that it costs something like $700k per day to keep ChatGPT running.

I guess the reason you couldn’t use an online ‘free’ service is the amount of wiki data? Did you try asking GPT repair-related questions, and what sort of replies did you get? Sorry, this might be in the thread above…

The same with any tool, software or otherwise, no?

The ML I’m dabbling with for ORDS data started with what I wanted out of it.

I wanted to see if a model could do as well as people on one of our crowdsourced fault_type-finding Quests, so I trained a classifier using the data that we collected in the DustUp (Vacuum cleaner) Quest: 3k repair records with majority opinions as to the fault type.

Of course, the first problem I encountered was the mixture of languages in the data. The available NLTK tools require a known language for parsing and tokenization, and training on all of the Quest data produced a confused model.

In order to filter by language I went off on a time-consuming tangent to find the best tool for language detection: Google, DeepL, Python libraries and Java libraries. They all had issues with messy repair data that is full of abbreviations and poor grammar. This led me to train my own language-detection model, which turned out to be quite successful.
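(As an illustration of the general approach, here is a tiny sketch of a character n-gram language detector; the training snippets are invented and this is not the model described above.)

```python
# Toy language detector: character n-gram TF-IDF features plus Naive Bayes.
# The training snippets and labels are invented; a real model needs far more data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "screen cracked, no display",       # en
    "does not power on anymore",        # en
    "het scherm is kapot",              # nl
    "apparaat gaat niet meer aan",      # nl
    "écran cassé, ne s'allume plus",    # fr
    "la batterie ne charge plus",       # fr
]
labels = ["en", "en", "nl", "nl", "fr", "fr"]

detector = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),  # character n-grams
    MultinomialNB(),
)
detector.fit(texts, labels)
print(detector.predict(["stofzuiger zuigt niet meer", "fuse keeps blowing"]))
```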

Filtering the Quest records for English left only 823 Quest records. Not a lot for training purposes.

Next I:

1. Scooped up all of the ORDS Vacuum cleaner records with non-empty problem text: 6,375 records.
2. Sifted out the records that were used in the DustUp Quest, leaving 3,210.
3. Split out the remaining English records and removed any that were useless, e.g. those containing nothing but punctuation (e.g. “?”) or non-useful text (e.g. “n.a.”), leaving 893 records.
4. Manually reviewed 728 of these records to give my own verdict on fault_type, producing a validation dataset.
5. Ran the classifier: its classification of the fault_type disagreed with mine on 272 of the 728 records (agreement tallied as in the sketch below).
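(Tallying that agreement is just a column comparison; a minimal sketch, assuming hypothetical column names in the reviewed spreadsheet.)

```python
# Minimal sketch of tallying agreement between model predictions and manual verdicts.
# The file and the "predicted" / "manual" column names are hypothetical.
import pandas as pd

reviewed = pd.read_csv("vacuum_validation_review.csv")
reviewed["agree"] = reviewed["predicted"] == reviewed["manual"]

print(f"Overall agreement: {reviewed['agree'].mean():.1%}")
print(reviewed.groupby("predicted")["agree"].mean())  # per-fault-type agreement
```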

The main issues are with the input: not enough training data, poor quality text, and classification labels (fault_type) that aren’t specific enough. There was lots of disagreement over “Filter” vs “Blockage”, which is fair enough, as I had the same issue. Also, the word “brush” is ambiguous with regard to vacuum cleaners: it could be an internal component or the actual external rotating part.

By “online free” you mean ChatGPT? I’ve asked it all sorts of things; it hallucinates wildly :mushroom: and it changes from version to version.

The premium offering would probably allow the uploading of 100 HTML pages for upward of £20 a month (+VAT), but OpenAI has probably already scraped all of the Restarters Wiki data; in fact I think @Panda mentioned that restarters.net showed up in its vast list of scraped sites.

My purpose in trying PrivateGPT was to dabble with a local tool that could be trained on a private, domain-specific corpus.

I came across this while looking at GPU emissions.

Carbon Neutral Calculator for Academic Labs

This calculator estimates your carbon footprint by looking at the two largest factors: (1) flights, (2) GPU computing. It adds a multiplicative factor on top of this for other emissions (default 20%). This repository is inspired by the NYU ML^2 Lab going carbon neutral.
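(In other words, the estimate boils down to something like the following; the emission factors here are placeholders, not the calculator’s actual figures.)

```python
# Rough sketch of the calculator's structure: footprint = (flights + GPU) * (1 + overhead).
# The GPU emission factor and the example numbers are placeholders.
def lab_footprint_kg(flight_kg: float, gpu_hours: float,
                     kg_per_gpu_hour: float = 0.2, overhead: float = 0.20) -> float:
    gpu_kg = gpu_hours * kg_per_gpu_hour
    return (flight_kg + gpu_kg) * (1 + overhead)

# e.g. one long-haul return flight (~2000 kg CO2e) plus 500 GPU-hours of training
print(f"{lab_footprint_kg(2000, 500):.0f} kg CO2e")
```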

The training data for ChatGPT is, I believe, secret. What I had found is that both therestartproject.org and restarters.net are in Google’s C4 AI model training dataset:

RANK       DOMAIN                 TOKENS  PERCENT OF ALL TOKENS
337,559    therestartproject.org  69k     0.00004%
9,529,350  restarters.net         510     0.0000003%

Source: https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/

(My personal website surprisingly ranks even higher. I’m not sure what to make of this.)

1 Like