Usable speech recognition

LeonSchuurbiers · November 2016

I've been using my homey for about a month now, mostly just to control the lights in the house. And while it's great when it works, most of the time it fails.

I've created a flow to talk back what it hears, and if I make sure there is no background noise, stand at most 2 meters away, and articulate very well:

- 50% it recognizes the right words
- 40% it's close (and I think the system should recognize similar words; if I configure 'livingroom on' and it recognizes 'livingstone on' (just an example, I'm using Dutch speech), it should still activate the flow without me configuring all the different ways Homey can almost-recognize the speech)
- 10% it just spins and nothing happens.
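The "close but not exact" case in the list above is the kind of thing approximate string matching can absorb. A minimal sketch, assuming the recognized text is available as a plain string; the phrase list, the function names and the 0.7 threshold are illustrative assumptions, not how Homey actually matches flows:

```python
from difflib import SequenceMatcher

# Hypothetical phrases a user has configured as flow triggers.
CONFIGURED_PHRASES = ["livingroom on", "livingroom off", "kitchen on"]


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two phrases, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_phrase(heard, phrases=CONFIGURED_PHRASES, threshold=0.7):
    """Return the configured phrase closest to `heard`, or None if none is close enough."""
    best = max(phrases, key=lambda p: similarity(heard, p))
    return best if similarity(heard, best) >= threshold else None


if __name__ == "__main__":
    print(match_phrase("livingstone on"))        # near miss of a configured phrase
    print(match_phrase("gaat het nog regenen"))  # unrelated sentence
```

With these values 'livingstone on' still maps to 'livingroom on', while an unrelated sentence matches nothing. The threshold is a trade-off: lower it and more near misses are accepted, but the risk of triggering the wrong flow goes up.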

Now as soon as I am around the corner, more than 3 meters away, or the cat decides to walk by, the percentages are 10 / 10 / 80, which just isn't usable.

I have no experience with the competition (Google / Amazon), but I assume they are performing better?

What are the plans to improve this?

ZperX · November 2016

I have submitted the issue: https://github.com/athombv/homey/issues/920
I am planning to make a review video. Could you also make a video? Otherwise we are regularly slapped down by the enthusiasts: `It works for me, why are you complaining?`
I can confirm that both Alexa and Google work as advertised (English).

Mathijs · November 2016

It's indeed not really as advertised and I would love some official comment. If voice recognition is not improved a whole lot we'll just have to learn to do without. Not nice and not honest, but it is better to know and deal with the facts than to keep on trying.

Here is what Homey understood when I asked it to switch the office light off.

Lol

RuudvanBeek · November 2016

I can confirm this type of behavior. But I have also been digging into several forum items from the past. Somewhere I found a suggestion that the "app-load" might be too high, or that an app might have crashed and be using too many system resources.

I cleaned up some of the apps I had only been testing and did not really need, and loadavg dropped quite a bit. It looks like the speech recognition is behaving better now.

Image: https://forum.athom.com/uploads/editor/tu/ivl12di261cb.jpg

It is just a suggestion..... but it never hurts to try

msmits · November 2016

Yep... I'm curious what Athom has to say about this. I also have issues with the speech recognition. I tried almost every spot in my living room to test whether Homey performs better, but there was no big difference.

When I'm more than 2 meters away from Homey I barely see the feedback LED responding to the sound of my voice. I would like to have the ability to adjust the volume/sensitivity of the mic, but I don't know if that's technically possible given the noise cancellation.

bvdbos · November 2016

The memory issue was about speech output, not about input afaik... Raw recorded speech is sent to the Athom cloud. There Athom uses several services for recognition; I suspect one of these is really bad compared to the other services...

ZperX · November 2016

I always had this issue, even with a freshly wiped Homey. Therefore it is not a memory issue... But the memory issue should be sorted too.

@Mathijs That is a good one, the engine used the urban dictionary.

RuudvanBeek · November 2016

BasVanDenBosch said:

The memory issue was about speech output, not about input afaik... Raw recorded speech is sent to the Athom cloud. There Athom uses several services for recognition; I suspect one of these is really bad compared to the other services...

Ah... you could be right, I have read many, many different postings. It is probably a coincidence that the recognition seems better.

klaas · November 2016

LeonSchuurbiers said:

I've been using my homey for about a month now, mostly just to control the lights in the house. And while it's great when it works, most of the time it fails.

I've created a flow to talk back what it hears, and if I make sure there is no background noise, stand at most 2 meters away, and articulate very well:

- 50% it recognizes the right words
- 40% it's close (and I think the system should recognize similar words; if I configure 'livingroom on' and it recognizes 'livingstone on' (just an example, I'm using Dutch speech), it should still activate the flow without me configuring all the different ways Homey can almost-recognize the speech)
- 10% it just spins and nothing happens.

Now as soon as I am around the corner, more than 3 meters away, or the cat decides to walk by, the percentages are 10 / 10 / 80, which just isn't usable.

I have no experience with the competition (Google / Amazon), but I assume they are performing better?

What are the plans to improve this?

Same Here

Mathijs · November 2016

The main issue is that if Athom is indeed using external engines to recognize speech, they leech on the investments of others, who will not appreciate this. I am running a similar project, and the moment it got some volume we were cut off and asked to pay a sum per 'transaction'. And as Athom does not sell a monthly service, that would make it impossible to use these services. That would explain why it used to work well and does not work well anymore.

Now clearly I have no idea how these things work behind the scenes. I just know that the videos Athom has online about this show an idea that I am unable to recreate, and that high-end voice recognition in serious volume without any cost just does not exist. Again, I would honestly prefer an honest 'sorry, won't happen' over the silence we have now. It would mean I can move my Homey from my desk to a place where connections are better.

Right now my Homey is like a deaf device next to my $50 Echo. That one is magical in recognizing voice. I do understand Athom and Amazon have different resources and I back Athom with the two Homeys I got. I just ask for some information on what to expect.

honey · November 2016

Google speech to text api:

Pricing Table

0-60 minutes Free
61+ - 1 million minutes $0.006 / 15 seconds*

Well, if Homey did not go through the Athom servers but connected directly to Google, then each Homey's usage would stay within those 60 minutes. I think Google is the best bet because of the wide range of microphone types supported; it works in noisy environments, supports many languages and accents, and is a reliable and fast service.
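To make the "within that 60 minutes" estimate concrete, here is a rough sketch of the arithmetic using only the two rows quoted above. It assumes that every request is rounded up to a whole 15-second increment and that the free allowance applies per Homey per billing period; neither is stated in the table, and `google_cost` is just an illustrative name:

```python
import math

FREE_MINUTES = 60            # free tier from the quoted table
PRICE_PER_INCREMENT = 0.006  # USD per 15-second increment above the free tier
INCREMENT_SECONDS = 15


def google_cost(requests: int, seconds_per_request: float) -> float:
    """Estimate the USD cost of `requests` recognitions of a given length.

    Assumption: each request is billed as whole 15-second increments.
    """
    increments_per_request = math.ceil(seconds_per_request / INCREMENT_SECONDS)
    total_increments = requests * increments_per_request
    free_increments = FREE_MINUTES * 60 // INCREMENT_SECONDS
    billable = max(0, total_increments - free_increments)
    return round(billable * PRICE_PER_INCREMENT, 2)


if __name__ == "__main__":
    print(google_cost(240, 5))  # 240 short commands fit exactly in the free tier
    print(google_cost(600, 5))  # 600 short commands exceed it
```

Under these assumptions the free tier covers 240 short commands per period, and 600 commands would cost $2.16.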

I believe Amazon's service is also accessible for other devices. But that depends a lot on microphone tuning.

Ivona text to voice is not a free service either. But man, that works! The best quality I have ever seen.
They have probably done a special deal, but Ivona costs around $30-45/voice. Their cloud service (which is not used by Homey, just for comparison): $1000/month (prepaid) + $0.003/unit for usage above 250k units/month.

Ps: I hadn't realized till now that Ivona is an Amazon company.

honey · November 2016

Microsoft:

Short form recognition	15 sec per call	$4 per 1000 calls
Long form recognition	2 min per call	0-10 hours at $9 per hour, 10-100 hours at $7.50 per hour, over 100 hour at $5.50 per hour
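The same kind of estimate can be sketched for the Microsoft prices quoted above. This assumes the long-form tiers are marginal (each hour is billed at the rate of the tier it falls in), which the table does not say explicitly; the function names are illustrative:

```python
SHORT_FORM_PRICE_PER_1000_CALLS = 4.00  # USD, calls of up to 15 seconds

# (upper bound in hours, USD per hour) for long-form recognition.
LONG_FORM_TIERS = [(10, 9.00), (100, 7.50), (float("inf"), 5.50)]


def short_form_cost(calls: int) -> float:
    """Cost in USD of `calls` short-form recognitions."""
    return round(calls * SHORT_FORM_PRICE_PER_1000_CALLS / 1000, 2)


def long_form_cost(hours: float) -> float:
    """Cost in USD of `hours` of long-form audio, assuming marginal tiers."""
    cost, lower = 0.0, 0.0
    for upper, rate in LONG_FORM_TIERS:
        if hours > lower:
            # Bill only the part of the usage that falls inside this tier.
            cost += (min(hours, upper) - lower) * rate
        lower = upper
    return round(cost, 2)


if __name__ == "__main__":
    print(short_form_cost(600))  # 600 short voice commands
    print(long_form_cost(12))    # 10 h at $9 plus 2 h at $7.50
```

For typical voice commands only the short-form row matters: 600 commands would come to $2.40.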

jjtbsomhorst · November 2016

I have to agree with you all. The speech recognition sucks sometimes. This morning I gave up asking for lights after 5 tries. Luckily there is the app. But still, if you say: lamp San and Homey thinks you said lang haar... or stamp aan...

Fire69 · November 2016

This has been discussed many times before...
The problem is not the speech recognition service they use, it's the hardware (or lack of software optimization of the hardware (read: microphones))

Whether you use your phone or Homey itself, the same recognition service is used.
And if you use your phone, speech is recognized much, much better than with Homey.

msmits said:

When I'm more than 2 meters away from Homey I barely see the feedback LED responding to the sound of my voice.

I think this proves my point. I see exactly the same behaviour.
Homey just doesn't hear you when you're 'too far' away.

jjtbsomhorst said:

I have to agree with you all. The speech recognition sucks sometimes. This morning I gave up asking for lights after 5 tries. Luckily there is the app. But still, if you say: lamp San and Homey thinks you said lang haar... or stamp aan...

The problem with this example is that the recognition software tries to make a logical sentence from your speech.
When you use very short commands, it often fails at doing this.
Try saying 'Doe de lamp van Sam aan' ("Turn on Sam's lamp"). That will probably work better.

jjtbsomhorst · November 2016

I have to disagree on this, @Fire69. We also have a flow that is triggered when we say 'Netflix'. This flow is triggered all the time, no matter where we are. So I think it depends on the sentence, but also on the combination of letters used, or something else.

Fact is that this needs to be addressed by @Athom very soon, because they are talking about going retail (in their newsletter) now.

bvdbos · November 2016

I wonder, if I have my phone listening when I'm four meters away, would the recognition still be that good...

MarcelTimmermans · November 2016

For me speech is not working 90% of the time. Personally I backed Athom because of the speech recognition feature, because for all the other functionality there are plenty of cheaper devices. But I must admit that I am now hooked on my Homey, even though the speech recognition sucks. I would also prefer an honest answer from Athom. I think that when people buy a Homey in a shop they will return it very fast, because this functionality is not working and for many people it is one of the hottest features.

jjtbsomhorst · November 2016

This is the unique selling point because other companies are doing it so why can't homey do it?

Pils · November 2016

I agree with you all. Athom should respond to this, but I have seen for a while now that Athom does not respond much on the forum. The new newsletter also does not contain much news for us. It looks more like an advertisement of how good they are, with nothing mentioned about the big problems they have, the solutions they have in mind, or a timetable.

Athom has had a communication problem since the start. @emile did admit that and promised to do better, and they did a little bit for a while by writing a weekly software status. But now I only see the same behaviour again: don't tell, don't respond, don't mention, only "working hard on it". That is not enough for those of us who spend hours every week trying to get Homey working. If they don't want to send that kind of information to the normal user (if they have any), they could write a newsletter to the forum users, or a weekly status report on the 10 biggest issues they have and the progress on them. Not pointing to GitHub or Slack, just mentioning the biggest problems and their status, like:
- 433 range BAD (problem known, working on solution, estimated release week 40)
- Speech recognition (problem not known, could be hardware, currently no solution available)
- Slow speech (etc.)

I also would like to hear the truth from them. Are they able to fix it? Is it hardware, and if so, are they working on a Homey v2.0 that can really do as promised? And if so, do we all get that at a reduced price for all our testing effort? We could then use the Homey v1 in our bedroom, where the distance is so short that it does recognise our speech.

I know they work hard and have a potential beautiful product, but right now too much works too unstable. And they seem to focus on too many things.

And what is the status of Athom's financial situation, and what about the current sales of Homeys? Do they have enough money to keep developing, or should we be afraid that they will run out of money? But I can hardly expect an honest response on that one :-)

I really like the Homey and am willing to give more of my time to test it, but they should start communicating better and be more transparent.

Fire69 · November 2016

jjtbsomhorst said:

I have to disagree on this, @Fire69. We also have a flow that is triggered when we say 'Netflix'. This flow is triggered all the time, no matter where we are. So I think it depends on the sentence, but also on the combination of letters used, or something else.

Fact is that this needs to be addressed by @Athom very soon, because they are talking about going retail (in their newsletter) now.

Netflix is a pretty specific term, and this is one of the reasons why they use an online recognition service instead of doing it offline on Homey itself. The online service can easily adapt to current trending words, offline they would have to regularly update their dictionary.

jjtbsomhorst · November 2016

Agreed, but still, light aan or lamp aan is one of their default terms. It should not misinterpret them so much if what you say is true, I guess. But hey, let's see what Athom has to say about it.

TheoDeKoning · November 2016

Reading the newsletter makes me very sad, but also afraid.
They start delivering the orders from their web store in a couple of days.
They are also getting ready to sell Homey in the shops.

Sad because of the feeling that they don't listen to us here, although they write that they love the community.
Afraid that once Homey is sold for real, the refunds with the extra cost will become the beginning of the end of Athom.

The two things that counted for me in becoming a first-batch backer were: speech and better KaKu handling.
We know where we stand on this.

msmits · November 2016

This was an example. I said: "Gaat het nog regenen?" ("Is it still going to rain?"). This was the result.
I had contact with Athom about this; they said it can be caused by my voice. Quote: for "some" voices it's difficult for Homey to recognize speech. But when my gf talks to Homey it's exactly the same....

They also told me that they're still working on the voice recognition, and that the system is "self learning", so the more people talk to Homey, the better it gets.

Now I'm running a flow which mutes the TV when Homey is listening. It's a little bit better, but still not acceptable in my opinion.

ZperX · November 2016

Of course we can't see the source code, but most issues point to the same thing: Athom has no experience with signal processing (voice, IR, the 433 MHz receiver, the Z-Wave range issue). How to handle noise and physical signals is clearly beyond their knowledge. They should hire a specialist or outsource it. Since Homey is all about wireless signals and voice, there is no way around it.

Pils · November 2016

@TheoDeKoning
I agree with you.

I hope that @emile or @JeroenVollenbrock or @annemarie or someone else from @Athom
will respond to our concerns and give us more inside information about the current situation.

I hope they respond very soon.

honey · November 2016

Not everything is about coding. This is where coding meets the physical world. I am sure they do their best to learn about signal processing, but they can't just accumulate 10-15 years of experience overnight or over 2 years.

techniman · November 2016

Fire69 said:

Whether you use your phone or Homey itself, the same recognition service is used.
And if you use your phone, speech is recognized much, much better than with Homey.

I read somewhere on the forum that the reason is that the data sent from the phone to homey is plain text,
e.g. the phone's speech recognition software is used, e.g. Google or Siri
that's why it's a) better and b) faster

Personally I gave up on speech recognition and spoken replies through Homey.
Because it works so badly, the WAF of the Homey is also very low (to the point where she said "I'm going to throw that paperweight out").

One workaround I can think of is to use Tasker + Autovoice to trigger a HomeyFlow
But that kind of defeats the purpose of Homey..

Fire69 · November 2016

techniman said:

Fire69 said:

Whether you use your phone or Homey itself, the same recognition service is used.
And if you use your phone, speech is recognized much, much better than with Homey.

I read somewhere on the forum that the reason is that the data sent from the phone to homey is plain text,
e.g. the phone's speech recognition software is used, e.g. Google or Siri
that's why it's a) better and b) faster

That is incorrect and it has been confirmed by Athom on Slack that the same service is used on phone and Homey.

Mathijs · November 2016

Fire69 said:

That is incorrect and it has been confirmed by Athom on Slack that the same service is used on phone and Homey.

Now that is VERY scary. It means that indeed the problem is hardware and that makes a fix nearly impossible.

Let's hope that at the meeting this month the developers can show that voice recognition does work for them. If it does not work for them, I think we bought a defective device and it should be repaired. Happens all the time, not a big deal and I'm not worried. I am sure Athom will make it work as they advertised it. If they are able to throw in Spotify....

The fact they do not respond at all however is not a good sign.

KoenMartens · November 2016

Mathijs said:

Fire69 said:

That is incorrect and it has been confirmed by Athom on Slack that the same service is used on phone and Homey.

Now that is VERY scary. It means that indeed the problem is hardware and that makes a fix nearly impossible.

Let's hope that at the meeting this month the developers can show that voice recognition does work for them. If it does not work for them, I think we bought a defective device and it should be repaired. Happens all the time, not a big deal and I'm not worried. I am sure Athom will make it work as they advertised it. If they are able to throw in Spotify....

The fact they do not respond at all however is not a good sign.

No that is not scary at all.
It means that they have problems with their echo cancellation and noise removal, and not with the speech recognition. Homey's mic is on par with, if not better than, most phones' mics. The problem is that you're not 10-20 cm away from Homey's microphone.
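As a rough illustration of why distance matters so much: in an idealized free field, sound pressure level falls by about 6 dB for every doubling of distance. The sketch below applies that rule to the distances mentioned in this thread. It ignores room reflections and says nothing about Homey's actual microphones or processing; `level_drop_db` is just an illustrative helper:

```python
import math


def level_drop_db(near_m: float, far_m: float) -> float:
    """Drop in sound pressure level (dB) when moving from near_m to far_m.

    Assumes a point source in a free field (inverse distance law).
    """
    return 20 * math.log10(far_m / near_m)


if __name__ == "__main__":
    # Phone held at 15 cm versus a speaker standing 3 m from the device.
    print(round(level_drop_db(0.15, 3.0), 1))  # roughly 26 dB quieter
```

So a voice arriving from 3 m is on the order of 26 dB weaker than one spoken into a phone at 15 cm, while the background noise at the microphone stays the same, which leaves the noise removal with far less signal to work with.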

Fire69 · November 2016

KoenMartens said:

No that is not scary at all.
It means that they have problems with their echo cancellation and noise removal, and not with the speech recognition. Homey's mic is on par with, if not better than, most phones' mics. The problem is that you're not 10-20 cm away from Homey's microphone.

Yep

And it has 2 high quality (according to Athom) microphones, so that should not be the problem
