Rate calculation gives wrong result #14

Closed
opened 4 years ago by g · 6 comments
g commented 4 years ago
Owner

It should be 50% but it's 75% from the /photo_points endpoint.
image

Probably we shouldn't use ad hoc calculation and just define like as 1 and dislike as 0.

9dfacfd6c2/main.py (L175-L177)

To be fair now it is 2 for like and 1 for dislike on the frontend. I'm not sure why exactly I decided to make it that way. It seems 0/1 values will work just fine and will make the logic more obvious.

Can you see any downsides in this approach?

It should be 50% but it's 75% from the /photo_points endpoint. ![image](/attachments/703158e1-7567-46b9-8f2e-7dc7ebc98688) Probably we shouldn't use ad hoc calculation and just define like as 1 and dislike as 0. https://git.iamonlyherefortheicecream.ml/DIWHY/photovoter_backend/src/commit/9dfacfd6c23eac04b9766ad21c46af6199bad4e5/main.py#L175-L177 To be fair now it is 2 for like and 1 for dislike on the frontend. I'm not sure why exactly I decided to make it that way. It seems 0/1 values will work just fine and will make the logic more obvious. Can you see any downsides in this approach?
Owner

I simply used the wrong formula

100 * SUM(marks.mark)/COUNT(marks.mark)/MAX(marks.mark)

in no way represents percentage of likes for anything other than the basic 1 and 0 case, where everything just gets reduced. Not sure why i thought it was.

The formula should be something like

100 * (AVG(marks.mark) - MIN(marks.mark)) / (MAX(marks.mark) - MIN(marks.mark))

for boolean variable (restoring count percentages from aggregates for non-boolean variables seems impossible to me)

Further, if we assume MIN and MAX differ by 1, (MAX(marks.mark) - MIN(marks.mark)) will be = 1, so we can simplify

100 * (AVG(marks.mark) - MIN(marks.mark))

And further, if we, as you suggested, decide to represent like and dislike as 1 and 0, respectively, MIN(marks.mark) will become = 0, it will simplify to

100 * AVG(marks.mark)

So, to answer your question, no, i don't see any downsides in using values 0/1.

I simply used the wrong formula ``` 100 * SUM(marks.mark)/COUNT(marks.mark)/MAX(marks.mark) ``` in no way represents percentage of likes for anything other than the basic 1 and 0 case, where everything just gets reduced. Not sure why i thought it was. The formula should be something like ``` 100 * (AVG(marks.mark) - MIN(marks.mark)) / (MAX(marks.mark) - MIN(marks.mark)) ``` for boolean variable (restoring count percentages from aggregates for non-boolean variables seems impossible to me) Further, if we assume `MIN` and `MAX` differ by 1, `(MAX(marks.mark) - MIN(marks.mark))` will be `= 1`, so we can simplify ``` 100 * (AVG(marks.mark) - MIN(marks.mark)) ``` And further, if we, as you suggested, decide to represent like and dislike as 1 and 0, respectively, `MIN(marks.mark)` will become `= 0`, it will simplify to ``` 100 * AVG(marks.mark) ``` So, to answer your question, no, i don't see any downsides in using values 0/1.
Owner

Still, for now i stopped at second simplification, it should work with either, 0/1 or 1/2.

Still, for now i [stopped at second simplification](https://git.iamonlyherefortheicecream.ml/w2/photovoter_backend/commit/1dd01e2457931e8f004104377867d08b9d553783), it should work with either, 0/1 or 1/2.
g commented 4 years ago
Poster
Owner

Nice!

Nice!
g commented 4 years ago
Poster
Owner

Won't it 100 * (AVG(marks.mark) - MIN(marks.mark)) give a wrong result if all marks will be equal (e.g. all likes)?

I changed frontend to 0/1 values so we are able go for the simpliest option.

Also thought it could make API more robust if we restrict possible values in the rate picture endpoint to 0 and 1. Now an evil can make this request http://localhost:8000/rate_picture/{cookie}/{picture_id}/999 and break a rate.
I actually did it and it made rate lower for image number 1.
image

Won't it `100 * (AVG(marks.mark) - MIN(marks.mark))` give a wrong result if all marks will be equal (e.g. all likes)? I changed frontend to 0/1 values so we are able go for the simpliest option. Also thought it could make API more robust if we restrict possible values in the rate picture endpoint to 0 and 1. Now an evil can make this request `http://localhost:8000/rate_picture/{cookie}/{picture_id}/999` and break a rate. I actually did it and it made rate lower for image number 1. ![image](/attachments/4a951356-d621-4365-b8bb-8d47a902fffd)
Owner

Won't it 100 * (AVG(marks.mark) - MIN(marks.mark)) give a wrong result if all marks will be equal (e.g. all likes)?

You're right, it will break, since we would have no way of knowing the intended MIN(marks.mark), in our db it will be equal to MAX(marks.mark) and so make the rate 0%, instead of 100%. Dancing it the other way will make all disliked picture 100%, which is probably not much better.

I changed frontend to 0/1 values so we are able go for the simpliest option.

I'll just get rid of these fallible heuristics then.

Also thought it could make API more robust if we restrict possible values in the rate picture endpoint to 0 and 1. Now an evil can make this request http://localhost:8000/rate_picture/{cookie}/{picture_id}/999 and break a rate.
I actually did it and it made rate lower for image number 1.
image

Isn't it a good thing? It's easy to detect, and we'll know at least the session of our bad actor. Maybe more, if we gonna save more information later. The price is a temporary skewed rating, not sure how bad that is.

> Won't it `100 * (AVG(marks.mark) - MIN(marks.mark))` give a wrong result if all marks will be equal (e.g. all likes)? You're right, it will break, since we would have no way of knowing the intended `MIN(marks.mark)`, in our db it will be equal to `MAX(marks.mark)` and so make the rate 0%, instead of 100%. Dancing it the other way will make all disliked picture 100%, which is probably not much better. > I changed frontend to 0/1 values so we are able go for the simpliest option. I'll just get rid of these fallible heuristics then. > Also thought it could make API more robust if we restrict possible values in the rate picture endpoint to 0 and 1. Now an evil can make this request `http://localhost:8000/rate_picture/{cookie}/{picture_id}/999` and break a rate. > I actually did it and it made rate lower for image number 1. > ![image](/attachments/4a951356-d621-4365-b8bb-8d47a902fffd) Isn't it a good thing? It's easy to detect, and we'll know at least the session of our bad actor. Maybe more, if we gonna save more information later. The price is a temporary skewed rating, not sure how bad that is.
g commented 4 years ago
Poster
Owner

Isn't it a good thing? It's easy to detect, and we'll know at least the session of our bad actor. Maybe more, if we gonna save more information later. The price is a temporary skewed rating, not sure how bad that is.

Never thought about this. True, it's good thing.

> Isn't it a good thing? It's easy to detect, and we'll know at least the session of our bad actor. Maybe more, if we gonna save more information later. The price is a temporary skewed rating, not sure how bad that is. Never thought about this. True, it's good thing.
w2 closed this issue 4 years ago
w2 referenced this issue from a commit 4 years ago
Sign in to join this conversation.
No Label
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date

No due date set.

Dependencies

No dependencies set.

Reference: DIWHY/photovoter_backend#14
Loading…
There is no content yet.