This discussion is for you to explain, critique (nicely), and otherwise show how awesome your strategy was and why you think it'd win. Please no code, just written explanations if you feel like it. Upvote those strategies you really like and we'll see if the most popular strategies actually win!
Some questions to get you started.
Why does always slack not necessarily win?
How does one identify individual players and play tit-for-tat or other colluding strategies? Can one even?
What is the purpose of m? It adds to everyone, so what difference does it make?
Is it better to construct a really good fixed strategy via game theory, or try a bunch and evolve the strategy over time (machine learning)?
Comments
My player uses the following very simple (tit-for-tat-esque) strategy: on the first turn, hunt with everyone. Then, if k people hunted with me on turn n, hunt with k random people on turn n+1.
This has the nice property that after the initial investment of hunting with everyone on the first turn, we can't do any worse than slacking (the extra food it takes to hunt with k people on turn t is balanced by the k people that gave me food on turn t-1). On the other hand, if there's a clique of high-reputation players that hunt with each other, this player will naturally fall into this clique. Moreover, by distributing the hunts randomly, we remove some of the advantage the colluding players have (ideally, we would spend our hunts on the players with the least amount of food, but it's hard to keep track of these players, so hunting randomly seemed like a good secondary option).
Overall, it's kind of a "best-of-both-worlds" situation; we're guaranteed not to do much worse than any slackers, even when everyone is a slacker, but we can also sneak our way into groups of colluding players (while giving away all of their food). It's not perfect - ideally, there'd be some learning to know when we can undercut and reduce our reputation further (i.e. hunt with fewer than k people on turn n+1) - but hopefully it is good enough for a t-shirt.
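A minimal sketch of this strategy, assuming the contest's Player interface (hunt_choices / hunt_outcomes / round_end) and that the pairwise food_earnings values are 0/+1 when the opponent hunted and -2/-3 when they slacked (the payoff numbers quoted elsewhere in this thread):

```python
import random

class RandomTitForTat(object):
    """Hunt with everyone on round 1; afterwards hunt with k random
    opponents, where k opponents hunted with us last round."""

    def __init__(self):
        self.k = None   # hunters seen last round; None = first round

    def hunt_choices(self, round_number, current_food,
                     current_reputation, m, player_reputations):
        n = len(player_reputations)
        if self.k is None:
            return ['h'] * n                 # the initial investment
        choices = ['h'] * min(self.k, n) + ['s'] * max(n - self.k, 0)
        random.shuffle(choices)              # distribute hunts randomly
        return choices

    def hunt_outcomes(self, food_earnings):
        # 0/+1 payoffs mean that opponent hunted with us
        self.k = sum(1 for e in food_earnings if e > -2)

    def round_end(self, award, m, number_hunters):
        pass
```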
I really like your strategy! I'd love to see something that simple go for a high score.
I based my algorithm on the following premises:
To address premises 1 and 2, I started with a Gaussian probability distribution. I would hunt 'randomly', but weighted according to my opponent's reputation. So at very low and very high reputations, I would be very unlikely to hunt; the greatest chance of me hunting would be around the middle. This is achieved with the equation below (the 0.5 shifts it to centre around 0.5 and the 4 squishes it to fit between 0 and 1):
\( \text{hunt\_probability} = e^{-((\text{rep} - 0.5) \times 4)^2} \)
However, I wanted to reward players that hunt and punish those that slack. I therefore altered my probability distribution by weighting it in favour of those with good reputations. I did this by adding the opponent's reputation to the above probability distribution and dividing by 2:
\( \text{hunt\_probability} = \left( e^{-((\text{rep} - 0.5) \times 4)^2} + \text{rep} \right) / 2 \)
To address premise 3, I assumed most strategies are likely to punish slackers (i.e. players with a low reputation). To avoid this punishment, I aggressively ramp up my hunt_probability when my reputation falls below 0.5. This is achieved by adding the following to the previous equation:
\( 0.5 - \text{current\_reputation} \)
This stops me getting starved out by players who will only hunt with those who have a high reputation.
I address premise 4 by keeping track of the proportion of hunts relative to interactions over the entire game, and increasing hunt_probability by a small amount when m is lower than this average.
I made the strategy less forgiving by adding a hard cut-off at each extreme: if a player has a reputation below 0.2 or above 0.9, I will certainly slack (hunt_probability = 0).
To have a look at my probability distributions, see http://fooplot.com/plot/fexxqzxw4f
The pink line represents a pure Gaussian, the dark blue represents my weighted version (that I used in my strategy) and the green shows the ramped up version used when my reputation falls below 0.5.
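Putting the pieces above together, here is a minimal sketch of the resulting decision rule (my reconstruction from the formulas; the constants 4, 0.2 and 0.9 come from the post, everything else is assumed):

```python
import math
import random

def hunt_probability(opp_rep, my_rep):
    """Weighted Gaussian from the post: high near rep 0.5, biased
    towards good reputations, ramped up when our own rep is low."""
    if opp_rep < 0.2 or opp_rep > 0.9:
        return 0.0                               # hard cut-offs: always slack
    p = math.exp(-((opp_rep - 0.5) * 4) ** 2)    # Gaussian centred on 0.5
    p = (p + opp_rep) / 2                        # reward good reputations
    if my_rep < 0.5:
        p += 0.5 - my_rep                        # anti-starvation ramp
    return min(p, 1.0)

def choice(opp_rep, my_rep):
    return 'h' if random.random() < hunt_probability(opp_rep, my_rep) else 's'
```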
A lot of people have mentioned that slacking or 'freeloading' appears to do very well, according to the bots in Chad's testing framework (which was excellent, thank you Chad!). This is only true when it plays against very simple strategies that don't punish players for slacking. My strategy relies on the idea that slackers will be starved out by other players. So I'm hoping that the simplistic bots in the testing framework may have led people to believe that slacking is a good option, and I can take advantage of this!
With more time I'd have liked to implement some machine learning techniques (I'm sure my strategy will crumble compared to more sophisticated ones that use ML or ANN!). Ah well, it was fun to have a go!!
I had the same philosophy as you, but to find my goal reputation I reverse-engineer all the reputation data to find the average number of times people hunt each round, and I keep raising my reputation just until people paired up with me hunt more often than average.
Out of laziness I did not account for m anywhere in my code
By the time the game is through enough rounds, I switch into a mode where I just slack, because we are deep enough in and so few players are left that it won't affect my reputation by the time the game is over.
Interesting, how did you decide when to switch modes? I considered lowering my hunt probability gradually as the game went on (so that I am more likely to slack in later rounds) but wasn't sure how to estimate when the game would end...
My strategy is weak against yours (slacks those with rep > 0.9). I hope not many people decided to do the same.
Hopefully it won't matter if lots of others had that idea, unless your algorithm maintains an extremely high reputation!
I know that it depends on what other players are doing (as it depends on how they are using your reputation) but maybe, instead of increasing the probability of hunting when your reputation falls below 0.5, did you think about doing it when you are below the average or the median of all reputations (which you can calculate every turn)? ^^
Apart from that, I hadn't thought about actively starving slackers, which is in fact a really good idea o.o My thought was more like: well, if they slack too much, they will eventually die because nobody will hunt with them 'cause they have really low reputation. (Which I realise is in fact starving them, but I didn't think of doing it on purpose :D)
I did consider doing it that way, but it is quite risky because if the average of all the reputations is quite low, you will also be restricted to a low reputation and have no protection against being starved out. You're right, it definitely depends on how the other players work so it could be a better option. The best way would probably be to use a combination of the two systems based on how the average reputation of the tribe varies. If only we had more time!! :)
I'm quite surprised nobody is talking about sorting the players by reputation and then assigning to each one the same behaviour obtained from the player that occupied its position in the previous round (i.e. hunting with the i-th player if the i-th player of the last round hunted with you, and slacking the j-th player if the j-th player of the last round slacked you).
For me this is the most straightforward way of implementing the Tit for Tat strategy under the constraints of this contest (I even call it Fuzzit-For-Tat!).
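In case the mechanics aren't obvious, here is a minimal sketch of this sort-and-echo idea. It assumes the contest's Player interface, that food_earnings is ordered like the player_reputations list of the same round, and that payoffs of 0/+1 mean the opponent hunted while -2/-3 mean they slacked; it is not the authors' actual code:

```python
class FuzzitForTat(object):
    """Sort opponents by reputation; give the i-th ranked opponent
    whatever the i-th ranked opponent gave us last round."""

    def __init__(self):
        self.received = None   # actions received last round, by rank
        self.order = []        # rank -> index mapping for this round

    def hunt_choices(self, round_number, current_food,
                     current_reputation, m, player_reputations):
        n = len(player_reputations)
        if self.received is None or len(self.received) != n:
            by_rank = ['h'] * n          # first round, or a player died
        else:
            by_rank = self.received
        self.order = sorted(range(n), key=lambda i: player_reputations[i])
        choices = [None] * n
        for rank, i in enumerate(self.order):
            choices[i] = by_rank[rank]
        return choices

    def hunt_outcomes(self, food_earnings):
        # Decode who hunted with us (0/+1) vs. slacked (-2/-3),
        # and store it in reputation-rank order for next round.
        got = ['h' if e > -2 else 's' for e in food_earnings]
        self.received = [got[i] for i in self.order]

    def round_end(self, award, m, number_hunters):
        pass
```

The reset to all-hunt whenever the tribe size changes matches the forgiveness mechanic described a little further down this thread.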
Has anybody walked this path? Any thoughts about it?
This is exactly what I ended up doing as well.
FWIW, I doubt this strategy will win because I don't think it's exploitative enough, but it's also nearly impossible to exploit and very likely to survive to the end. Even if I had liked any of my other candidates enough to use them, I would have had a Tit for Tat fallback (similar to how competitive RPS bots often have a [1/3, 1/3, 1/3] fallback)
Nice! Could you explain a little how you arrived at this algorithm?
Did you try to track other algorithms using "feasible reputation" intervals and then realize that sorting them by reputation is enough (as we did), or did you use the reputation ranking for some other reason?
Do you hunt with everyone in the first round? How do you manage the situation when an algorithm 'dies' and you have more 'behaviours' to deliver than adversaries to deliver them to?
On the first round, and on any round where a player died, my bot just hunts with everyone. This both gives it a high reputation early and adds a forgiveness mechanic so that bots with similar strategies (like yours) don't get stuck in perma-slacking mode.
I go into more detail in my writeup.
Wow, that's really nice! This indeed implements Tit for Tat, and keeps track of players in a nice way. I believe your algorithm is, for sure, the best TfT algorithm here - which is a shame, as there will be a scarcity of algorithms that play correctly with you. I wish I had had this idea before. Have you submitted it?
Yes, we submitted Fuzzyt for Tat to the competition.
We are not really confident in winning the contest but we feel that this is the algorithm with the best cleverness/complexity ratio we could invent :-P
But, again, it will be really surprising if nobody else implemented this idea (or something closely related) :-/
But can you really deduce the necessary information from reputation alone?
For instance, compare
Player A hunted for 10 consecutive turns, but for some reason on the 11th turn slacks.
Player B hunted for 9 out of 10 consecutive turns, but then on the 11th turn hunts.
Both players will have a reputation of 10/11, so how do you distinguish them?
The assumption is that if players have the same or very close reputation, they are likely playing similar strategies. And even if they aren't, we're varying between them randomly because of the shuffling effect from the game engine itself. In your example in particular, we only have a 1/2 chance of getting turn 11 "wrong" and even if we do it's a minor error, and if those are their long term strategies we'll still play approximately correctly on average.
In fact, when I ran a simulation of 11 of these player-tracking Tit-for-Tats vs. 11 random-hunting Tit-for-Tats, plus a bunch of other stuff to pad out the population, the latter 11 tended to outlast the former 11.
If a cycle gets started in this fashion by accident, it can also be ended by a similar accident
If a player dies, I reset the algorithm and hunt with everyone for one turn
BTW, I wouldn't call the random hunter Tit for Tat because it actually plays dramatically differently in heterogeneous pools (that is, pools where players have very different reputations), and I think the strategy Carlos and I used has a far better claim to that name (especially since it degenerates to TfT if the game falls to 2 players and becomes PD).
I'm curious what you're padding the population with in your sims, because I had the exact opposite results; even 2 or 3 of my TfT bots could handily out-collude much larger numbers of random hunters, particularly in pools with a lot of slack-heavy bots.
My implementation of the player-tracking TfT bot did all-hunt on player death.
For padding, I have: Brilliant's sample bots, most of your sample bots (with no BoundedHunter but more Random: Random(0.1), Random(0.2), ..., Random(0.9)), a bunch of "hunt if their rep > X" and "hunt to maintain rep == Y" variants with either fixed X and Y or very simple calculations (like "hunt if their rep >= my rep"), and a dozen or so reputation adapters. I notice in the one run I have logged that there appears to be a fair amount of reputation shifting, even amongst the player-tracking TfTers. The top algorithm in that run ended up being "GoWithTheFlow", which recomputes average reputation each round based on round_end()'s number_hunters, and all-hunts if average reputation >= 0.5, else all-slacks. (In other words, an even cruder interpretation of TfT than either of us is using!)
I agree that random hunter TfTs are less TfT than player tracker TfTs. I would say that random hunter TfTs use something more accurately called "population Tit-for-Tat," since they're following the spirit of "individual Tit-for-Tat" (i.e., the original Tit-for-Tat), but interacting with the population as a whole, rather than on an individual-by-individual basis.
In fact, random hunter TfT could be seen as a recasting of the game into a variant of the prisoner's dilemma where players can partially cooperate. Like, "I 35% cooperate (65% defect)." I think I've actually heard of such variants, but I don't know what the "official" name of them is, or how much they've been studied. Anyway, in this game, I'm one player, and the rest of the population is combined into the other player. How many hunts we do in a round maps to how much we partially cooperate.
If you mean long-term, I disagree, assuming the other strategy is also actively trying to collude.
On the contrary, I'm assuming that either reputations are stable, or that they are chaotic enough that my occasional missteps won't be noticed. Your worst-case only appears to happen if you assume that the field is tracking my reputation far better than I'm tracking theirs and that there is some combination of strategies that maintain similar reputations but play significantly differently against me.
The Random bot simply isn't a very realistic opponent, in my opinion. A large number of those certainly would blunt the inherent weaknesses in the FairHunter strategy, but I would be suspicious of any simulation that used a large number of them.
The random hunter isn't TfT at all. Its payout isn't even the same as the mean of the opponents'! (Easy example to prove it: have all opponents hunt the first round, then slack forever. The random hunter is guaranteed to hunt with all of them once, then have a nonzero hunt probability every round after that.) I'm even loath to call GoWithTheFlow a Tit for Tat bot. If we need a blanket name for bots that try to match the opponents' behavior, I'd use "reciprocal strategy"; the range is too broad here to call all of them by the same name.
The PD variant you're thinking of is the Continuous Iterated Prisoner's Dilemma.
I just tried a run with no Random() bots, and the tracker-TfTs did a little better, but still lost to the random-TfTs. Note that the padding I described was just from one run. I varied the population quite a bit between runs, and most of my runs had only your two default Random() bots.
This is a good time to mention that population makeup tends to make a huge difference in which strategies do how well, in some cases making the difference between ranking #1 and being one of the first to die off! Although the Brilliant folks have said they'll do what they can to minimize the effects of randomness, it's still largely a matter of luck who wins first place--the luck of everyone else entering strategies that happen to flatter your own. This is especially true since they plan to do only a single, big run. It's a shame they don't do several smaller runs and average the results together.
No bot in this contest is truly Tit-for-Tat. It's simply not possible, since flawless player tracking is impossible within the rules. All we can do is argue who's the tittiest of the tattiest. I anticipated this tedious jockeying for nomenclature before submitting, which is why I called my strategy "Copycat" instead of, say, "GroupTitForTat," or "AsTitForTatAsICouldGetWithoutPlayerTracking." I figured there'd be just short of a billion people submitting things they thought were, and called some variant of, Tit-for-Tat. Maybe I'll get style points for not GoingWithTheFlow. :P
Only if you make some really specific and unjustifiable assumptions. It's a function of:
To come up with 25% you have to answer "exactly one" and "exactly half the time".
Sure, but I don't think it's nitpicking to say that the strategy that best approximates TfT with the given information, and degenerates to TfT in two-player is the one that should have the name. The "random according to opponent's reputation" strategy, right or wrong, is just a completely different thing.
Huh. That's actually a pretty elegant way of tracking players, and probably about as accurate as it's possible to get. I like it! If I were a judge, I'd give you style points. :)
Our first thought was to compute the reputation intervals where every player should fall in the next round and use this information to track them (you know their current reputations and how much these reputations can change if they slack everyone or hunt with everyone).
At first, reputations will vary a lot and the intervals will surely intersect each other, but at some point the reputations will somewhat converge and the intervals will shrink enough to track individual algorithms or, at least, small groups of them (let's call them 'clusters'). Perhaps the algorithms that fall inside one of these 'clusters' behave really differently (2 algorithms could have the same hunt/slack ratio but betray or cooperate with very different criteria...), but this is as accurate as you can be without tracking "trends", and trying to track "trends" seems quite difficult with the amount of information available...
Once you are able to track individual algorithms, the Tit for Tat strategy seems a good option. The only question that remains unanswered is what to do when you receive different outputs from a given 'cluster' (2 algorithms slack and 3 algorithms hunt in a 'cluster' of 5 indistinguishable algorithms)... The answer that we put in practice is to assume that they will keep their relative positions in the reputation ranking. It's quite a bold assumption at first, but it turns out to be quite reasonable once enough rounds have been played.
At this point we realized that it is not necessary at all to compute the "feasible reputation" intervals. We only need to sort the adversaries by reputation! The key point is that if there isn't any intersection between the intervals of group A and the intervals of group B, then sorting them by reputation is enough to separate them (assuming you know the sizes of A and B), since no algorithm of group A can change its reputation enough in the next round to exchange its rank with an algorithm of group B.
This led to the very elegant algorithm explained above (sort the players by reputation, then assign to each one the same behaviour obtained from the player that occupied its position in the previous round).
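For the curious, the interval computation itself is tiny. A sketch, assuming you know how many pairings each player has accumulated (which follows from the round number and tribe size; the function name is mine):

```python
def feasible_rep_interval(rep, interactions_so_far, opponents):
    """Bounds on a player's reputation after the next round.
    rep = hunts / interactions, so one round of all-slack or
    all-hunt moves it by a bounded, shrinking amount."""
    hunts = rep * interactions_so_far
    total = interactions_so_far + opponents
    return (hunts / total,                # they slack with everyone
            (hunts + opponents) / total)  # they hunt with everyone
```

Two players remain distinguishable in the next round exactly when their intervals don't overlap, which is what makes the rank-preservation assumption safe late in the game.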
We also considered some other approaches to the problem, such as trying to guess what other players were planning to do by gathering information from their responses as a function of our reputation and theirs in previous rounds, but in the end we went the Keep It Simple, Stupid way of doing things.
Even though I think that other approach could have been quite powerful if we had been able to run more trials to tune some "arbitrary" parameters and make it a little bit more intelligent o.o
My algorithm first tries every possible reputation rounded to .01 in a certain range, to see which one has the highest gains. After it's gone through that range, it picks the reputation with the highest gains and hunts with the frequency to reach that 'ideal' reputation.
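The maintenance step - hunting with just the right frequency to land on a target reputation - can be sketched like this (my reconstruction; the argument names are assumptions):

```python
def hunts_for_target(target_rep, hunts_so_far, interactions_so_far, opponents):
    """How many of this round's pairings to hunt so that our
    reputation lands as close as possible to target_rep."""
    total = interactions_so_far + opponents
    needed = target_rep * total - hunts_so_far
    return int(max(0, min(opponents, round(needed))))
```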
Hi Tim, what do you mean by gains?
I think by gains he means outcomes from hunting decisions. Highest gain = least food lost.
By gain, I mean the average of the food_earnings list.
…food_earnings already. As it says on the Sample Code page: …
Similar to what I did [would have done].
But that's because our names are similar.
Clever, but I think there's a weakness that appears late-game. Once many slackers have been starved off, shouldn't the optimal reputation shift with the now changing demographic of the population?
Say, 0.5 may be really handy early on because so many bots defect against you regardless of your rep, but as lower than 0.5 bots get eliminated you become less valuable relative to the community, and must even slack against more and more of the co-operative hunters to maintain 0.5.
It keeps updating, so if 0.5 suddenly becomes rubbish, it will look for something else.
I did not care whom I hunted with and whom I slacked with. I just calculated how much I had to hunt, then created a list with that number of hunts, filled it with slacks until its length equalled the number of opponents, and then shuffled that list (which wasn't actually necessary, as the list was randomized by Brilliant anyway). Either way, I did not care about anyone's reputation.
Well, you might not, but those co-operative bots will care if you're slacking against them, and may respond accordingly. Learning the optimal reputation is a plus, but I think your random choice between partners isn't using all of your available power to actually shape the community into an environment that's the most useful to you.
I guess the trade-offs between the suboptimal noise involved in "shaping your strategy to the community" and the noise involved in "shaping your community to your strategy" should be compared, whatever that ends up being.
You can search for an optimal rep midgame, when you can only slightly change your rep each round (in a gradient-descent-type manner), but then you would be susceptible to local optima. And on top of that, you could start falling too far behind the pack while doing your search.
My algorithm does something really similar. I periodically switch back and forth between raising my reputation by always hunting and lowering my reputation by always slacking.
I'm hoping there aren't too many "cooperative" bots out there that try to track who I am from round to round!
Ah, that's interesting. My bot indeed hunts with everyone when the desired reputation is way higher than the current reputation, but when they're equal, it just hunts the amount needed to maintain that reputation. I guess it doesn't really matter; hunting with all or slacking with all in such a way that the reputation bounces around the desired value will probably be just as good, and easier to implement (although this was only a few lines of code as well).
I've posted my documentation online: <http://chill-dream.com/TitForTat.html>
I had a far cleverer idea in mind but there was some fatal algebra or logic error somewhere that I still haven't found. So instead I just ended up with a boring unexploitable (but unexploitative) Tit For Tat algorithm.
My story is about the same: I had a brilliant idea for an algorithm, it showed great promise, but I just couldn't get it polished in time, so I ended up submitting a TfT variant. My brilliant strategy could climb the rankings, then outpace everyone else by a wide margin for a long time, but would always inexplicably slide back down into the middle of the tribe (or worse). Plus, it was ugly and kludgy, so I wasn't going to inflict it on the world unless I was sure it'd win.
It's too bad I didn't find anything that performed even that well. I outlined this in the link above, but if I had found an algorithm that could exploit the early game and only the early game, I think starting with that strategy and then switching to TfT when it starts to fail would be best.
I noticed that the slower you do this (e.g., take N rounds between each decrement), the bigger your lead will end up being, but the longer it'll take to get there. In fact, my "brilliant" algorithm seems to have done so well because it basically just did a very slow decrement. Where it failed, in my estimation, is that it got too greedy and let its reputation fall far below 0.5.
One strategy I considered was to just keep looping through this process--hunting with all, decrementing to hunting with none, then resetting to hunting with all, etc. I never bothered to try it, though. I was too busy trying to fix my algorithm "the right way."
Incidentally, against my better judgment, I returned to my code and made a couple of changes that seem to have fixed my "brilliant" algorithm. It no longer gets a monster lead, but it does get a lead, and it keeps it (I stopped the sim after 35,000 rounds and it was still in the lead). So, now I'm just hoping that five teams came up with something otherworldly in its awesomeness and effectiveness, so I don't have to feel bad about losing $1000 because I was a day late!
Thanks Chad M. for the simulation code. That was awesome. I think I should give you half the money if I win. Your simulation was the only reason I was able to remove bugs and finish my algorithm on time.
Check out my ideas on the game on the link: www.kelvinsantos.com/kran.html
Which took longer: code or write-up?
"I'm sorry that this was such a long letter, but I didn't have time to write you a short one." --Blaise Pascal
I went through many iterations.
Boneheaded interpretation of Tit-for-Tat (called CjTitForTat). Basically, hunt with at least the top-reputationed 50% of players (be generally cooperative), and hunt with anyone else with a reputation > 0.5 (cooperate with those who seem to be cooperating). Ultimately decided this was a stupid interpretation of Tit-for-Tat, but in my throes of last minute helplessness and despair, I almost submitted a version that used 60% and 0.8, since that tended to do the best of the pack for reasons that are beyond my desire to analyze. (Indeed, it scored #1 overall with its pack to back it up.)
Super-simple algorithms. First there was Bro, who hunted with everyone whenever the average reputation of the tribe was > 0.5, and slacked with everyone otherwise. (Bro follows the crowd.) Even unto the end of this contest, this dumb algorithm that I came up with after reading half the rules and understanding half of what I'd read /still/ managed to win more than half the hostile-tribe (food tending towards 0) play-outs. I actually grew to hate Bro. Then there was Hipster, who did the opposite of what Bro did. (Hipster bucks the trend.) This guy was almost always the first to die off in hostile populations. Then there was GoWithTheFlow, which was Bro, but recalculating average tribe reputation each round based on food_earnings (Bro just averaged all player_reputations together, similar to AverageHunter). This was intended to respond more quickly to changes in the tribe dynamic than Bro. It tended to do about as well as Bro. Then there were many variants of these algorithms with different parameters (cut-off between hunting and slacking). For reasons I had neither the time nor ability to understand, 0.5 tended to be the best whenever any of these algorithms did well (which was only in hostile tribes).
Game theoretic algorithms. Okay, I was getting serious at this point. I actually looked at the payout tables and crunched some numbers. My idea was to maximize my reputation, but do so rationally--only buy reputation (i.e., hunt) when it would not cost food (i.e., when the average payout would be >= 0). There were four of these algorithms, based on two binary categories. The first category was whether to hunt/slack with the whole tribe at once (Grp), or hunt/slack on a player-by-player basis (Ind). The second category was whether to average in hunt bonuses (Bna), or use m to try to predict them (Bnp). To predict a hunt bonus, I multiplied each player's reputation by P-1. If the result was >= m, a bonus was predicted. Since a hunt bonus essentially adds 2 to each payoff in the table, it lets us rationally buy reputation for more low-reputationed people. RatRepGrpBna hunted with everyone if the average reputation (ravg) was >= 3/5. Else, it slacked with everyone. (In other words, it was basically Bro with a 0.6 parameter instead of 0.5.) RatRepGrpBnp: If no bonus, hunt with all if ravg >= 1 (yes, if everyone has a reputation of exactly 1--so, basically never); if bonus, hunt with all if ravg >= 1/3. Else, slack with everyone. RatRepIndBna hunts with any population member whose reputation is >= 1 - 2/3*ravg, and slacks with that member otherwise. Finally, RatRepIndBnp: If no bonus, hunt with each player with a reputation >= 1 (yes, only exactly 1); if bonus, hunt with each player with a reputation >= 1/3. Slack with everyone else. These algorithms did so-so. The problem, I think, is that they fail to factor in the worth of reputation. At the extreme, if no one cares about reputation, you shouldn't buy it even if you can do so at >= 0 food earnings. If everyone cares about reputation, you should buy reputation even if you have to do so at < 0 food earnings sometimes. Nonetheless, I was most proud of these algorithms and the time I'd put into researching them, and thought about entering RatRepIndBnp, the most complex one, out of spite (not for the competition, but out of reality, for not letting my beautiful algorithms work!).
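For readers who want to check those cutoffs, they fall out of the pairwise payoff table quoted elsewhere in this thread (hunt/hunt: 0 each; hunt vs. a slacker: -3; slack vs. a hunter: +1; slack/slack: -2 each). The algebra below is my reconstruction, not the author's:

\[ E[\text{hunt} \mid r] = r \cdot 0 + (1 - r)(-3) = 3r - 3 \]

Without a bonus, \(3r - 3 \ge 0\) forces \(r \ge 1\), which is why the Bnp variants almost never hunt. A predicted bonus adds 2 to every payoff, so \(3r - 1 \ge 0\) gives \(r \ge 1/3\). And averaging the bonus in at roughly \(2 r_{avg}\) per pairing gives \(3r - 3 + 2 r_{avg} \ge 0\), i.e. \(r \ge 1 - \tfrac{2}{3} r_{avg}\), exactly the RatRepIndBna threshold.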
Adaptive algorithms. This was a series of algorithms that attempted to adjust our reputation (or, in simpler cases, how many individuals we hunted with vs. slacked with) based on the payouts we were getting. The most successful of these, PushPull, typically managed an over 100,000 food lead over second place in a friendly, rep-caring population of around 90 members. Sadly, it took about 2000 rounds for it to do this, and by round 15000, it started slipping from first place and into the middle of the tribe. Also, PushPull typically died about half-way through any hostile-population run. Sadly, I didn't have the time (or probably, ability) to work out the kinks in PushPull, since it was probably the closest I came to an algorithm that could take first place.
What I actually submitted: Copycat. This is yet another Tit-for-Tat variant, this one truer to the original Tit-for-Tat than CjTitForTat was. It has three basic rules: 1) Hunt with everyone on round 1. (Like Tit-for-Tat's "cooperate on round 1.") 2) Else, if <= 50% of people hunted last round, slack with everyone. (A crude, group-level "defect against defection.") 3) Else, do what everyone else did last round. Literally, food_earnings is converted into a list of 'h's and 's's, trimmed to account for deaths, and fed right back into the population as our actions for this round. (A group-level, literal interpretation of "repeat what they did last round.") I have rule #2 instead of letting rule #3 absorb it because I found through testing (and Bro's aggravatingly good performance) that an average reputation <= 0.5 tends to be highly correlated with a population's being hostile, and in such populations, it's best to always slack. The implementation of rule #3 was mainly to be cute and get style points--the idea came from a one-liner someone entered into an old Rock-Paper-Scissors tournament. In retrospect, though, it turned out to be a very clever way of interpreting Tit-for-Tat's "I do what you just did" behavior.
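Copycat's three rules fit in a few lines. A sketch under the usual assumptions (contest Player interface; food_earnings of 0/+1 means that opponent hunted, -2/-3 means they slacked):

```python
class Copycat(object):
    """Rule 1: hunt with everyone on round 1. Rule 2: if <= 50% hunted
    with us last round, slack with everyone. Rule 3: feed the tribe's
    own actions right back at it."""

    def __init__(self):
        self.echo = None   # 'h'/'s' received from each pairing last round

    def hunt_choices(self, round_number, current_food,
                     current_reputation, m, player_reputations):
        n = len(player_reputations)
        if self.echo is None:
            return ['h'] * n                  # rule 1: cooperate first
        if 2 * self.echo.count('h') <= len(self.echo):
            return ['s'] * n                  # rule 2: hostile tribe
        return self.echo[:n]                  # rule 3: trimmed for deaths

    def hunt_outcomes(self, food_earnings):
        self.echo = ['h' if e > -2 else 's' for e in food_earnings]

    def round_end(self, award, m, number_hunters):
        pass
```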
Post your code? :3 (i.e., what is your opinion on letting others see/use your work for the future?) I'm particularly interested in 3 and 4, of course.
I gather that the maintainers want source code kept secret until the prizes have been awarded, since they mentioned something about a video verification for winners, which would probably be thrown off if people started sharing source code. Although I doubt I'll win anything (except maybe a t-shirt), I'd like to honor (what I assume to be) their wishes. After the prizes are awarded, though, if there's still interest, I can post my code. In fact, I seem to have fixed my adaptive algorithm in #4. It doesn't get a huge lead any more, but it does get and keep first place.
This isn't really true if you can figure out who's hunting and hunt only with those people. I was able to build test populations where the early-game rep was in the neighborhood of 20% because of a large proportion of "always slack" players, but a small handful of cooperative bots could identify each other and starve them out.
The problem with a hunt-only-with-those-people strategy is that it only benefits you significantly if there is a significant number of them who also use the same strategy you implemented, as in, they can identify that you hunted with them too.
Otherwise, the only advantage I can see from that is you're hunting with someone who's using up 3 food on you in the hopes that they die soon and waste your food. However, I think the differences made by whom you choose to hunt with are tiny when the number of players is large.
They could also be keying off reputation as well. Assuming they go by a relative threshold and not an absolute one, that is. For instance, my bot + a bot that only hunts with reputations >= itself will still prop each other up against a large population of slackers until the slackers die off.
I really, really wish I'd simulated my bot with others. I have no idea if the weights I put on the different sections of my algorithm are right, and reading the comments here I'm not sure I even analyzed the game correctly now.
This is something I thought about during the competition, and maybe could qualify as a "question #5":
In general, should an ideal/optimal strategy for this game implicitly collude with copies of itself?
On the surface, it seems the answer must be "yes." After all, if a strategy does not collude with copies of itself, then the more copies of that strategy there are in the population, the lower they'll all tend to score. The strategy becomes its own worst enemy! If it does collude with itself, then adding copies of itself to the population at the very least doesn't hurt it, and might even help.
Although strategies can't spawn copies of themselves in this particular competition*, clusters of people seem to have independently come up with very similar strategies. Such clusters of strategies that don't implicitly collude with each other may end up bringing everyone in the cluster down!
My hypocritical strategy was to pretend to be a cooperator, and enjoy a good reputation, while secretly sabotaging my fellow cooperators. It's a strategy that works for politicians, so why not here? I start by calculating an ideal reputation. If I were to cooperate with other cooperators, what would my reputation be? I then make sure to maintain my reputation with the appropriate number of hunts. The hypocritical twist is that I do not actually cooperate with the other cooperators. I slack against them, while hunting with the slackers.
Why would it be beneficial to do this? If I slack against a hunter I earn +1 food, and if I hunt against a slacker I lose -3 food. This equals out to -2 food. What if I had been honest and cooperated with the hunters and slacked against the slackers? I would earn 0 from the hunter, and -2 from the slacker. -2 = -2, so these two strategies are equal. However, I am benefiting by being a hypocrite in two ways: one is that I am hurting the other cooperators, moving me ahead. The second is that some of the slackers may actually hunt with me because of my reputation. In that scenario I would get +1 from the cooperating hunter that I slacked against, and then 0 from the slacker that decided to hunt with me. Part of my strategy is to pick out the slackers most likely to hunt with me on occasion, so I will hunt against the slackers with the best reputations.
This competition really got me thinking. This was the fifth strategy that I developed, and I kept this one mainly because of the parallels with nature.
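A minimal sketch of the hypocritical allocation step, assuming reputations are the only signal available and using a placeholder threshold to separate 'cooperators' from 'slackers' (the post doesn't give one):

```python
def hypocrite_choices(player_reputations, hunt_budget, coop_threshold=0.5):
    """Slack against the cooperators; spend the reputation-maintenance
    hunt budget on slackers, best-reputed slackers first (they're the
    likeliest to reciprocate). hunt_budget comes from whatever keeps
    our reputation at its ideal value; if it exceeds the number of
    slackers, the remainder simply goes unspent in this sketch."""
    n = len(player_reputations)
    choices = ['s'] * n
    slackers = sorted((i for i, r in enumerate(player_reputations)
                       if r < coop_threshold),
                      key=lambda i: -player_reputations[i])
    for i in slackers[:hunt_budget]:
        choices[i] = 'h'
    return choices
```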
In the first place, I only heard about the competition last Thursday, and only started coding my actual strategy the day of the deadline, as the day before I had tried to come up with an overcomplicated plan for the length of time I had to develop it. :P The deadline extension would have been a lifesaver, had I actually finished debugging it and got it to work for more than one round, so you will not be seeing me in even preliminary rankings, I think.
The more complex strategy I had planned was a specialization in the game's domain of the general AI I am perpetually planning. I will not share details of this AI's design. I thank brilliant.org for turning me on to the superior development speed and high-quality libraries of Python. I had always disparaged it before, but the competition being in Python was the reason I was able to develop a relatively complex strategy so quickly. If I had heard about this competition when it started, or at least a week before the deadline, I might have made a better showing (i.e. any showing at all).
The less complicated strategy I developed [would have] used several calculations to get some most probably salient data streams from the information given. To list all the data I [would have] used:
These data streams would all be correlated with repute to give the weights each data stream's current value would have to make a calculation of the desired reputation, and thus the desired number of defections in the strategy.
As for deciding which people to cooperate/defect with...
scipy.stats.gaussian_kde was my friend. There was no other tool to help me get an estimate of a continuous probability density function from the discrete data provided. For every round, I would add in the reputation distribution and the hunt outcome associated with each, whether they cooperated or defected. It was sorted by time, and the amount of food I currently had affected how many samples I drew from the database, or the short-sightedness of the prediction, essentially. I would take a large number of the most recent rounds (should have been up to fifty or so, but the amount of food affecting the short-sightedness had a bad calculation, like antilogarithmic growth divided by a smaller antilogarithmic growth), draw up a distribution of the cooperation instances normalized by the distribution of the population, which gave the predicted probability density function for the current round. I would order the player_reputations by the given predicted likelihood, then defect against those least likely to cooperate, up to the optimal limit of defections given by my desired reputation. I wasn't able to implement betrayal of extremely likely cooperators in time.
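For anyone who wants to try the same trick, here is a minimal sketch of that KDE normalization, assuming per-pairing records of (reputation seen, did-they-hunt); all the names are mine:

```python
import numpy as np
from scipy.stats import gaussian_kde

def hunt_likelihood(seen_reps, hunted_flags, current_reps):
    """P(hunt | rep) ~ P(rep | hunt) * P(hunt) / P(rep), with both
    densities estimated by Gaussian KDEs. Needs at least two distinct
    samples in each dataset to avoid a singular KDE."""
    reps = np.asarray(seen_reps, dtype=float)
    hunted = reps[np.asarray(hunted_flags, dtype=bool)]
    pts = np.asarray(current_reps, dtype=float)
    if len(hunted) < 2 or len(reps) < 2:
        return np.full(len(pts), 0.5)      # not enough data: coin flip
    p_rep_given_hunt = gaussian_kde(hunted)
    p_rep = gaussian_kde(reps)
    prior = float(len(hunted)) / len(reps)
    like = p_rep_given_hunt(pts) * prior / np.maximum(p_rep(pts), 1e-9)
    return np.clip(like, 0.0, 1.0)
```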
To the given discussion questions:
Why does always slack not necessarily win?
Slacking is of course the Nash equilibrium. However, in a dwindling-resource game such as this, it is not a successful long-term strategy, and it is supremely ineffective if everyone always slacks, as in the tragedy of the commons. Everyone always hunting is the superrational equilibrium, and is the most sustainable strategy, a "success of the commons," as it were. Everyone gets constant food growth from the m reward, and no one ever dies. A bot that always slacks would gain food equivalent to the number of the others' hunts, in other words, half of the reward. But this game has reputation in it as well, and a strong 1000USD incentive to win, which means that bots can be tracked, and there is a high likelihood that people will enter smart bots instead of 0th-level players. The reward complicates things in a smart-bot reputation game. Ultimately, the smarter thing to do is to cooperate as much as possible to increase the likelihood of the reward as well as the value of the reward, while defecting as much as possible on likely defectors to prevent unrecoverable losses, and on likely cooperators to get a leg up on other competitors. Then there's metagaming above that, player tracking, the applications of machine learning and other adaptive affectors, and so on.
How does one identify individual players and play tit-for-tat or other colluding strategies? Can one even?
Player tracking. This was going to have been part of my more complicated model, in cooperation with a strategy analyzer to predict likelihood of cooperation in a smarter way. I believe it is possible. As this comment states, a player can move their reputation only so far in one round. I used this in part of my desired repute routine, though not as effectively as I would have if I had more time. I was also originally planning to use this as an indicator/perceptor in the more complicated player. Each player_reputation given would be assigned a connection to other possible reputations in the last round, with weights, as in a graph formation. As reputation stabilized, it would become a very reliable indicator of past identity (not empirically tested, but it is my prediction), and would contribute as another heuristic to strategy guessing, probability of cooperation given reputation, optimal action, etc. It would get less accurate with more players, but it would still be useful in the endgame, as the number of players thinned out, and reputations stabilized even more.
Colluding would require the use of outside communication, obviously illegal, or Schelling points. An obvious Schelling point would be a reputation of 100%, but how to verify that it is due to plotting collusion instead of dumb luck or dumb strategy is not as clear. Still, for bots who can track one another, making yourself distinguishable by being on the high end of the distribution for a period of time suitable to stabilize your position/trackability is the true Schelling point in consideration. Working out a strictly numerical solution would not be best, it would not result in as many collusions (given that you have the exposure to other colludable bots), and the others would be better off for it as you would have less ability to influence game outcome.
Then it comes to betraying alliances. Metagaming to the metametath power. 9_9
What is the purpose of m? It adds to everyone, so what difference does it make?
The difference is in the weighting. From the rules, it is twice the number of times you've hunted, I believe. This is strong enough to motivate a smart game theorist, I think, as it rewards cooperation more than individual rewards for defection would influence the decision, all things being equal, and it would also be a motivator to trick others into not hunting as much while still getting them to add to m's likelihood of success. It wouldn't recoup losses at less than or equal to a 2/5 betrayal ratio, and that wouldn't be enough to recoup the losses from its mutual defections. m is still a big thing to take into consideration, though, as it pushes permaslackers further down the winning scale into no-fly territory. Given smart enough algorithms, of course.
Is it better to construct a really good fixed strategy via game theory, or try a bunch and evolve the strategy over time (machine learning)?
Given a month, wouldn't someone be able to construct a machine learning-principled bot that can analyze its opponents for use of game theory, near-perfect or flawed, and then exploit the resulting predictions? Also consider that the population of the tribe will not consist of a predictable group of strategies, so at least some method of perception must be utilized for even a fixed strategy bot.
That said, I really wished I had heard of this when the competition was announced. ;_; I hoped to leverage this particular contest into making me a small amount of money to buy a laptop of my own. I'm always using someone else's...
Lot of interesting stories here, accompanying the strategies. I, too, wish I had heard about this competition (or indeed, even this site) earlier, but since I was busy with summer classes/exams anyway, it likely wouldn't have mattered. I ended up implementing my solution on the 17th and 18th. Here's a quick summary for those interested:
My approach is based on machine learning, I suppose, though I didn't use that term in my README. (It didn't sound sophisticated enough for that label, haha.) Basically, it builds up data on past player behavior to aid in the choice to hunt or not. In particular, I implemented calculations for the probability of getting the award, as well as the probability of opponents as a whole to hunt, refined round by round. Award probability, since it is highly dependent on the random m integer, is used sparingly, only in the calculation of expected food returns from future rounds. Hunt probability was used to estimate the amount of hunts that could be expected from any one round. It was only used in my "Food Opportunist" algorithm (more on that below).
The solution has a few extra features. For example, I called my script "FourFaced", due to the four main algorithms I implemented, representing four complementary play styles/personalities (get it? It's like TwoFaced, but...more). The aforementioned "Food Opportunist" plays offensively, attempting to score the most hunting awards possible, slacking otherwise, but ignores food and reputation. The "Indifferent Observer" bides its time, attempting to keep these values stable, in order to better collect probability data. The "Social Climber" and "Frugal Survivalist" are emergency correction algorithms, designed to quickly accrue and/or stabilize low reputation and food, respectively. The former always hunts, while the latter relies on a safer but less paying version of the "Food Opportunist" algorithm.
(You'd be surprised how much thought I put into those names. I may get my butt handed to me in the competition, but at least I might win points for creativity :) )
There's some more logic in switching between these strategies as well as a few "safety measures", but I've chosen to keep the specifics out for now so as not to crowd up space. If anyone is interested in more, this would be as good a time as any for me to figure out where to post READMEs online...-_-
In any case, it was quite fun to come up with solutions for the challenge. Thanks for the opportunity, Brilliant, and good luck to everyone (even though that's not technically possible in a game like this)!
Hey everyone! My partner Justin L. and I created an algorithm that does not rely on any assumptions other than the rules of the game.
The algorithm features a few cool aspects:
1) Iterated linear regressions (partial correlation method) for maintaining the optimal reputation.
2) Recursive opponent tracking and collusion with individual players
3) Hunting with weak players, slacking with strong players.
4) "Wasting" our reputation by the end of the game to gain maximum benefit from it.
Feel free to take a look at our write-up here, and tell us what you think! If you want to take a look at the code (about 800 lines of it), we'd be happy to share that too.
I have finished reading the paper; here are my impressions:
First, you have probably done the most professional work here. The math work is excellent. Except for the partial correlation stuff, which I did not understand, everything else has more logical rigor than I could possibly expect. There are some things, however, that I believe could be improved. I believe some of the info you guys collected from linear regressions and such is underutilized.
First, in section 2, each player's food can be estimated much better.
"we know with absolute certainty all contributing terms except for the overall number times the opponent was hunted with."
This is, unfortunately, the bigger part of the sum. I think a useful estimate can be made by considering the average reputation, which can be used to calculate the total number of times opponents hunted with someone (other than you). You can use the 'a' slope from formula (7), in section 4.1, to create a distribution of expected hunts received by each opponent. You guys ended up ignoring this calculation altogether and doing it the arrogant way. I like that.
However, my algorithm, which does very well in simulations, would be considered weak by your code. I also don't think that reputation is a good indicator of how much food someone has. Take two examples:
Player 1 has an absolute advantage, as the integral of the reputation, its expected value, is much, much larger. Player 1 is close to my algorithm, and will be considered weak by your code. Good for me. :D
In section 9.1, you talk about the dangers of over-interpreting data.
"One such method attempted to calculate a dependency of our benefit on our reputation change, much like in the Reputation Block."
This is exactly what I wanted for my algorithm, and I did so. It worked very well, despite the fact that it relies heavily on the slope. And my slope was poorly calculated: my regression required many simplifications and had little statistical rigor. I really believe this could be implemented, given the precision you had on these parameters. You just have to approximate the harmonic series by the logarithmic function, in order to calculate the effect reputation has on the integral of reputation.
About considering the relationship as linear: this is a simplifying hypothesis - I don't see how a deviation from linearity could affect calculations so badly.
I am also afraid that too much energy is wasted on the communication efforts. Do you really think many algorithms will correctly implement tit for tat, to justify such a large investment? Also, is action toward you being better than action against others a good indication that the algorithm has responded to the communication? What if the opponent is simply rewarding you for having a higher reputation? I can't understand either why you chose not to implement code that can test whether or not opponents are actually listening to the communication attempts. It seems easy: just store their actions and correlate them with your last actions, so that what they did is paired up with what they saw as the last action you took on them. You can then help if and only if there is a statistically significant relation in the data.
But, anyway, your code is awesome. I believe it will perform well, but I don't think it will win first prize. It will probably win a $1000 prize anyway.
Wow, thank you for your feedback! I'll get right into responding to the criticisms: 1) On estimating the amount of food of other opponents. A) First, my argument against using average reputation (or average anything, for that matter): since we are interested in the relative performance of players against each other (not compared to us, since we feel the average is a very poor indicator of whether or not we are doing well), using the average of their performance adds no new information.
B) Did you read the Appendix talking about the poisoned wine dilemma? If you assume linearity, it may work with simple algorithms as competitors, but as they get more complicated, things may get very strange, and surely non-linear. In fact, I don't even want to assume higher reputation means you'll be hunted with more :)
C) As I said in the paper, if one's "state" is the reputation, the total number of times one is hunted with is not a state function. Thus, unless I'm tracking your two example players for most of the game (which is impossible, since their reputations will cross other reputations many times), there is NO way to estimate their food properly. At all. In fact, remember my indistinguishability axiom? It applies here even if their reputations aren't exactly identical, if you think about it.
D) It really does not matter with whom we hunt and with whom we don't. It benefits us very little until the end game, and by that time, I would be tracking you anyway, and not let you take an advantage of me ;)
2) About tracking. You ask great questions. A) you say too much effort is wasted on this. That is not true. With my tests, (and mathematically, too), if I collude with 5% of all players, the advantage is very significant. This is exactly why the more responsive and universal my collusion method, the better - it would link with more people. And the closer we are to the end of the game, the greater the concentration of stronger algorithms, thus the more important collusion is. So no, collusion is by no means a waste.
B) Action towards me being better than towards others - could it be that it's just a response to my reputation? Well, first off, since, again, I am interested in the relative behavior of opponents, my reputation as a variable disappears. Next, about my being able to differentiate between those who really are colluding and those who are not. Your correlation idea is reasonable, but also flawed, since, in reality, because I am just mirroring my opponent, the correlation isn't so much of my moves against his but rather of his own moves cross-correlated against his own moves with a phase shift of 1 round. So by this method, a 100% hunter would be at the top of my list, which is clearly not good. And testing my partner by not hunting on purpose just to see what happens may hurt my collusion with real colluders (think about what happens if everyone tested each other - collusions just wouldn't form), so we don't want to do that.
Now, to justify my method. Think about it not in terms of whether it will work with everyone, but rather whether it will be able to bring to the top of my list of priorities those who are really colluding. I feel that it will. Most collusion methods will be tit for tat, which would mean 100% hunting between us (in which case, colluding with a guy who hunts a lot on average is less important than with one who slacks on average, as per my method). Those who go all crazy with their colluding method would then, actually, not be ranked very well by your method either, since correlation would be lower with them too.
Of course, a combination of the method you suggest and method i have would be optimal. But there is sometimes simply not enough time, so we made the best of it. Correlation is quite complicated if done right.
Finally, your comment about the grand prize - a universal algorithm just can not beat one that predicted the optimal reputation through some sort of analysis. So as for us, we're just surviving. :)
For whatever reason, the write-up wouldn't show for me, so I'll ask questions:
About #1: Could you explain this some more? If your linear regression is just the kind of straight-line, 2D regression I'm thinking of, wouldn't it always tell you to have a reputation of either 0 or 1?
About #2: What role did recursion play? More to the point, what about your opponent tracking was more easily done recursively than iteratively? I don't know why that detail jumped out at me, but it did.
About #3: How do you determine the strength of a player? My first thought would be how much food they have, but we're not privy to that information. Unless you're able to track that, as well?
About #4: How do you determine when the game is about to end?
In short, this is how the four methods work:
1) Begin with reputation .7, descend (linearly) to reputation .3 in 25 turns. Use linear regression (elimintaing all explanatory variables other than our reputation) to see whether it's better to have a high or a low reputation on the interval [.3, .7]. Say, it is better to have a high reputation. Then, we take the interval [.5, .7], and then vary our reputation from .5 to .7. Repeat. Once the interval becomes smaller than .08 (experimentally chosen), instead of making the interval smaller, we follow a Markov procces of simply adjusting the regression's center, allowing our reputation to 'drift' into the optimum.
2) The basic way to track is to put two reputation lists from consecutive rounds side by side, and for each reputation from round t, calculate to how many reputations in rount t+1 it could go. If there is just 1 reputation to which it could go, we tracked a player. The recursion comes in when we take all the pairs of reputations that were tracked, delete them from the two lists, and repeat the pairing process. Remove tracked pairs and repeat until no more are tracked. This allows for a very significant number of additional tracked opponents without any probability and guessing.
3) Yes, strength is measured by food. After several attempts at determining how much food opponents have (that's in the documentation), we turned to a very arrogant assumption that a few people have mentioned here: since we think we have an optimal reputation (although there probably won't be just one, so this method is clearly not very good), the farther an opponent is from our reputation, the less successful we believe the opponent is. This isn't very effective, of course, but it is better than nothing, and, frankly, it makes very little difference.
4) The game is about to end when we are about to die. There is no other way (to my knowledge) to do it reliably, because tracking the average of all players is pretty useless, since players will vary greatly in performance. We again use a linear regression to extrapolate our own food levels. Our 'starvation' method begins to act 500 turns before our predicted death by limiting the number of times we allow ourselves to hunt. The closer we are to death, the less we hunt.
The most important part is 1). Second-most important is probably going to be 2), since colluding even with some 5% of all players gives you a very big advantage (btw, collusion used the Generous Tit for Tat). Next by importance is 4), as it really gives you a boost by the end. And 3) is just there, hopefully a bit helpful.
I hope this answers your questions.
Log in to reply
Just to make sure I'm understanding your "linear regression" correctly, are you just using the sign of the slope to tell you whether to take the top or the bottom half of the interval?
Your algorithm sounds like it could work very well; however, clearly the optimum reputation will drift as everyone's algorithms go to equilibria and people die. Can you explain more clearly how the "Markov process of simply adjusting the regression's center" will work?
Log in to reply
May I look at the code? I'm very interested. Many functions I needed (and implemented poorly), you guys seem to have implemented in elegant and mathematically correct ways.
Maybe you could e-mail me. (contact at www.kelvinsantos.com)
Log in to reply
Nice music suggestions. And I sent you the code.
I used a simple mix of game theory, VERY simple machine learning (which may not be learning to the letter), weighting, and a tiny pinch of set strategies. No idea how it will perform against other smart/unique algorithms, but it was the winner among the dumber algorithms in my brute-force tests. Just curious: did anybody use ANNs? It would be interesting to see how those perform, even though they may need lots of training.
Log in to reply
What are ANNs?
Log in to reply
Artificial Neural Networks. They're actually rather simple if you just take a peek at them. Edit: I just noticed I could have added some "smoothing" to learning curves to mimic human behavior >.<'... oh well.
I was on the brink of using ANNs to map personal reputation to expected payout, but ran out of time. The idea was to train it on each sample as it came in, but only once per sample. Ideally, the ANN would perform a kind of regression on the data and be able to "intuit" the optimal reputation from that. Would not surprise me if someone managed to get something like this running. It'll probably kick serious butt. Or lose against dirt-simple algorithms. Based on my experience, it could very easily go either way!
Similarly, I was keen on using genetic programming, which seems near-optimally suited for a domain like this, to "grow" an algorithm. Again, lack of time prevented it. Am quite interested to hear from anyone who did well with this approach.
Log in to reply
Indeed, using any kind of machine learning is kinda risky in this competition. It makes the bot more adaptive in general, but then there's a chance it loses to the most simple bots. This is mostly what happened when I tried my algorithm against 100 other random ones (5 unique ones + my algorithm).
I hated when my algorithm lost to a stupid freeloader (always slacker) bot during some runs when I was testing.
My program bases its strategy on the assumption that people will give hunts to those who have hunted the most previously (which seems to be true for most strats).
First, it determines how often people give out hunts, on average.
Then it determines how many times it would have to hunt to be in that top percentage (so if people hunt 30% of the time, it tries to get into the top 30% of reputations). Lastly, it tries to reach that number of hunts, as in the sketch below.
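Under the contest's `hunt_choices` interface, the core of this idea might look something like the following sketch (the cutoff logic is my paraphrase, not the submitted code):

```python
def hunt_choices(round_number, current_food, current_reputation, m,
                 player_reputations):
    n = len(player_reputations)
    # average fraction of pairings that were hunts so far
    avg_hunt_rate = sum(player_reputations) / n
    # reputation needed to sit in the top avg_hunt_rate fraction
    cutoff = sorted(player_reputations)[int((1 - avg_hunt_rate) * (n - 1))]
    # hunt across the board while below the cutoff, slack once there
    decision = 'h' if current_reputation < cutoff else 's'
    return [decision] * n
```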
Log in to reply
My agent does this as well. I try to move towards this target reputation gradually by throttling my hunting up and down with M.
I worry big time though, that this target reputation will become less relevant once people start dying. I wonder if I should have based it on the last N rounds (or used an exponential moving average), but alas I based my target reputation on the cumulative hunting % from the beginning of the game.
What does this mean?
Log in to reply
That part was just completing the thought "It determines how many times it would have to hunt... tries to get to that amount of hunts." It just hunts the right number of times to reach the target.
My answers to the starter questions:
Always-slack fails if more than 1/3 of the tribe refuses to hunt with it. Slack-slack gives a -2 payout, while slack-hunt gives a +1 payout. If s is the fraction of the population who slacks with us, we have an expected payout of s * -2 + (1-s) * 1, which simple algebra shows is < 0 when s > 1/3. (Hunt bonuses don't really change this fundamental weakness of always-slack; they just mean always-slack is less likely to die, but it'll still place low.)
To identify players across rounds, you'd have to wait for (and count on) each player's reputation to converge to a mostly fixed value, then try to map reputations to individual players. Since more than one player can have the same reputation, this is already an uncertain process. Further, not everyone will converge to a particular reputation. It was my belief from very early on that tracking players was too complicated, too error-prone, and, considering the utter simplicity of the strategy that won the original Prisoner's Dilemma tournament, probably a waste of time. Someone may well prove me wrong on this, though.
The purpose of m, and the hunt bonuses, seems to be to cancel the base -2 payout. That is, the expected payout of a play can be expressed as payout(r_them, r_me) = 3*r_them - r_me - 2, where r_them is their reputation and r_me is mine (and assuming that we hunt or slack randomly). So, their hunting is worth +3 to us, and their slacking is worth 0 to us. Our hunting is worth -1 to us, and our slacking is worth 0 to us. The base payout is -2. When a hunt bonus is awarded, you get +2 food for everyone else in the tribe. This effectively cancels the -2 base payout, giving a second payout table. It makes a difference if you're trying to accumulate as much reputation as possible at as little cost to yourself. Whenever there's a hunt bonus, you can hunt with more people of lesser reputation and not take as much of a hit. Also, hunt bonuses allow the game to go on forever, a benefit to algorithms that take a long time to hit their stride.
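That expected-payout formula is easy to sanity-check against the four base outcomes:

```python
def expected_payout(r_them, r_me):
    """Expected food from one pairing if each side hunts at its
    reputation rate: 3*r_them - r_me - 2."""
    return 3 * r_them - r_me - 2

# the four pure cases: hunt/hunt -> 0, I hunt while they slack -> -3,
# I slack while they hunt -> +1, slack/slack -> -2
assert expected_payout(1, 1) == 0
assert expected_payout(0, 1) == -3
assert expected_payout(1, 0) == 1
assert expected_payout(0, 0) == -2
```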
Due to the very simple, fixed solution to the original Prisoner's Dilemma contest, and how it beat a bevy of more sophisticated, machine learning-based algorithms, my belief throughout most of this contest is that a good, simple, fixed strategy is best. I came to believe (too late) that a strategy with some form of machine learning will probably take first prize, though, since mapping personal reputation to actual payouts seems to be the secret to consistent success. However, I'm looking forward to being blown away and proven completely wrong on this. It certainly won't be the first time!
Log in to reply
Ah, I see my analysis is wrong, at least about the reward. I just reread the rules and saw that everyone gets 2 units of food per hunt, not 2 units of food per cooperation. Ugh.
I still think weighting the reward based on how many hunts you did would be a good 'punishment' for low-rep bots, though.
Our strategy assumes that more reputation --> more people will hunt with you.
It spends the first 5 rounds only hunting, and the rest of the first 30 rounds hunting enough to keep its reputation in the top 10% (because the first few rounds have the longest-lasting impact on reputation).
Afterwards it's allowed to drift towards the median reputation. Every round it makes a calculation taking into account the gains from the last 3 rounds, the reputation change relative to the median, the food gained from others relative to the average food gained by everyone, and the expected number of rounds left (based on the average food gained by everyone and the amount of food I have). Basically, it takes a derivative of the expected food gained due to reputation with respect to hunts spent. This calculation shows whether it is worth more to raise reputation relative to the median next round (by hunting more) or to lower it (by slacking more).
When picking who to hunt with, it hunts with the people with the most reputation and the people with the least reputation, because both populations are considered most likely to die (so the food given away is likely to be wasted rather than help a surviving competitor).
Since we only had a few hours to code this, some constants need adjusting and we could not test it adequately. In addition, we made 2 bad mistakes in the code that made the implementation differ from what we expected: 1. We forgot to subtract the gains from our own hunting decisions out of the gains-from-other-people figure we used; the net effect is that we hunt more than we should. 2. We put the number of rounds remaining on the wrong side of our inequality, and we forgot to handle the case where the number of rounds remaining comes out negative (meaning we would survive indefinitely). This means our entire effort in this calculation was pointless.
So, since the calculations are pointless, the features that did work end up implementing a concept similar to Chad M.'s mole strategy (in terms of giving hunts to people most likely to die), as well as the observation someone else made that the first few rounds are the most influential in terms of reputation.
Log in to reply
Also I noticed that in the long run, this strategy is designed to kill off other copies of itself while rewarding people who use different strategies that get more extreme reputations.
I would like to see someone provide a good answer to #3: What is the purpose of m? It adds to everyone, so what difference does it make?
For my strategy, it strongly influences the calculation in the number of rounds that are expected to remain. More remaining rounds would increase the value of raising reputation, as the higher reputation is expected to last longer. Less remaining rounds would decrease that value and make it favor slacking, as there is little gain in using up 1 food and giving out 3 food in hopes of potentially gaining 3 that it might otherwise already receive.
Log in to reply
One effect m has: if no common goods are awarded, it is possible to starve out players who always slack. If common goods are always awarded, it is impossible -- the payoff for slacking is nonnegative.
I chose to monitor the hunt threshold, average opponent reputation, my prediction accuracy, and streaks of accuracy. The streaks told me whether I needed to reverse course and how large an adjustment to make. This let me dynamically change how often I hunted or slacked, maximizing my hunt reputation without consistently losing rounds (losing being defined by my accuracy). The idea was to make others believe you will hunt, and the only way to do that is to have a better-than-average reputation.... I just want the shirt....
My group's first concern was making a simulator to emulate the game conditions as laid out in the rules. We were all most comfortable in C#, so we created a game simulator that would take any implementation of IPlayer and run it through the game, with modifiers for duplicates of a player instance, and a max number of rounds. Of course, there's a risk here of misinterpreting a rule, but we all gave it a quick code review and hopefully got somewhere close with the game implementation.
After we had that set up, we just started throwing player implementations at it. We had the few simple ones that were given as examples (random, always hunt, always slack) and started looking at how they compare against one another. Then as a team we just started throwing different ideas into player implementations and saw how they stacked up against all the other players.
We tried really complicated algorithms; we tried quick and dirty ideas. After all was said and done, after simulating thousands of games, we had 3 approaches that seemed to do well against an Always Slack player (what we considered our benchmark, since it won more often than not):
1) Hold a Grudge: quite simply, if you had a net loss of food last round (including the possible bonus), slack. (A sketch follows the list.)
2) Karlee (named after its author): (almost always) hunt until you have the highest reputation out of the group, and then slack the rest of the game. This was based on the idea that once you have more food than an AlwaysSlack player, you can just slack and outlast them because you have more food.
3) Hunt If Hungry: if we lost food last round, or if we go below our starting total of food, hunt
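A minimal Python sketch of Hold a Grudge (their implementation was C#; the interface and bookkeeping here follow the contest's Python API and are my own guess at the logic):

```python
class HoldAGrudge:
    """Slack whenever last round was a net food loss, bonus included."""
    def __init__(self):
        self.last_round_net = 0

    def hunt_choices(self, round_number, current_food, current_reputation,
                     m, player_reputations):
        decision = 's' if self.last_round_net < 0 else 'h'
        return [decision] * len(player_reputations)

    def hunt_outcomes(self, food_earnings):
        self.last_round_net = sum(food_earnings)

    def round_end(self, award, m, number_hunters):
        self.last_round_net += award   # fold the bonus into the tally
```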
It's entirely possible that these three player implementations are only successful in our specific environment - with the other player implementations we've introduced, or with our (possibly wrong) interpretation of the rules, but they're the top ones we came up with :)
Log in to reply
Once you'd tested a strategy, did you remove it from the population or leave it in? It sounds like you took it out. I left all of mine in, both to add variety, and to tell if my latest "bright idea" was any better than my previous ones.
Log in to reply
We left all of them in - when we were done, we had 20 different strategies competing against one another, for 1000+ games at a time. Those three players were consistently in the top 4, with hundreds of games above the rest.
Log in to reply
So far I have not seen anyone making use of a reputation-groupings system to determine the course of action against a specific group.
My algorithm's core idea is to group the public into 20 different reputation groups: the 0.00-0.05 group, the 0.05-0.10 group, the 0.10-0.15 group, and so on. Then I calculate my cooperation probability depending on how people from that group behaved previously.
While there may be different algorithms within the same reputation range, I expect this will separate cooperative people from non-cooperative ones, at the very least. Now that I am aware that tit-for-tat is the most tested algorithm for the prisoner's dilemma, I suppose this grouping technique will help a lot in deciding which players are slacking against you.
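As a sketch, the bucketed bookkeeping could be as simple as this (the optimistic prior and the update hook are my own choices, not the submitted code):

```python
import random

NUM_BUCKETS = 20

def bucket(rep):
    # map a reputation in [0, 1] to one of the 20 groups
    return min(int(rep * NUM_BUCKETS), NUM_BUCKETS - 1)

# cooperation[b] = [times group b hunted with me, times observed]
cooperation = [[1, 2] for _ in range(NUM_BUCKETS)]  # optimistic prior

def choose(player_reputations):
    choices = []
    for rep in player_reputations:
        hunted, seen = cooperation[bucket(rep)]
        p = hunted / seen  # empirical cooperation rate of this group
        choices.append('h' if random.random() < p else 's')
    return choices
# (after each round, update cooperation[] from the observed outcomes)
```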
As for question number 3, I would argue that m makes a difference for people struggling to survive; that is, to buy time to get a t-shirt. As stated in the rules, there is no maximum number of survivors, and the number of rounds is governed by a fixed rule. Hence, when we realize that our strategy is failing, we can switch to a 'survival mode' and aim for m whenever it is feasible. My survival mode takes the number of hunts from the previous round (my own hunts excluded), multiplies it by a factor (0.9, decided by trial and error), and compares it with m. If it is greater than m, I feel I do not need to make extra hunts. If it is less than m and my extra hunts can close the gap (and there are people who are likely to cooperate), I go for the hunt. If the margin is too large, I stick to my standard hunting system.
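In code, that survival mode might look roughly like this (parameter names such as `prev_round_hunts_by_others` are hypothetical; the 0.9 factor is the one from the post):

```python
def survival_mode(m, prev_round_hunts_by_others, player_reputations):
    projected = 0.9 * prev_round_hunts_by_others
    if projected >= m:
        return ['s'] * len(player_reputations)  # bonus likely without us
    shortfall = m - projected
    if shortfall > len(player_reputations):
        return None  # margin too large: fall back to the standard system
    # spend just enough hunts, aimed at the most cooperative-looking players
    order = sorted(range(len(player_reputations)),
                   key=lambda i: -player_reputations[i])
    choices = ['s'] * len(player_reputations)
    for i in order[:int(shortfall) + 1]:
        choices[i] = 'h'
    return choices
```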
I also have a feature left unfinished due to time constraints: detecting when you are being ostracised by the society. This is when people hunt in general, but you are not receiving the hunts. Likely to happen when your rep is too low, I suppose. A good ostracization-detecting algorithm to prevent starvation would be great, but also risky. I would love to hear from anyone who has thought about this feature.
Lastly, Good Luck for the competition people!
Log in to reply
I considered group systems, but changed my mind later on.
Log in to reply
In the end, what did you use?
Whenever I thought about bucketing (grouping), I quickly realized that what I really wanted was regression analysis.
Log in to reply
The problem with regression, however, is that we may not get a nice curve, due to some outliers. How would you handle those kinds of lines?
Log in to reply
Oh, hey! Our group did the exact same thing! ...Well, except we made our grouping two-dimensional. A matrix, if you would. So the observations we recorded were based not only on what their reputation was, but also on what our algorithm's reputation was at the time. It worked quite well (in our testing environment, and with the right parameters, at least). We tried variations of it as well (e.g. smaller-sized groups, supersampling), but decided from test runs that they produced no conclusively significant benefits, and at times produced quirky side effects instead. Again, it might have been our testing environment (i.e. the algorithms we pitted it against in our test runs).
If I recall correctly, some other algorithms mentioned here went with some form of grouping, or "clustering", as some would call it. So we were probably not the only ones. In our discussions my team concluded that one reason this works quite well is that it embodies some form of player tracking, but more importantly "community behavior" tracking.
Ostracization detection sounds like a very intriguing concept. It does make sense to figure out, if hostility seems high, if "is it just me or them". Would have loved to see how an algorithm equipped with such measures would fare.
it looks like the conversation venue has shifted:
http://brilliant.org/discussions/thread/hunger-games-the-story-so-far/
Log in to reply
thanks
My algorithm formulates the expected hunting value for a round based on player reputations. It then does the required hunts (if any are possible) to reach m. Based on how accurate its prediction was compared to number_hunters at the end of the round, it refines an index that the expected hunting value is multiplied by to get the expected number of hunts.
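A sketch of that predict-and-refine loop, with my own guesses for the update rule (the damping exponent is illustrative, not the submitted code):

```python
class HuntPredictor:
    def __init__(self):
        self.index = 1.0            # multiplicative correction factor
        self.last_prediction = None

    def predict_hunts(self, player_reputations):
        # naive expectation: each opponent hunts at their reputation
        # rate across their len(player_reputations) pairings
        raw = sum(player_reputations) * len(player_reputations)
        self.last_prediction = raw * self.index
        return self.last_prediction

    def round_end(self, award, m, number_hunters):
        if self.last_prediction:
            # nudge the index toward what actually happened
            self.index *= (number_hunters / self.last_prediction) ** 0.5
```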
Log in to reply
So your algorithm learns? Cool. I don't know if I would've based my strategy around m, though. Unless I'm missing something.
Log in to reply
m tells you a great deal about the way the game is being played.
Your algorithm sounds similar to mine. I take the player_reputations array and, if it seems likely that my hunts will help us reach m, I am much more likely to hunt. If it seems unlikely we will reach m, or if it seems we will reach m without my help, I slack.
Basically, it calculates this "likelihood of reaching m" via the normal distribution probability density function, but it also adds a level of selfishness to make sure it doesn't hunt too much in the beginning.
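The post mentions the normal PDF; a sketch of the matching tail-probability calculation, treating each potential hunt as a Bernoulli trial with success probability equal to the player's reputation (my framing, not necessarily theirs):

```python
import math

def likelihood_of_m(m, player_reputations):
    n_each = len(player_reputations)  # pairings per opponent
    mean = sum(r * n_each for r in player_reputations)
    var = sum(r * (1 - r) * n_each for r in player_reputations) or 1e-9
    z = (m - mean) / math.sqrt(var)
    # normal survival function: chance the tribe reaches m hunts
    return 0.5 * math.erfc(z / math.sqrt(2))
```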
If the number of players gets really low, I stop trying to survive and focus on the goal of being the last surviving player.
I'm regretting not trying to implement some machine learning, or at least using number_hunters in correlation with player_reputations, though. I really wish I had heard of this contest when it was announced. I spent two days brainstorming and wrote my actual code in about 2-3 hours.
I submitted an incredibly simple strategy, inspired somewhat by the success tit-for-tat has with the standard iterated prisoner's dilemma. It took me 22 lines of code, most of which is white space. The main (and only) logic is described by a single if statement:
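A sketch of what that if statement presumably looks like, given the description below (contest interface assumed; this is my reconstruction, not the submission):

```python
def hunt_choices(round_number, current_food, current_reputation, m,
                 player_reputations):
    choices = []
    for rep in player_reputations:
        if rep >= current_reputation:  # they've slacked no more than I have
            choices.append('h')
        else:
            choices.append('s')
    return choices
```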
Put simply: my bot slacks against any bot that has slacked more than it has. This deviates from tit-for-tat in that it punishes bots for every pre-emptive slack, not merely for defections against itself. This makes my bot a reputable and maximally beneficial community member. That is, it punishes every hunter for every slack and never punishes bots that likewise punish every slacker. Another interesting effect is how severely lazy bots end up being punished by my strategy. Tit-for-tat is very forgiving in comparison: a single defection is met with defection the next round, and co-operation thereafter should the initially defecting bot behave as such. My bot, however, isn't as merciful. Given an example game of 10 players, a bot may defect against all 9 of its opponents in a single round, one of whom is my bot. In response, my bot will defect against this bot for 9 consecutive rounds of unanimous co-operation on its part before co-operating with it again, despite being defected against by the offending bot only once itself.
My bot can be described as tit-for-tat that treats any single defection against the honorable portion of the community as a personal defection against itself, while maintaining co-operation with any bot as or more co-operative than itself.
How do you guys think I'll do? Are there obvious ways to exploit my bot's strategy? I haven't thought of any, but I may be missing something. I'm hopeful for at least the survivor prize.
Log in to reply
The primary way to exploit this strategy is to hunt against other people but not you. I tried to make a strategy that did something like this but I couldn't debug it in time.
Log in to reply
I just posted my documentation on my website. The Mole is a strategy I designed specifically to beat a game with any number (>= 2) of copies of this strategy and one player that never hunts. I tried to generalize it to arbitrary reputations but was getting impossible answers from some bug I still haven't figured out. So I went with TfT instead.
Ahhh, right. My bot sacrifices its own self interest in favor of the interests of the co-operative portion of the population. I can see that now.
Identifying my bot in order to do that would seem to be difficult, but given the simplistic nature of my algorithm it may be pretty easy for some clever machine learning to do (especially late game).
One could exploit this logic just by gaining a high rep in early rounds and using it as an advantage later on, just like mine does (implying it even gets that far lol).
Log in to reply
While true, there's a big trade-off for you. Namely, you're letting slackers survive longer for every turn you don't defect against them. Your punishment is -3 for every extra time a slacker defects against you, but only +1 for every time you successfully slack against my bot. To gain a strategic advantage you'd have to be able to slack against my bot 3 times for every extra turn (of slacking against you) of every slacker you allowed.
And the population is bigger at the start, meaning more slackers around then than later.
Is letting Slackerbot hit you with an extra -3 really worth getting a free +1 later on?
Log in to reply
This is a completely wrong way to think about things (but a very common mistake). If you hunt against a slacker but slack against a hunter, your earnings are -3 and +1, for a total of -2. If you hunt against the hunter and slack against the slacker, your earnings are 0 and -2, for a total of -2.
An exploitative strategy should always be thinking in terms of advantage against the field, not against a player.
Unless you are specifically considering strategies like mine that attempt to track players, a hunt is spending 1 food to give a specific opponent 3 food and buy reputation. A funny corollary is that if you could guess which players are about to die, you should always hunt with those players because you get the reputation increase without actually helping any of your opponents.
Log in to reply
When death is a factor, the effects of capital punishment can be very important.
Log in to reply
Against 10 copies of your bot and one "always slack" bot, the best response is to hunt against 9 of your bots at random every round, along with the slacker.
This kind of strategy can even work in a 3-player game against your bot and a slacker, though the exploitative strategy in that case must still hunt with your bot sometimes to avoid putting the slacker in first place.
Log in to reply
So getting rid of a slackerbot is only a good idea if it slacks against your bot in particular with higher probability than your opponents... and only if it's by a significant factor as well, it seems.
Huh. I guess I should've expected my strategy to be trickable into something like this. These reputations are really interesting when compared to the standard format of the iterated PD.
Additionally: The expected payout of punishing slackers is always +1 more than not slacking, because these dishonest bots will slack against you regardless of what you do with them. Slacking against my bot (or any bot with >= reputation) will result in mine slacking against yours. That's a -2 for every slack you do with me. (This of course is after your reputation has re-lowered to mine. You do still get a +1 for every extra time you didn't slack before.)
I came up with (but did not submit) the same strategy, but in response to the question, "What's the simplest possible algorithm that could collude with copies of itself, yet remain as rational as possible?" The answer I came up with was, "The algorithm that hunts with everyone who has at least as much reputation as itself, and slacks with everyone else." A one-line list comprehension. I called it SimpleCollusionist. It was actually kind of a joke algorithm--I didn't think it'd end up doing anything useful. However, in friendly tribes (those where food tends towards infinity), it typically placed comfortably within the top quartile, outperforming many of my more complex strategies. In hostile tribes (where food tends towards 0), it always died off. As far as your own performance, I don't think "exploitation" is a big risk in this competition. It's more likely that your strategy will simply fail to converge on the optimal reputation, that others will get much closer, and that you will, therefore, lose ground to them.
Log in to reply
Do you happen to know which direction my bot's reputation is likely to err in?
Am I over-punishing or under-punishing, do you think?
Log in to reply
And I suppose most importantly, what are your error bars for how closely your sims have accurately presented the behavior of the real tournament?
Log in to reply
As far as accuracy, assuming I understand your question correctly, I used Chad M.'s simulator code, which appeared to follow the rules of the tournament as far as I could tell, and appeared to have been vetted by many participants. Probably the biggest mismatch between what I was doing and what the real tournament might do is that I tended to use very long runs (10000+ rounds). I'm not sure what the organizers plan to do in this regard.
One-liner implementation:
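Reconstructed from the description above (hunt with anyone at or above your own reputation), the one-liner is presumably along these lines:

```python
def hunt_choices(round_number, current_food, current_reputation, m,
                 player_reputations):
    return ['h' if rep >= current_reputation else 's'
            for rep in player_reputations]
```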
Change current_reputation to the 10th percentile of reputations and you have what I submitted.
My friend and I used simple lazy learning (kNN) to identify which players are likely to be prospective hunting partners. It uses the last 30 rounds' results as training data, since some players might have died and I don't need their behaviour included. At first we wanted to identify each player's behaviour individually, but we didn't have enough time for that.
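The post doesn't give implementation details, but a minimal version of "lazy kNN over the last 30 rounds" might look like this sketch (class and method names are mine):

```python
from collections import deque

class KNNPartners:
    def __init__(self, k=5, window=30):
        self.k = k
        self.history = deque(maxlen=window)  # one sample list per round

    def record_round(self, reps, hunted_with_me):
        # hunted_with_me: booleans inferred from each pairing's earnings
        self.history.append(list(zip(reps, hunted_with_me)))

    def will_hunt(self, rep):
        samples = [s for rnd in self.history for s in rnd]
        if len(samples) < self.k:
            return True  # optimistic before we have data
        nearest = sorted(samples, key=lambda s: abs(s[0] - rep))[:self.k]
        return sum(h for _, h in nearest) * 2 > self.k  # majority vote
```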
After submitting and running some simulations against my friend's algorithm, I realized that I should betray the most prospective partners, add some tricky strategies, or treat m as a lower/upper bound on the number of 'h's. Ah well, we are deadliners; hope we can still get the survivor t-shirt!
The purpose of m? Some algorithms survive for the long run, some do not. Getting the bonus food really makes a difference to the game, even though everyone gets it.
My algorithm always hunts in the first round only to start off with a reputation of 1.0 to encourage hunting by other good and rational players.
In subsequent rounds the algorithm becomes:
To calculate m_threshold I first convert m from absolute terms to a scale between 0.0 and 1.0 by dividing m by the number of players in the current round times the number of players minus one; call this m_ratio. If m_ratio is near 0.0, only a few hunts need to happen in the round for everyone to receive the reward. If m_ratio is near 1.0, even if many hunts happen in the round, we will probably not get the reward. Therefore, it makes sense to hunt with a higher probability when m_ratio is in the middle range. This probability is m_threshold, which I calculate by a simple linear transformation of m_ratio: going from 0.0 to 1.0 linearly as m_ratio goes from 0.0 to 0.5, and from 1.0 back down to 0.0 linearly as m_ratio goes from 0.5 to 1.0.
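In code, that triangular mapping is short (identifiers follow the post's m_ratio/m_threshold naming; the function itself is my sketch):

```python
def m_threshold(m, num_players):
    """Hunt probability peaking when about half of all possible
    hunts are needed for the bonus."""
    max_hunts = num_players * (num_players - 1)
    m_ratio = m / max_hunts
    if m_ratio <= 0.5:
        return 2.0 * m_ratio          # 0.0 -> 0.0 up to 0.5 -> 1.0
    return 2.0 * (1.0 - m_ratio)      # 0.5 -> 1.0 down to 1.0 -> 0.0
```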
I used a machine learning model. My strategy mainly considers these points: 1. It costs food to maintain a reputation. 2. A higher reputation is likely to attract more hunts from others.
I created a list of length ten which stores how many people hunt with me when my reputation is in the range (0 to 0.1), (0.1 to 0.2), ..., (0.9 to 1.0). In the early rounds I try to sweep through all reputations and build this table. Looking at the payoff matrix, I gain 3 food every time the other person hunts and lose 1 food every time I hunt (relative to the base -2 when no one hunts). So when I am at reputation 0.5 and others cooperate 0.3 of the time at this reputation, I gain -2 + 3*0.3 - 0.5 = -1.6 food per pairing per round. I can calculate this value (net gain in food) for all reputation ranges (taking 0.15 to represent the (0.1 to 0.2) range, and so on).
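The per-bucket bookkeeping then reduces to one small formula (the cooperation rates below are purely illustrative placeholders, not learned values):

```python
def expected_gain(my_rep, coop_rate):
    """Food per pairing per round: base -2, +3 per opposing hunt,
    -1 per own hunt (made at rate my_rep)."""
    return -2 + 3 * coop_rate - my_rep

# the worked example from the text: rep 0.5, 30% cooperation -> -1.6
assert abs(expected_gain(0.5, 0.3) - (-1.6)) < 1e-9

coop_by_bucket = [0.1, 0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7]
best_bucket = max(range(10),
                  key=lambda i: expected_gain(i / 10 + 0.05,
                                              coop_by_bucket[i]))
```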
Then I try to maintain the reputation that comes out as optimal, by cooperating more with those who have a higher reputation.
Further, in certain cases I try for m. I made another list which stores the number of players who hunt when m is in the range (0 to 0.1), (0.1 to 0.2), and so on. If the ratio of the number of hunts to m for a particular range is between 0.9 and 1.2, I try for m.
What if you: 1) cooperate with everyone until reputations stabilize to some extent (up to your choice); 2) then work for social welfare: estimate the expected number of hunt events S and compare it to m. If you can't influence whether the social reward takes place, go for some strategy A which doesn't take m into account; otherwise make m - S hunt events with top-rated players, and for the others go for some strategy B (which might or might not be the same as A); 3) while your food continues to grow, stick with 2); when it falls below 95% of its maximum, switch to Slack; 4) if your food grows to more than 5% above its last local minimum, switch back to 2).
Log in to reply
Personally, I didn't have terribly much luck with strategies that did one thing if they could push the group to a hunt bonus, and another thing if they couldn't. Part of the problem is, there's often more variability in number of hunts per round than you can contribute (i.e., the population varies from the mean by more than the P−1 hunts you can contribute). Another part of the problem is, it's fairly rare that one single individual makes the difference between meeting m and not meeting it.
In other words, you're going to be spending most of your time doing strategy B, and probably half the time you do strategy A, it won't work (i.e., won't force a bonus).
One might be tempted at this point to raise the popular argument given in favor of voting (at least in the USA, where voter turnout tends to be problematic): "If you remove one vote from the tally, it almost never makes a difference, so there's no point in voting. However, if everyone believes that, there will be no votes in the tally at all! So vote!" The same reasoning could apply here: If you want more hunt bonuses, you should always hunt as much as possible, even when it "doesn't matter." It'll probably make a lot of other algorithms hunt more, since a lot of folks seem to be deciding whether or not to hunt based on player reputation (i.e,. number of hunts).
Log in to reply
Well, it depends on the population given and the way you estimate the total hunt events to come. We used a simplification: we treated players as RVs and used their expectations (which don't take m into account) when approximating the hunt events to come. We modeled this on various random populations of various player types. It was the best we got, and since it was a very data-driven approach, we used it. We'll see how it actually performs.
I've written a blog post describing what my team's approach was here: http://www.chasestevens.com/blog/read_post.php?title=the+anonymized+prisoner%27s+dilemma+challenge+%96+exploiting+limited+knowledge&time=1376983736 . The basic gist is that we implement tit-for-tat using reputation sorting as a means of player identification, unless we have the opportunity to subvert other similar tit-for-tat bots by assuming another player's identity.
There seems to be no talk of algorithm-switching yet. Did anyone submit code that was preset to switch algorithms under different conditions or time periods? Our team ended up submitting an algorithm that would switch algorithms after a certain time. As in, not merely modifying its own parameters (machine-learning style), but switching to an entirely different algorithm altogether. Did anyone else do this? Our reason was that our later-deployed algorithm was not effective until it had observed enough data.
My player only hunts with players falling within the top 30% of reputations, and even then only with probability 1/3. I was very impressed by the performance of always-slackers (using the engine introduced by Chad M.) and decided to be a slacker while giving myself an extra edge by hunting with top-rep players. I made this strategy assuming the game would go as follows. First, all players who hunt excessively would be knocked out. After a while, excessive slackers would have really nice food stores but low reputations, while some other clever players with balanced rep and food would try to go farther than the slackers by forming a small union, helping each other by hunting fairly, and beating the slackers in the long run. Eventually the excessive slackers with lots of food would start to lose food and the others would start gaining slowly. (The process is so slow that it's another reason I decided to be mostly a slacker.) My algorithm attempts to hunt with those fair players while remaining mostly a slacker. That's it. Thanks for introducing this competition; I had a great time learning Python and designing this algorithm.
How long do you think it's going to take for someone to die? It's kind of important to the running of my code that the estimations are correct.
Log in to reply
Well, everyone starts with \(300 \times (440 - 1) = 131700\) food, so it will take quite a lot of rounds before you die.
Log in to reply
The quickest you can die is 150 rounds because there are (P-1) hunts that you take part in each turn. Obviously I don't think anyone will die that quickly because most of the strategies seem to be biased towards hunting early on.
Log in to reply
You can lose at most 3 * (P - 1) food per round, assuming you always choose to hunt while your opponents always slack, so assuming there are no awards in the interim, the first player would die off after 100 rounds.
Of course, that's highly unlikely. Assuming equal chances of hunting and defecting for both you and the opponent, you would average (P - 1) lost food per round (so, 300 rounds). Also, since there is some bias towards cooperation, one can assume that the award will be given at least 50% of the time early on. This would average out to zero loss/gain. Minor deviances could lead to different outcomes; it depends a lot on your calculations and allowances. I estimated ~750 rounds, myself.
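Those round counts follow directly from the starting-food rule; as a quick back-of-the-envelope check:

```python
P = 440                            # players, per the estimate above
start = 300 * (P - 1)              # starting food: 131700
fastest = start / (3 * (P - 1))    # always hunt vs. all slackers: 100 rounds
coin_flip = start / (1 * (P - 1))  # 50/50 play on both sides: 300 rounds
print(start, fastest, coin_flip)
```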
I hunt with those who have higher reputations than mine, with subtle adjustments based on the number of hunters in the recent rounds and massive adjustments when someone dies.
For Q2, there are actually a lot of ways to track players. I think that strategies which track players well are going to have an advantage and be the top placers in the competition (although, as a caveat, they must also make good decisions based on the information the tracking provides).
I chose to implement tracking based on the idea that you can look at a player's reputation and work out how much it could change (go up or down) in a single round. The next round, if only 1 player falls within that range, then you are most certainly looking at the same player from the previous round.
Additionally, I extended this idea to clusters of players with similar reputations. That way, even if I can't track a particular individual I can track a group of players which appear to be playing in a similar way.
Log in to reply
About your last point, say players A and B are part of such a cluster (i.e. a cluster of two opponents) and one is always hunting with you and the other is always slacking, but they maintain the same reputation. Would your algorithm figure out who the hunter and who the slacker is?
Log in to reply
No. At the cluster level I track the entire cluster's reputation towards me. So in that case the cluster would have .5 reputation. But I cannot differentiate between the players A and B in the cluster.
Edit: So I guess what I'm saying is I think there is an advantage by tracking how a group of players behaves towards me specifically, over how a group of players behaves towards the tribe.
Log in to reply
I want to maximize slacking while maintaining some level of reputation in the hopes that others will hunt back with me. I use the information I gather from tracking players individually, in a cluster, or just base reputation to decide how much I hunt with each until I've hit my goal reputation. Obviously, I hunt more with those individuals or groups which are more likely to hunt back.
Log in to reply
Of course it is true that I get the best payout when I slack and they hunt. But the point is first I MUST allocate those N hunts, and only afterwards can I slack. In that case, those hunts had better go to players I think will hunt back.
Log in to reply
Note that either way you can maintain a certain reputation. I didn't bother with those reasons; I just calculated how many hunts I needed for a certain reputation and spread those hunts over my opponents.
Log in to reply
I managed to do the same thing in a slightly different way: I keep track of the total number of previous hunting possibilities (summing p-1 for every past round), from which I can calculate the number of times each player has hunted since the beginning. The matching is done by requiring that a player's hunt count be at least as big as last round's and that the difference be smaller than the number of players.
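A sketch of that matching rule, assuming we accumulate the opportunity count ourselves each round:

```python
def hunt_counts(player_reputations, total_opportunities):
    """Recover cumulative hunt counts from reputations."""
    return [round(r * total_opportunities) for r in player_reputations]

def could_be_same(hunts_before, hunts_after, num_players):
    # cumulative hunts never decrease, and can grow by at most the
    # number of decisions a player makes in a single round
    return 0 <= hunts_after - hunts_before <= num_players - 1
```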
I believe I have developed a strong algorithm for this problem. I have called my game engine 'Kran'. My main idea is that by hunting early and slacking later you can maximize the sum of reputations. My algorithm dynamically determines the best moment to stop hunting and start slacking. Kran also features an underdeveloped Image system which can identify some players with distinguishable reputations and tries to contact them. I am particularly interested in critiques of my main idea: hunt early and slack later. Please check my link for more details.
www.kelvinsantos.com/kran.html
It is great to read everyone's thoughts here. My solution uses a number of strategies, but my favorite is called CommunityBackstab. It is chosen by the meta-strategy when hitting m this round looks likely. Reaching m in round i is considered likely when \(p_m < 1.0\):
\(p_m = \frac{m_i}{r_{\text{average}} \times P \times (P-1)}\)
where \(r_{\text{average}}\) is the average reputation of the remaining \(P\) players.
If \(p_m < 1.0\), then we only need to hunt with \(p_m \times P\) colluding players to realistically reach m. So we hunt with those players and slack against the remaining, randomly chosen, \((1-p_m) \times P\) players.
The advantage here is that we still get m most of the time, but during the mad dash for m we occasionally backstab a random bunch of the players we are colluding with, and they (hopefully) don't find out.
Of course, this assumes that colluding players will try to achieve m together and hunt more frequently in these rounds as a result.
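A sketch of CommunityBackstab under those assumptions (since we only control our P-1 pairings, I hunt with roughly \(p_m\) of my opponents; the fallback when the bonus looks out of reach is left to the meta-strategy):

```python
import math
import random

def community_backstab(m, player_reputations):
    n = len(player_reputations)   # my P-1 opponents
    p = n + 1
    r_avg = sum(player_reputations) / n
    p_m = m / (r_avg * p * (p - 1)) if r_avg > 0 else float('inf')
    if p_m >= 1.0:
        return None               # bonus unlikely; use another strategy
    n_hunts = math.ceil(p_m * n)  # just enough hunting partners
    choices = ['h'] * n_hunts + ['s'] * (n - n_hunts)
    random.shuffle(choices)       # backstab a random subset each time
    return choices
```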
Based on that, my strategy is to stay in the "hunters" group. So every turn I do a basic 2-clustering (hunters/slackers) based on players' reputations; then I hunt with the higher-reputation cluster until my reputation reaches the mean of the hunters group.
I start with hunting with everyone on the first round.
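A tiny one-dimensional two-means pass is enough for that clustering; a sketch (my own implementation, not the submitted code):

```python
def hunter_cluster(player_reputations, iters=10):
    """Split reputations into 'slacker' and 'hunter' clusters (Lloyd's
    algorithm in 1-D); return the boundary and the hunter-group mean."""
    lo, hi = min(player_reputations), max(player_reputations)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [r for r in player_reputations if r <= mid] or [lo]
        high = [r for r in player_reputations if r > mid] or [hi]
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return (lo + hi) / 2, hi  # decision boundary, hunter-group mean
```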
Ok, I didn't actually enter the competition. A friend of mine showed me this website and I read over the materials, thought about them, did some back-of-the-envelope calculations, and came up with an algorithm. But I've never used Python before and didn't invest enough effort to figure out its ins and outs and code up my thoughts...
In any case, my big idea was to keep things simple and use Tit for Tat as a model of sorts (a while back I'd read about a similar tournament for the Prisoner's Dilemma, which the Hunger Games strongly resembles). More specifically, I wanted to hunt on the first turn, and on subsequent turns use a random number generator and basically hunt with a probability p equal to the reputation of my hunting buddy.
So treat people as they treat others.
Did anyone else use this algorithm? I'm still kind of interested in seeing how it could have been coded and how it might have done in competition.
Log in to reply
Good idea. This should just about do it:
Log in to reply
This is close, but you want `random.random() < p` (since `0 <= random.random() < 1`, it even handles the edge cases correctly). This also means special-casing round 1 to make it work as stated.
Log in to reply
Thanks for the correction.
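For completeness, a sketch of the corrected bot, assuming the contest's `hunt_choices` signature (round 1 is special-cased as discussed above):

```python
import random

def hunt_choices(round_number, current_food, current_reputation, m,
                 player_reputations):
    if round_number == 1:
        return ['h'] * len(player_reputations)  # hunt with everyone first
    # thereafter, hunt with probability equal to each buddy's reputation
    return ['h' if random.random() < rep else 's'
            for rep in player_reputations]
```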
Log in to reply
I know of several people who came up with this one independently. It has some merit, but I know of some problems with it:
As players die off, reputations will carry artifacts of the living players' habits regarding the now-dead players, and reputation becomes a less accurate predictor of current behavior.
Even if hunting habits are constant, it overvalues high reputations. In fact, I believe that if more than 1/3 of the competition consists of this bot, the best response is to hunt 100% of the time, even if some other people are slacking.
Log in to reply
Thanks for the feedback. I guess I understand the first problem you mention. Ideally I would have wanted to implement true tit-for-tat, but it seemed a pain to reverse-engineer information about recent moves when all we are given is reputations, and player identities are shuffled.
I'm not sure why the second point is a problem. Isn't the whole idea to encourage hunting and to punish slacking?
Log in to reply