Imitation, as they say, is the sincerest form of flattery. You see this in music with cover songs, samples, collaborations, and remixes. Are there differences in gender, race, age, or style in what artists get referenced?

Hypothesize whether there will be any differences in what artists get referenced on the basis of some personal, social, or cultural quality of the artist. Develop two explanations, one for why you may be right and one for why the opposite outcome may be right. Suggest two datasets or methods for testing your hypotheses. Then relate the content of a peer’s post to yours.

  • Restate phenomenon

  • Generate competing hypotheses

    • Prediction about effects in world
    • Propose a dataset
  • Respond with how data might test or inform their prediction, or vice versa


  1. Similarity: a higher similarity score between artists (genre, gender, age …)
  2. Cultural impact: the more listeners an artist has, the more likely they are to be referenced

The phenomenon under investigation is whether the artists who get referenced in cover songs, samples, collaborations, and remixes differ systematically in gender, race, age, or style, and if so, how.

One hypothesis is that the choice of which artists to reference is driven by their similarity to the artist making the referential song. This hypothesis predicts that the artist most likely to be referenced by a young male Latino reggaeton singer would be another young male Latino reggaeton singer. This prediction could be tested with a dataset that records all of our features of interest (gender, race, age, and style) for both the referenced artist and the referential artist. If the hypothesis is correct, we would expect higher ‘similarity scores’ for artists who collaborated than for randomly paired artists or for artists who never collaborated.
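One way to make "higher than random chance" concrete is a permutation test. The sketch below is illustrative only: it assumes each artist is stored as a record with hypothetical fields `gender`, `race`, `age_group`, and `style`, defines the similarity score (an assumption, not an established metric) as the fraction of those fields on which two artists match, and compares the mean score of observed (referential, referenced) pairs against pairings in which the referenced artists are shuffled at random.

```python
import random
from statistics import mean

# Hypothetical feature names; any categorical attributes could be used.
FEATURES = ("gender", "race", "age_group", "style")


def similarity(a: dict, b: dict) -> float:
    """Fraction of FEATURES on which two artists match (0.0 to 1.0)."""
    return sum(a[f] == b[f] for f in FEATURES) / len(FEATURES)


def permutation_test(artists: dict, pairs: list, n_iter: int = 10_000, seed: int = 0):
    """Compare observed pair similarity with randomly re-paired artists.

    artists: maps an artist id to its feature record.
    pairs: list of (referential_id, referenced_id) tuples.
    Returns (observed mean similarity, one-sided p-value).
    """
    rng = random.Random(seed)
    observed = mean(similarity(artists[a], artists[b]) for a, b in pairs)
    sources = [a for a, _ in pairs]
    targets = [b for _, b in pairs]
    hits = 0
    for _ in range(n_iter):
        # Break the real pairings by shuffling who references whom.
        rng.shuffle(targets)
        null = mean(similarity(artists[a], artists[b]) for a, b in zip(sources, targets))
        if null >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # Toy, made-up records purely to show the call shape.
    toy = {
        "A": {"gender": "m", "race": "latino", "age_group": "18-29", "style": "reggaeton"},
        "B": {"gender": "m", "race": "latino", "age_group": "18-29", "style": "reggaeton"},
        "C": {"gender": "f", "race": "white", "age_group": "60+", "style": "jazz"},
        "D": {"gender": "f", "race": "white", "age_group": "60+", "style": "jazz"},
    }
    print(permutation_test(toy, [("A", "B"), ("C", "D")], n_iter=1_000))
```

A small p-value would mean that real referencing pairs are more alike than random pairings, which is what this hypothesis predicts. With a realistic dataset, the shuffle could be restricted (for example, within genre or decade) so that the null comparison is fairer.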

Another hypothesis is that artists reference whoever is most similar to the referencing artist's target audience. This hypothesis predicts that the artist most likely to be referenced by a young classical jazz musician would be an older classical jazz musician, because that artist most closely matches the demographics of the younger musician's target audience. This could be tested with a dataset containing our features of interest for the referenced artist, the referential artist, and the referential artist's most common audience demographic. This hypothesis predicts that the highest ‘similarity scores’ would be found between the audience demographic and the referenced artist, rather than between the referenced and referential artists, as the first hypothesis would predict.
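The two hypotheses make competing predictions about which pair should be most alike, so they can be compared directly. The sketch below is a minimal illustration under stated assumptions: the audience demographic is encoded with the same hypothetical fields as an artist (with `style` standing in for the audience's preferred genre), similarity is the fraction of matching fields, and a paired sign-flip permutation test checks whether the audience-to-referenced similarity differs from the referential-to-referenced similarity.

```python
import random
from statistics import mean

# Hypothetical feature names; the audience profile reuses the same fields.
FEATURES = ("gender", "race", "age_group", "style")


def similarity(a: dict, b: dict) -> float:
    """Fraction of FEATURES on which two profiles match (0.0 to 1.0)."""
    return sum(a[f] == b[f] for f in FEATURES) / len(FEATURES)


def compare_hypotheses(triples: list, n_iter: int = 10_000, seed: int = 0):
    """triples: list of (referential, referenced, audience) feature dicts.

    Returns (mean difference, two-sided p-value). A positive difference
    favors the audience hypothesis; a negative one favors the
    artist-to-artist similarity hypothesis.
    """
    # Per-song difference: audience-vs-referenced minus referential-vs-referenced.
    diffs = [similarity(aud, ref) - similarity(src, ref) for src, ref, aud in triples]
    observed = mean(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        # Under the null of no difference, each sign is equally likely.
        flipped = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    # One made-up triple mirroring the jazz example.
    young = {"gender": "m", "race": "black", "age_group": "18-29", "style": "jazz"}
    older = {"gender": "m", "race": "black", "age_group": "60+", "style": "jazz"}
    audience = {"gender": "m", "race": "black", "age_group": "60+", "style": "jazz"}
    print(compare_hypotheses([(young, older, audience)], n_iter=1_000))
```

Pairing the two similarities within each song, rather than comparing overall averages, controls for songs whose artists are unusually alike or unlike in every respect.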


I also shared the hypothesis that one driving force behind which artists get referenced could be a connection between them based on shared experiences or backgrounds, leading to similarities between the artists who reference and the artists who get referenced. I hadn't previously considered using qualitative data, in the form of interviews, to better understand the artists' reasoning behind their choice of reference. Data gathered from such interviews or surveys would likely provide valuable context for understanding why particular artists get referenced.