Data literacy journal part 1

What are the challenges of trying to count the number of artists in a city?

Counting artists in a city faces many of the same challenges as the race and census example in the reading, including categorization and self-identification. Whoever organizes the count has the difficult task of setting parameters that determine who is and isn’t an artist living in that city. The parameters could be too wide, making the count irrelevant, or too narrow, excluding many artists. Either extreme is prone to inaccuracies, so the organizers would need to define their categories so that they align closely with the goal they want the data to serve.

Then there’s the challenge of self-identification. Mirroring the race and census example, you may not end up with the data you need to understand and potentially support particular kinds of artists, just as self-identified race data may not support health-based research. If you have resources for supporting certain types of artists and rely only on self-identification, you may end up wasting those resources on self-identifying artists who don’t fit whatever programming or argument you’re building from the data you collected.

What questions do you have about the accuracy of the SNAAP data?

My first question about the report’s accuracy is why they only used digital methods to collect answers. Email is an incredibly congested environment. Additionally, people are liable to use multiple email addresses over time, abandoning old accounts. This on its own makes me question the accuracy of the SNAAP data, as it reflects arts alumni who are (a) still using whatever email account their institution has on file and (b) engaged enough to read emails from their alma mater. You could go further and note that survey respondents are likely to have good email hygiene, since they saw this single email and chose to answer it out of the hundreds they might receive each day. Seeing the sheer number of alumni who had no email on record, had bad addresses on file, or simply never responded makes me question the data, especially considering the report’s own note that people taking this survey could be biased toward positive responses.

This leads to my next question: what was involved in the “‘shadow’ study with alumni from five institutions to test incentives and response bias” (p. 10)? Was this study an email-only initiative too? Which institutions were included? Has it been considered that the choice of participating institutions might have skewed what the study found about bias? There’s a lot to consider, and I feel the sample of institutions for that study isn’t large enough. They could have chosen five well-funded, prestigious schools whose very satisfied alumni would make it difficult to determine the level of bias. Without understanding the methodology of that study, I don’t think I can trust the accuracy of the broader study, because its potential biases haven’t been fully understood, especially given that the digital methods used could introduce biases of their own.

