Digital Phenotyping: Analytics Science from the Inception of Computer Networks (Data Pushing) to the Age of Fitbit Data (Data Pulling)

Blog Series: PUBLIC IMPACT ANALYTICS SCIENCE (PIAS)

About half a century after Erlang invented queueing theory, an analytics scientist named Leonard Kleinrock developed a mathematical theory for the efficient routing of data in computer networks. He found that he could use queueing theory to optimize how data is transferred in such networks, which was instrumental in designing what we know today as the Internet. At the time, Kleinrock was a doctoral student at MIT under Claude Shannon, the father of information theory, which is the science of how information is measured, stored, and transmitted.

Figure 1: Pattern of arrivals to an actual emergency room [Source: [1]]

At the time Kleinrock approached the problem, the prevailing idea of how data could be transferred was based on what is known as circuit switching. In circuit switching, a method well suited to phone calls, the bandwidth through which data is transferred between the sender and receiver is constant. As long as a call is in progress, the bandwidth stays reserved at the same capacity. Kleinrock realized that it would be a terrible idea to use circuit switching to allow computers to communicate with each other. For starters, computers do not send data at a constant rate. In the words of Kleinrock:

“They go blast! and they are quiet for a while. A little while later, they suddenly come up and blast again.” [2]

Is this pattern idiosyncratic to computer networks? Not if you think about it. Restaurants, roads, and even Emergency Rooms (ERs) see similar patterns. Around lunchtime, your favorite restaurant sees a rush of customers arriving, but almost no one enters in the afternoon. Then comes dinnertime, when quite a few people try to get a table at the same time. Similarly, roads see a spike in demand during rush hours, while between rush hours there seem to be almost no cars wanting to use them.

And in ERs? If you look at the patterns of arrivals to ERs, like the one in Figure 1, you see that many people arrive around noon, while far fewer seem to need emergency care after midnight. What is more, as Figure 1 shows, similar patterns (though with some shift to the right) can be seen in requests for hospital beds for ER patients who need to be hospitalized after their ER visit. The same story holds for when patients leave the ER. So the pattern observed by Kleinrock, where demand comes in blasts, is much more general than computer networks.

This invites the following question: how should one match demand and capacity when demand comes in blasts? This is where queueing theory can help a lot. In the context of ERs, for example, various studies, including many of my own, have shown how principles of queueing theory can be used to save lives in ERs by cutting long lines [3,4,5,6]. You would not want to waste capacity by providing a constant bandwidth capable of handling maximum demand at all times, which is the solution that circuit switching would recommend. This is the essence of what Kleinrock realized.
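To make the waste from constant, peak-sized capacity concrete, here is a minimal sketch. It is a simple statistical multiplexing calculation rather than a full queueing model, and every number in it is an illustrative assumption, not data from the studies cited above: 100 sources, each independently active 10% of the time, and a target of having enough shared capacity 99% of the time.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def shared_capacity(n_sources: int, p_active: float, coverage: float) -> int:
    """Smallest number of channels c with P(active sources <= c) >= coverage."""
    for c in range(n_sources + 1):
        if binomial_cdf(c, n_sources, p_active) >= coverage:
            return c
    return n_sources  # fallback if rounding keeps the CDF just below coverage


# Hypothetical parameters, chosen only for illustration.
N_SOURCES = 100   # number of bursty senders
P_ACTIVE = 0.10   # fraction of time each sender is "blasting" (assumed independent)
COVERAGE = 0.99   # share of the time the shared pool must suffice

# Circuit-switching style: one dedicated channel per sender, always reserved.
dedicated = N_SOURCES

# Shared pool: only enough channels to cover demand 99% of the time.
shared = shared_capacity(N_SOURCES, P_ACTIVE, COVERAGE)

print(f"dedicated channels: {dedicated}, shared channels: {shared}")
```

Under these assumptions the shared pool needs far fewer channels than the dedicated design. The independence assumption matters: when blasts are correlated, as with noon arrivals at an ER, the savings shrink, and a proper queueing model is needed to size capacity.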

Notably, Kleinrock understood that to be efficient and reliable, network resources should be shared and allocated as needed. To achieve this, he proposed the idea of packetization of data. This involves splitting data up into smaller chunks, or packets, that can be routed independently through any available network connection. Kleinrock’s work on developing mathematical models capable of gauging the benefits of this method of routing data, known as packet switching, enabled the technology that today we know as the Internet.
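The idea of packetization can be sketched in a few lines. This is a toy illustration, not how any real network protocol is implemented: the `Packet` class, the 8-byte packet size, and the shuffle standing in for packets arriving out of order over different routes are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    seq: int        # position of this chunk in the original message
    payload: bytes  # the chunk of data itself


def packetize(data: bytes, size: int) -> list[Packet]:
    """Split data into numbered chunks of at most `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [
        Packet(seq=i, payload=data[offset:offset + size])
        for i, offset in enumerate(range(0, len(data), size))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message by ordering packets by sequence number."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


message = b"They go blast! and they are quiet for a while."
packets = packetize(message, size=8)

# Simulate independent routing: packets may arrive in any order.
random.Random(0).shuffle(packets)

restored = reassemble(packets)
print(restored == message)
```

Because each packet carries its own sequence number, the receiver does not care which route a packet took or when it arrived, which is what lets the network share links among many senders.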

From Data Pushing to Data Pulling

Unlike Kleinrock and others, who worked to find the best ways of pushing data, modern analytics scientists are often more concerned with the vast amount of data that could be pulled. This requires knowing when the data should be accessed and which parts of it should be used. That is, data pulling, an important aspect of modern analytics science, requires data filtration.

For example, “wearables,” including smartwatches, Fitbits, electronically enhanced straps, patches, and other smart devices, can now record over 7,500 psychological and behavioral variables. AI and ML algorithms are now being used, along with digital phenotyping and mHealth methods [see, e.g., 8,9,10], to provide personalized recommendations by recognizing which variables in such a vast amount of data are useful for each individual. The immediate impact is on improving the overall well-being of society.

An example is our recent work [8,9]. In collaboration with experts from the Department of Psychiatry at Brigham and Women’s Hospital and Harvard Medical School, we made use of digital phenotyping to help patients with bipolar disorder. Using noninvasive data from Fitbit devices, we found that the occurrence of clinically significant depression could be detected with 80.1% accuracy, and the occurrence of clinically significant manic symptoms could be detected with 89.1% accuracy. The relative importance of the variables revealed which Fitbit data features contributed most to our predictive models. Notably, we found that “median bedtime was among the five most important variables for predicting both depressive symptomatology and (hypo)manic symptomatology. Additionally, deep sleep-related variables were high in variable importance for depression symptomatology prediction, whereas REM sleep-related variables were high in importance for (hypo)mania symptomatology prediction.” [8]

Data pulling and its necessary component, data filtration, also make it possible to know when interventions are needed. For example, once you can predict that a person is about to experience a severe mental health episode, you can do a lot to intervene and prevent adverse outcomes such as suicide.

As the amount of data has increased and IoT-enabled devices have become more ubiquitous [11], the shift from data pushing to data pulling has become increasingly pronounced. The creation of the Internet required ideas such as packet switching to make data pushing efficient. What we need today in modern analytics is more efficient methods of data pulling. Of course, developing such methods does not replace the need for what I have argued for in prior posts, such as AI methods that can do causal inference and causal reasoning under ambiguity [12,13,14] or centaur methods that can combine human intuition with the power of AI algorithms [15,16,17]. Indeed, reaching the full potential of analytics requires a holistic view of these needs.

References

[1] Saghafian, S., Kilinc, D., & Traub, S. J. (2022). Dynamic assignment of patients to primary and secondary inpatient units: Is patience a virtue? HKS Faculty Research Working Paper Series.
[2] Christian, B., & Griffiths, T. (2016). Algorithms to live by: The computer science of human decisions. Macmillan. p. 207.
[3] Nadias, S. (2017). Cutting the lines in hospital emergency rooms: Soroush Saghafian employs queuing theory to improve emergency room care. https://www.hks.harvard.edu/faculty-research/policy-topics/health/cutting-lines-hospital-emergency-rooms
[4] Saghafian, S., Hopp, W., Iravani, S., Cheng, Y., & Diermeier, D. (2018). Workload management in telemedical physician triage and other knowledge-based service systems. Management Science, 64(11), 5180–5197.
[5] Saghafian, S., Hopp, W. J., Van Oyen, M. P., Desmond, J. S., & Kronick, S. L. (2012). Patient streaming as a mechanism for improving responsiveness in emergency departments. Operations Research, 60(5), 1080–1097.
[6] Saghafian, S., Austin, G., & Traub, S. J. (2015). Operations research/management contributions to emergency department patient flow optimization: Review and research prospects. IIE Transactions on Healthcare Systems Engineering, 5(2), 101–123.
[7] Saghafian, S., Hopp, W. J., Van Oyen, M. P., Desmond, J. S., & Kronick, S. L. (2014). Complexity-augmented triage: A tool for improving patient safety and operational efficiency. Manufacturing & Service Operations Management, 16(3), 329–345.
[8] Lipschitz, J. M., Lin, S., Saghafian, S., Pike, C. K., & Burdick, K. E. (2024). Digital phenotyping in bipolar disorder: Using longitudinal Fitbit data and personalized machine learning to predict mood symptomatology. Acta Psychiatrica Scandinavica (forthcoming).
[9] Lin, S., Saghafian, S., Lipschitz, J. M., & Burdick, K. E. Multi-agent reinforcement learning for mobile health interventions using Fitbit data. Working Paper, Harvard University.
[10] Saghafian, S., & Murphy, S. A. (2021). Innovative health care delivery: The scientific and regulatory challenges in designing mHealth interventions. National Academy of Medicine (NAM) Perspectives.
[11] Saghafian, S., Tomlin, B., & Biller, S. (2022). The internet of things and information fusion: Who talks to who? Manufacturing & Service Operations Management, 24(1), 333–351.
[12] Saghafian, S. (2021). Ambiguity versus risk in sequential decision-making: Incomplete information, causal inference, and reinforcement learning. Public Impact Analytics Science (PIAS) Blog.
[13] Saghafian, S. (2024). Ambiguous dynamic treatment regimes: A reinforcement learning approach. Management Science, 70(9), 5667–5690.
[14] Saghafian, S. (2018). Ambiguous partially observable Markov decision processes: Structural results and applications. Journal of Economic Theory, 178, 1–35.
[15] Saghafian, S. (2023). The analytics science behind ChatGPT: Human, algorithm, or a human-algorithm centaur? Public Impact Analytics Science (PIAS) Blog.
[16] Saghafian, S., & Idan, L. (2024). Effective generative AI: The human-algorithm centaur. Harvard Data Science Review (forthcoming).
[17] Orfanoudaki, A., Saghafian, S., Song, K., Chakkera, H. A., & Cook, C. (2022). Algorithm, human, or the centaur: How to enhance clinical care? HKS Working Paper No. RWP22-027.