The Theory of Information Processing and Psychology

By one widely cited estimate, the human senses take in roughly eleven million bits of information per second, yet the mind consciously attends to only a tiny fraction of that. This gap between what the brain receives and what the mind actively works with is at the heart of one of the most influential frameworks in the history of psychology: information processing theory. The model fundamentally changed how psychologists think about perception, attention, memory, learning, and cognition, and its influence can be felt in everything from how classrooms are designed to how artificial intelligence systems are built.

Information processing theory emerged from the cognitive revolution of the 1950s and 1960s — a seismic shift in psychology that moved the field away from behaviorism’s exclusive focus on observable stimulus-response relationships and toward the serious scientific study of mental processes. Researchers like George Miller, Ulric Neisser, Allen Newell, and Herbert Simon began treating the mind as something more like a sophisticated information processor — a system that receives input, transforms it through a series of stages, stores it, retrieves it, and uses it to generate output. The computer, newly arrived on the scientific scene, provided a powerful and generative analogy.

But information processing theory is not simply a metaphor. It is a rigorous, empirically tested framework that has produced some of the most important findings in cognitive psychology, including the limits of working memory, the architecture of long-term memory, the role of attention as a selective filter, and the specific ways in which encoding, storage, and retrieval determine what we remember and what we forget. It has generated clinical applications in cognitive behavioral therapy, educational applications in instructional design, and technological applications in early artificial intelligence research.

This article offers a comprehensive, psychologically grounded account of information processing theory: its intellectual origins, its core components, its major theoretical models, its psychological and educational applications, and its enduring relevance in contemporary cognitive science.

What Is Information Processing Theory? A Clear Definition

Information processing theory is a cognitive framework that conceptualizes the human mind as a system that receives, encodes, stores, transforms, and retrieves information through a series of distinct mental processes. It treats cognition as analogous to — though not identical with — the operation of a computer: raw input from the environment is processed through multiple stages before it influences behavior or is retained in memory.

The theory has several defining characteristics that distinguish it from both the behaviorism it replaced and the broader connectionist models that later challenged some of its assumptions:

Mental processes are real and scientifically investigable: Unlike behaviorism, which treated mental processes as inaccessible and irrelevant “black boxes,” information processing theory insists that internal cognitive operations are legitimate objects of scientific study — inferable from carefully designed behavioral experiments even if not directly observable.
Processing is sequential and staged: Information passes through a series of processing stages — sensory register, short-term memory, long-term memory — and each stage has characteristic properties in terms of capacity, duration, and the nature of processing that occurs there.
Capacity is limited: A central and empirically well-supported claim of information processing theory is that human cognitive capacity — particularly at the level of conscious, working-memory-based processing — is severely limited. These limitations are not random; they follow predictable patterns that have been precisely characterized by research.
Processing involves active transformation: Information is not passively recorded as it passes through the system. It is actively transformed — organized, interpreted, elaborated, compressed, and reconstructed — at each stage of processing. What enters memory is not a faithful copy of the original stimulus but a processed and transformed representation.

The computer analogy that initially inspired the framework has proven both illuminating and limiting. It is illuminating because it provided precise, testable models of how information might flow through a cognitive system. It is limiting because human cognition is demonstrably not a serial, deterministic processor — it is massively parallel, deeply embodied, profoundly social, and saturated with emotion in ways that early computer models did not capture. Contemporary information processing frameworks have become significantly more sophisticated in accounting for these dimensions.

The Historical Roots: From Behaviorism to the Cognitive Revolution

To understand why information processing theory was so transformative, it helps to understand what it replaced. For roughly half of the 20th century, American psychology was dominated by behaviorism — the theoretical position, most powerfully articulated by John B. Watson and B.F. Skinner, that psychology should restrict itself to the study of observable behavior and its relationship to observable environmental stimuli. Mental processes — thinking, imagining, remembering, deciding — were either dismissed as unscientific fictions or treated as mere epiphenomena, causally irrelevant to the behaviors they appeared to accompany.

The behaviorist program was not without genuine achievements, particularly in understanding the principles of learning through conditioning. But by the 1950s, its limitations were becoming impossible to ignore. Noam Chomsky’s devastating 1959 critique of Skinner’s account of language acquisition — arguing that the complexity and creativity of human language could not possibly be explained by reinforcement learning — crystallized the dissatisfaction that had been building among researchers who found the behaviorist framework simply inadequate for the phenomena they were trying to understand.

The intellectual ingredients for the cognitive revolution were converging simultaneously. Claude Shannon and Warren Weaver’s information theory (1949) provided a mathematical framework for quantifying information transmission. Norbert Wiener’s cybernetics introduced the concept of feedback-controlled systems. The development of digital computers — and the discovery that they could, in principle, perform operations that resembled reasoning — provided a concrete existence proof that complex information processing was physically realizable. And researchers including George Miller, Jerome Bruner, and Ulric Neisser began conducting experiments that could only be explained by positing internal mental processes of specific kinds.

George Miller’s landmark 1956 paper, “The Magical Number Seven, Plus or Minus Two,” is often cited as a foundational document of cognitive psychology and information processing theory. By demonstrating that the capacity of short-term memory was limited to approximately seven chunks of information (plus or minus two), Miller provided one of the first rigorous, quantitative characterizations of a specific cognitive architecture — and showed that internal mental processes could be studied with scientific precision.
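To make the idea of a "chunk" concrete, here is a minimal Python sketch of recoding. The function name `chunk`, the digit string, and the group size of 4 are illustrative assumptions rather than anything from Miller's paper. The sketch shows only that grouping reduces the number of units to be held, while the underlying information stays the same.

```python
def chunk(sequence: str, size: int) -> list[str]:
    """Split a string into consecutive groups of `size` characters.

    The last group may be shorter if the length is not a multiple of `size`.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


digits = "149217761945"        # hypothetical 12-digit string
as_digits = chunk(digits, 1)   # every digit is its own item
as_groups = chunk(digits, 4)   # the same digits recoded as three 4-digit groups

print(len(as_digits), len(as_groups))  # prints: 12 3
print(as_groups)                       # prints: ['1492', '1776', '1945']
```

Held as single digits, the string has twelve items, which is more than the span Miller described. Recoded into three familiar-looking groups, it has three.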

The Three-Stage Model: Sensory Memory, Working Memory, and Long-Term Memory

The most widely taught and most influential structural model within information processing theory is the three-stage model — often called the Atkinson-Shiffrin model after the researchers Richard Atkinson and Richard Shiffrin who formalized it in 1968. This model provides a sequential account of how information moves through the cognitive system from initial sensory registration to long-term retention.

Sensory memory is the first stage — the brief, high-capacity but extremely short-duration register that holds a literal trace of sensory input just long enough for it to be processed and selectively attended to. George Sperling’s elegant 1960 experiments on iconic memory (visual sensory memory) demonstrated that the sensory register can hold a surprisingly large amount of information — but for only a fraction of a second (approximately 200–500 milliseconds for visual information) before it decays. Echoic memory — the auditory equivalent — lasts somewhat longer, up to approximately two to four seconds. The vast majority of what reaches the sensory register is never processed further; only the information that receives attentional selection passes to the next stage.

Working memory (originally called short-term memory) is the central stage of conscious cognitive processing — the system that temporarily holds and actively manipulates the information currently being used in thought. It is here that the famous capacity limits of human cognition most sharply apply. Miller’s “magical number seven” identified the approximate capacity of this system, though subsequent research — particularly Alan Baddeley and Graham Hitch’s 1974 model of working memory — revealed its internal architecture to be more complex than a single limited-capacity store.

Baddeley’s working memory model — which replaced the simpler short-term memory construct in most contemporary accounts — proposed three interacting components: the phonological loop (which maintains verbal and auditory information through sub-vocal rehearsal), the visuospatial sketchpad (which maintains visual and spatial information), and the central executive (an attentional control system that coordinates the other components, manages task-switching, and governs the interface between working memory and long-term memory). A fourth component — the episodic buffer — was added in 2000 to account for the integration of information from multiple sources into coherent, multi-dimensional representations.

Long-term memory is the vast, essentially unlimited repository of everything that has been learned and retained — skills, facts, autobiographical experiences, language, procedural knowledge, and semantic understanding. Unlike working memory, long-term memory has no clear capacity limit and retains information over indefinitely long periods. Its organization is not random but highly structured — knowledge is organized thematically, semantically, and in terms of associative relationships that allow efficient retrieval.

The critical process governing what moves from working memory to long-term memory is encoding — the transformation of incoming information into a form suitable for long-term storage. The depth-of-processing framework proposed by Fergus Craik and Robert Lockhart in 1972 argued that retention is determined primarily by the depth and elaborateness of the cognitive processing applied during encoding, rather than by the mere repetition of material. Deeply processed information — information that is analyzed semantically, connected to existing knowledge, and actively elaborated — is retained far more durably than information processed at a shallow, perceptual level. This finding has had enormous and well-supported implications for educational practice.
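The flow described in this section can be sketched as a toy pipeline in Python. This is a deliberately simplified illustration, not an implementation of the Atkinson-Shiffrin model. The class names, the `attended` and `deep` flags, the example stimuli, and the all-or-none rules are assumptions made for the sketch, and the capacity constant of 7 simply echoes Miller's estimate.

```python
from dataclasses import dataclass, field

WM_CAPACITY = 7  # illustrative value echoing Miller's estimate, not a measured constant


@dataclass
class Item:
    content: str
    attended: bool = False  # did attention select it from the sensory register?
    deep: bool = False      # was it processed for meaning during encoding?


@dataclass
class MemorySystem:
    working: list[Item] = field(default_factory=list)
    long_term: list[str] = field(default_factory=list)

    def perceive(self, sensory_input: list[Item]) -> None:
        """Sensory register to working memory: only attended items pass,
        and only up to the capacity limit. Everything else is lost."""
        attended = [item for item in sensory_input if item.attended]
        self.working = attended[:WM_CAPACITY]

    def encode(self) -> None:
        """Working memory to long-term memory: in this toy version,
        only deeply processed items are retained (an all-or-none simplification)."""
        self.long_term.extend(item.content for item in self.working if item.deep)


stimuli = [
    Item("phone number", attended=True, deep=False),
    Item("friend's story", attended=True, deep=True),
    Item("background hum", attended=False),
]

mind = MemorySystem()
mind.perceive(stimuli)
mind.encode()

print([item.content for item in mind.working])  # ['phone number', "friend's story"]
print(mind.long_term)                            # ["friend's story"]
```

In real memory, depth of processing changes the probability and durability of retention rather than acting as a switch. The boolean flag is used here only to keep the example deterministic.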

Attention as the Gatekeeper: How Information Is Selected for Processing

Given the enormous volume of sensory information available at any moment and the severe capacity limitations of working memory, some mechanism must govern which information receives the deeper processing that leads to retention and use. That mechanism is attention — and its investigation represents one of the richest areas of research within information processing theory.

The foundational experimental paradigm for studying selective attention was the dichotic listening task, pioneered by Colin Cherry in 1953 and extended by Donald Broadbent. In these experiments, participants wore headphones and heard a different message in each ear at the same time, and were asked to repeat (shadow) the message in one ear while ignoring the other. Cherry, who framed the question as the “cocktail party problem,” found that listeners could report little about the ignored message beyond its physical characteristics, such as whether the voice was male or female, which pointed toward a selective filter operating early in processing. Neville Moray later showed (1959) that people could sometimes detect their own name in the ignored channel, a finding that a strict early filter struggles to explain.

Broadbent’s filter model (1958) proposed that attention operates as an early filter: only information that passes through the filter based on its physical characteristics (such as the location of the sound source) receives semantic analysis. This explained why people generally cannot report the semantic content of the ignored message — it was filtered out before semantic processing occurred.

Subsequent research, particularly by Anne Treisman, revealed that the picture was more complex. Treisman’s attenuation model proposed that the filter does not completely block the unattended channel but attenuates it, reducing its strength so that sufficiently significant items, such as one’s own name, can still activate their semantic representations (hence name detection at the cocktail party). Other models, including Deutsch and Deutsch’s late-selection model (1963), proposed that all incoming information receives semantic processing, with selection occurring at the response stage rather than the perceptual stage.

Contemporary attention research — influenced by neuroscience as well as cognitive psychology — has moved toward accounts that recognize multiple attentional systems (sustained attention, selective attention, divided attention, executive attention) with different neural substrates and different functional properties. Michael Posner’s work on the neural basis of attention has been particularly influential in connecting information processing theory’s functional accounts to neurobiological mechanisms.

Encoding, Storage, and Retrieval: The Three Pillars of Memory in Information Processing

Within the information processing framework, memory is not a single process but a system of three distinct operations — encoding, storage, and retrieval — each of which can succeed or fail independently, and each of which follows identifiable principles that have practical implications for learning and remembering.

Encoding is the process of transforming incoming information into a mental representation suitable for storage. Encoding quality depends critically on the type and depth of processing applied. The levels-of-processing framework demonstrated that semantic encoding — organizing and connecting information in terms of its meaning — produces dramatically more durable traces than phonological encoding (how a word sounds) or structural encoding (how it looks). The practical implication is direct: the best way to remember something is not to passively expose yourself to it repeatedly but to actively process its meaning — asking questions, making connections, elaborating, and applying it.

Storage refers to the maintenance of encoded information over time. Long-term memory storage is not uniform. Endel Tulving’s influential distinction between episodic memory (autobiographical memory for personal experiences tied to specific times and places), semantic memory (general knowledge about the world, independent of personal context), and procedural memory (knowledge of how to perform skills and procedures) identifies qualitatively different memory systems with different properties, different neural substrates, and different vulnerability to forgetting and impairment.

Retrieval is the process of accessing and bringing stored information back into working memory for use. A memory trace can exist in long-term storage but fail to be retrieved — which is why the subjective experience of forgetting is often more accurately described as a retrieval failure than a storage failure. The encoding specificity principle, developed by Tulving and Donald Thomson, proposes that retrieval is most successful when the conditions at retrieval match the conditions at encoding — the same cues, context, emotional state, or cognitive framework that were present when the information was encoded. This principle explains context-dependent memory (information learned in one environment is better recalled in that environment) and state-dependent memory (information encoded in a particular emotional or physiological state is better recalled in the same state).
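The logic of encoding specificity can be illustrated with a small Python sketch. The overlap score below is a hypothetical stand-in chosen for clarity, not a model proposed by Tulving and Thomson. The function name `cue_overlap` and the context features are likewise assumptions.

```python
def cue_overlap(encoding_context: set[str], retrieval_cues: set[str]) -> float:
    """Fraction of the features present at encoding that are present again at retrieval."""
    if not encoding_context:
        return 0.0
    return len(encoding_context & retrieval_cues) / len(encoding_context)


# Hypothetical features of the situation in which something was learned.
encoded = {"library", "quiet", "calm", "morning"}

# Two hypothetical retrieval situations.
same_place = {"library", "quiet", "calm", "evening"}
different_place = {"cafe", "noisy", "calm", "evening"}

print(cue_overlap(encoded, same_place))       # 0.75
print(cue_overlap(encoded, different_place))  # 0.25
```

On this toy measure, returning to the original setting restores three of the four encoding features, while the new setting restores only one. The principle predicts better recall in the first case, although the score itself is not a recall probability.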

Cognitive Load Theory: Information Processing Applied to Learning and Instruction

One of the most practically impactful applications of information processing theory to education is cognitive load theory, developed by John Sweller beginning in the 1980s. Cognitive load theory takes seriously the central information processing finding that working memory capacity is severely limited — and derives from this finding a comprehensive set of principles for the design of effective instruction.

Sweller distinguishes three types of cognitive load that compete for the limited resources of working memory:

Intrinsic cognitive load: The inherent complexity of the material being learned — determined by the number of interacting elements that must be processed simultaneously. High intrinsic load (material with many mutually interacting elements) cannot be reduced by instructional design without changing what is being taught, but it can be managed by sequencing material so that simpler elements are mastered before more complex ones.
Extraneous cognitive load: The cognitive demand imposed by poorly designed instructional materials — demand that consumes working memory resources without contributing to learning. This is the load that instructional design can and should minimize: eliminating redundancy, reducing split attention (requiring learners to simultaneously process related information from separate sources), and removing irrelevant decoration that attracts attention without aiding comprehension.
Germane cognitive load: The cognitive demand associated with the active construction of schemas — the productive cognitive work of understanding, organizing, and integrating new information with existing knowledge. Germane load contributes directly to learning and should be promoted by instructional design, up to the limits of available capacity.

The practical implications of cognitive load theory are extensive and well-supported by decades of experimental research. They include the worked-example effect (novices learn more from studying worked examples than from solving problems independently), the expertise reversal effect (instructional supports that help novices become counterproductive for experts), the modality effect (information presented in mixed visual and auditory format is processed more efficiently than information presented in a single modality), and the split-attention effect (presenting related information in an integrated rather than separated format reduces extraneous load and improves learning).
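The three load types described in this section can be sketched as a simple budget in Python. The sketch assumes that the loads add up and can be expressed on a single arbitrary scale. The numbers, the capacity value, and the names `Load` and `is_overloaded` are illustrative assumptions, not measurements or part of Sweller's formal account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    intrinsic: float   # complexity inherent in the material
    extraneous: float  # demand imposed by poor instructional design
    germane: float     # effort spent building schemas

    def total(self) -> float:
        """Total demand on working memory, assuming the three loads are additive."""
        return self.intrinsic + self.extraneous + self.germane


def is_overloaded(load: Load, capacity: float) -> bool:
    """True when total demand exceeds the available working-memory capacity."""
    return load.total() > capacity


CAPACITY = 7.0  # arbitrary units, chosen only for the example

cluttered = Load(intrinsic=4.0, extraneous=3.0, germane=2.0)   # total 9.0
redesigned = Load(intrinsic=4.0, extraneous=1.0, germane=2.0)  # total 7.0

print(is_overloaded(cluttered, CAPACITY))   # True
print(is_overloaded(redesigned, CAPACITY))  # False
```

The redesign leaves the intrinsic and germane loads untouched and lowers only the extraneous load. That is the lever the section identifies as open to instructional design.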

Information Processing Theory and Mental Health: Clinical Relevance

Information processing theory has not remained confined to cognitive psychology and educational research. It has generated significant clinical applications — particularly in the understanding and treatment of anxiety, depression, trauma, and other psychological difficulties through cognitive-behavioral frameworks.

Cognitive behavioral therapy (CBT), the most widely researched and most widely practiced psychological treatment for anxiety and depression, is grounded in an information processing account of psychological difficulty. Aaron Beck’s cognitive model proposes that emotional disorders are maintained by systematic distortions in information processing: cognitive biases that cause people to selectively attend to, encode, and remember information in ways that confirm negative beliefs about themselves, the world, and the future.

These cognitive biases — including selective attention (attending preferentially to threat-relevant or negative information), memory bias (better retention of schema-consistent, emotionally negative material), interpretation bias (consistently interpreting ambiguous information in the most negative light), and attention narrowing (reduced breadth of attentional focus under threat) — are precisely the kind of selective, distorting processing effects that information processing theory predicts when cognitive resources are allocated in particular ways.

The treatment implication is direct: if psychological difficulties are partly maintained by biased information processing, then deliberately targeting and modifying those processing biases — through behavioral experiments that disconfirm biased interpretations, attention training that reduces selective attention to threat, and cognitive restructuring that challenges the schemas driving biased encoding — can produce symptom reduction. This is what CBT does, and the information processing framework provides its theoretical foundation.

Trauma processing provides another important clinical application. Research on post-traumatic stress disorder (PTSD) through an information processing lens — particularly the work of Edna Foa and colleagues on emotional processing theory — explains PTSD symptoms as the consequence of inadequately processed traumatic memories: memories that have not been sufficiently integrated into the person’s broader narrative and semantic memory system, and that therefore persist in a raw, highly activated form that is easily triggered by contextual cues. Trauma therapies including EMDR (Eye Movement Desensitization and Reprocessing) and Prolonged Exposure work by facilitating the deeper processing and integration of these incompletely processed memories.

FAQs About Information Processing Theory in Psychology

What is information processing theory in simple terms?

Information processing theory is a psychological framework that compares the human mind to a computer — it proposes that cognition involves receiving information from the environment, processing it through a series of mental stages (including sensory memory, working memory, and long-term memory), and using it to produce behavior or store knowledge. The theory emphasizes that mental processing is real and scientifically measurable, that human cognitive capacity has defined limits (particularly in working memory), and that what we remember and understand is determined not just by what we experience but by how deeply and elaborately we process it. It transformed psychology from a behaviorist focus on observable inputs and outputs to a science of internal mental operations.

What are the main stages of information processing?

The main stages in the classical information processing model are: (1) Sensory memory — a brief, high-capacity register that holds literal sensory traces for fractions of a second before most of it decays; (2) Working memory (short-term memory) — the limited-capacity system that holds and actively manipulates information currently in conscious use, with a capacity of approximately four to seven chunks and a duration of seconds without rehearsal; and (3) Long-term memory — the vast, essentially unlimited repository of all retained knowledge, organized in multiple subsystems including episodic, semantic, and procedural memory. Information moves between stages through attention (from sensory to working memory) and encoding (from working to long-term memory), while retrieval brings information back from long-term to working memory for use.

What is the difference between working memory and short-term memory?

Short-term memory was the original term used in information processing theory for the temporary, limited-capacity memory system that holds information in current awareness. Working memory, proposed by Alan Baddeley and Graham Hitch in 1974, replaced this simpler construct with a more architecturally detailed model. The key difference is that working memory is not just a passive holding store but an active, multicomponent system that processes and manipulates information rather than simply retaining it. Baddeley’s model identifies the phonological loop, visuospatial sketchpad, central executive, and episodic buffer as distinct components with different functions. Contemporary cognitive psychology generally uses the working memory framework, as it better explains the range of cognitive operations — mental arithmetic, reasoning, language comprehension — that depend on this system.

How does information processing theory explain forgetting?

Information processing theory explains forgetting through several distinct mechanisms operating at different stages of processing. Decay — the gradual fading of a memory trace over time without rehearsal — is the primary mechanism in sensory and working memory. Interference — the disruption of one memory by other similar memories — operates in both working and long-term memory: proactive interference occurs when old memories interfere with new ones, while retroactive interference occurs when new memories interfere with old ones. Encoding failure — material that was never adequately processed in the first place — accounts for much of what we think we have “forgotten” but actually never retained. Retrieval failure — the inability to access information that does exist in storage — is arguably the most common form of everyday forgetting, and is remedied by providing appropriate retrieval cues.

How is information processing theory used in education?

Information processing theory has had extensive and well-supported applications in educational psychology and instructional design. Cognitive load theory — derived directly from information processing principles — provides guidelines for designing instruction that respects the limits of working memory: minimizing extraneous load, managing intrinsic complexity through careful sequencing, and promoting the germane load associated with schema construction. The levels-of-processing framework informs teaching strategies that promote deep semantic processing — questioning, elaboration, application, and connection-making — rather than rote repetition. The encoding specificity principle informs the value of varied retrieval practice and context-rich learning. Spaced repetition — distributing practice over time rather than massing it — is directly supported by information processing research on memory consolidation. Collectively, these applications have produced one of the most practically useful bodies of knowledge in educational psychology.

What are the limitations of information processing theory?

Despite its remarkable influence and genuine explanatory power, information processing theory has several important limitations. The computer metaphor, while generative, has led to models that underemphasize the role of emotion, motivation, social context, and embodiment in shaping cognition — humans think with their bodies and in social contexts in ways that purely computational models struggle to capture. The serial, staged architecture of classical information processing models has been challenged by evidence for massively parallel processing and by connectionist models that implement cognitive functions through distributed networks rather than discrete stages. The framework also tends to treat the individual mind as the unit of analysis, underweighting the distributed and culturally embedded nature of human cognition. Contemporary cognitive science has addressed many of these limitations through embodied cognition, situated cognition, and cultural psychology frameworks that extend rather than replace the information processing foundations.

Bibliography

Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The Psychology of Learning and Motivation (Vol. 2, pp. 89–195). Academic Press.
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423. https://doi.org/10.1016/S1364-6613(00)01538-2
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The Psychology of Learning and Motivation (Vol. 8, pp. 47–89). Academic Press.
Beck, A. T. (1979). Cognitive Therapy and the Emotional Disorders. Penguin Books.
Broadbent, D. E. (1958). Perception and Communication. Pergamon Press.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684. https://doi.org/10.1016/S0022-5371(72)80001-X
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. https://doi.org/10.1037/h0043158
Neisser, U. (1967). Cognitive Psychology. Appleton-Century-Crofts.
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. https://doi.org/10.1207/s15516709cog1202_4
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of Memory (pp. 381–403). Academic Press.

Recommended citation (updated 2026)

PsychologyFor. (2026). The Theory of Information Processing and Psychology. PsychologyFor. https://psychologyfor.com/the-theory-of-information-processing-and-psychology/
